Hippo
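Transwarp Hippo is an enterprise-level, cloud-native distributed vector database that supports the storage, retrieval, and management of massive vector-based datasets. It efficiently solves problems such as vector similarity search and high-density vector clustering. Hippo features high availability, high performance, and easy scalability. It offers many functions, such as multiple vector search indexes, data partitioning and sharding, data persistence, incremental data ingestion, vector scalar field filtering, and mixed queries. It can effectively meet the demanding real-time search needs of enterprises working with massive vector data.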
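Getting Started
You need an API key from the OpenAI website and a running Hippo instance.
Installing Dependencies
First, install the required dependencies: OpenAI, LangChain, and Hippo-API. Install the versions appropriate for your environment.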
%pip install --upgrade --quiet langchain langchain_community tiktoken langchain-openai
%pip install --upgrade --quiet hippo-api==1.1.0.rc3
Requirement already satisfied: hippo-api==1.1.0.rc3 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (1.1.0rc3)
Requirement already satisfied: pyyaml>=6.0 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (from hippo-api==1.1.0.rc3) (6.0.1)
Note: Python 3.8 or later is required.
Best Practices
Importing Dependencies
import os
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.hippo import Hippo
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
Loading Knowledge Documents
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
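Segmenting the Knowledge Document
Here, we use LangChain's CharacterTextSplitter for segmentation. Because no separator is passed, the splitter uses its default separator, a blank line ("\n\n"). It targets segments of at most 500 characters (chunk_size=500), with no overlapping characters between segments (chunk_overlap=0).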
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
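The effect of a separator, a size limit, and zero overlap can be illustrated with a simplified pure-Python sketch. This is not CharacterTextSplitter's implementation; the function name split_by_separator and the sample text are made up for illustration.

```python
from typing import List


def split_by_separator(text: str, separator: str, chunk_size: int) -> List[str]:
    """Greedy separator-based chunking with no overlap (simplified illustration)."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk; a single oversized piece is kept whole.
            current = candidate
        else:
            # The next piece does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph is a bit longer."
chunks = split_by_separator(sample, separator="\n\n", chunk_size=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The first two paragraphs fit together under the 40-character limit, so they are merged into one chunk; the third starts a new chunk. No text is shared between chunks, which is what an overlap of 0 means.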
Declaring the Embedding Model
Below, we create the OpenAI or Azure embedding model using LangChain's OpenAIEmbeddings class.
# openai
embeddings = OpenAIEmbeddings()
# azure
# embeddings = OpenAIEmbeddings(
# openai_api_type="azure",
# openai_api_base="x x x",
# openai_api_version="x x x",
# model="x x x",
# deployment="x x x",
# openai_api_key="x x x"
# )
Declaring the Hippo Client
HIPPO_CONNECTION = {"host": "IP", "port": "PORT"}
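Hard-coding the host and port is fine for a notebook. As a minimal sketch, the same dictionary could instead be built from environment variables. The names HIPPO_HOST, HIPPO_PORT, and hippo_connection_from_env are hypothetical; they are not defined by Hippo or LangChain, and the address in the example is a placeholder.

```python
import os
from typing import Dict, Mapping


def hippo_connection_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build connection_args from HIPPO_HOST / HIPPO_PORT (hypothetical variable names)."""
    missing = [name for name in ("HIPPO_HOST", "HIPPO_PORT") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    port = env["HIPPO_PORT"]
    if not port.isdigit():
        raise ValueError(f"HIPPO_PORT must be numeric, got {port!r}")
    # Same shape as the HIPPO_CONNECTION dictionary: host and port as strings.
    return {"host": env["HIPPO_HOST"], "port": port}


# Example with an explicit mapping instead of the real environment
# (placeholder values, not a real Hippo endpoint).
connection = hippo_connection_from_env({"HIPPO_HOST": "127.0.0.1", "HIPPO_PORT": "8080"})
print(connection)
```

Failing early on a missing or malformed value gives a clearer error than a connection failure later, when the documents are stored.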
Storing the Document
print("input...")
# insert docs
vector_store = Hippo.from_documents(
    docs,
    embedding=embeddings,
    table_name="langchain_test",
    connection_args=HIPPO_CONNECTION,
)
print("success")
input...
success
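Conducting Knowledge-Based Question Answering
Creating a Large Language Question-Answering Model
Below, we create the OpenAI or Azure large language question-answering model using LangChain's ChatOpenAI or AzureChatOpenAI class, respectively.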
# Requires: from langchain_openai import AzureChatOpenAI
# llm = AzureChatOpenAI(
#     openai_api_base="x x x",
#     openai_api_version="xxx",
#     deployment_name="xxx",
#     openai_api_key="xxx",
#     openai_api_type="azure"
# )
llm = ChatOpenAI(openai_api_key="YOUR OPENAI KEY", model_name="gpt-3.5-turbo-16k")
Acquiring Related Knowledge Based on the Question:
query = "Please introduce COVID-19"
# query = "Please introduce Hippo Core Architecture"
# query = "What operations does the Hippo Vector Database support for vector data?"
# query = "Does Hippo use hardware acceleration technology? Briefly introduce hardware acceleration technology."
# Retrieve similar content from the knowledge base and fetch the two most similar texts.
res = vector_store.similarity_search(query, 2)
content_list = [item.page_content for item in res]
text = "".join(content_list)
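The call similarity_search(query, 2) returns the two stored chunks closest to the query. As a toy illustration of top-k retrieval, the sketch below ranks a few hand-written vectors by cosine similarity. Cosine similarity is an assumption chosen for illustration (the source does not say which metric or index Hippo uses here), and the vectors and texts are made up rather than real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    items: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the texts of the k items most similar to query_vec."""
    ranked = sorted(
        items,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [item_text for item_text, _ in ranked[:k]]


# Made-up three-dimensional "embeddings" for three stored texts.
toy_store = [
    ("text about vaccines", [0.9, 0.1, 0.0]),
    ("text about the economy", [0.0, 1.0, 0.1]),
    ("text about the pandemic", [1.0, 0.0, 0.1]),
]
toy_query = [1.0, 0.05, 0.0]
top_two = top_k(toy_query, toy_store, k=2)
print(top_two)
```

A production vector database avoids this exhaustive scan by using approximate indexes, but the contract is the same: a query vector goes in and the k nearest stored items come out.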
Constructing a Prompt Template
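The template in this section takes two inputs: the retrieved text and the question. As a minimal sketch, the same wording can be wrapped in a function so that it can be reused and checked without calling a model. The name build_prompt is hypothetical and is not part of LangChain or Hippo.

```python
def build_prompt(article: str, question: str) -> str:
    """Fill the question-answering template with retrieved text and the user question."""
    return (
        "Please use the content of the following [Article] to answer my question. "
        "If you don't know, please say you don't know, and the answer should be concise.\n"
        f"[Article]:{article}\n"
        f"Please answer this question in conjunction with the above article:{question}\n"
    )


example_prompt = build_prompt("Hippo is a vector database.", "What is Hippo?")
print(example_prompt)
```

The notebook itself builds the prompt inline with an f-string: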
prompt = f"""
Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise.
[Article]:{text}
Please answer this question in conjunction with the above article:{query}
"""
Waiting for the Large Language Model to Generate an Answer
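# predict() is deprecated in recent LangChain releases; invoke() returns a message whose text is in .content.
response_with_hippo = llm.invoke(prompt).content
print(f"response_with_hippo:{response_with_hippo}")
response = llm.invoke(query).content
print("==========================================")
print(f"response_without_hippo:{response}")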
response_with_hippo:COVID-19 is a virus that has impacted every aspect of our lives for over two years. It is a highly contagious and mutates easily, requiring us to remain vigilant in combating its spread. However, due to progress made and the resilience of individuals, we are now able to move forward safely and return to more normal routines.
==========================================
response_without_hippo:COVID-19 is a contagious respiratory illness caused by the novel coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China and has since spread globally, leading to a pandemic. The virus primarily spreads through respiratory droplets when an infected person coughs, sneezes, talks, or breathes, and can also spread by touching contaminated surfaces and then touching the face. COVID-19 symptoms include fever, cough, shortness of breath, fatigue, muscle or body aches, sore throat, loss of taste or smell, headache, and in severe cases, pneumonia and organ failure. While most people experience mild to moderate symptoms, it can lead to severe illness and even death, particularly among older adults and those with underlying health conditions. To combat the spread of the virus, various preventive measures have been implemented globally, including social distancing, wearing face masks, practicing good hand hygiene, and vaccination efforts.
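The two answers differ because the first is grounded in text retrieved from the knowledge base. The retrieve, prompt, and generate steps can be summarized in one function. This is a minimal sketch under stated assumptions: answer_with_context, TEMPLATE, fake_search, and fake_generate are hypothetical names, and the search and model calls are replaced by stand-ins so the example runs offline.

```python
from typing import Callable, List

# Shortened template for illustration; {article} and {question} are filled at call time.
TEMPLATE = "[Article]:{article}\nAnswer using the article above:{question}\n"


def answer_with_context(
    question: str,
    search: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve k passages, build a grounded prompt, and return the model's answer."""
    passages = search(question, k)
    article = "".join(passages)
    prompt = TEMPLATE.format(article=article, question=question)
    return generate(prompt)


# Stand-ins for the vector store and the chat model.
def fake_search(question: str, k: int) -> List[str]:
    corpus = ["passage one. ", "passage two. ", "passage three. "]
    return corpus[:k]


def fake_generate(prompt: str) -> str:
    return f"(answer based on {prompt.count('passage')} passages)"


result = answer_with_context("Please introduce COVID-19", fake_search, fake_generate)
print(result)
```

With the objects from this page, search could be `lambda q, k: [d.page_content for d in vector_store.similarity_search(q, k)]` and generate could be `lambda p: llm.invoke(p).content`. Passing both as callables keeps the flow testable without a Hippo instance or an OpenAI key.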