
China Mobile ECloud ElasticSearch VectorSearch

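China Mobile ECloud VectorSearch is a fully managed, enterprise-grade distributed search and analytics service. It provides a low-cost, high-performance, and reliable platform-level service for retrieving and analyzing structured and unstructured data. As a vector database, it supports multiple index types and similarity distance methods.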
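Two of those similarity methods, cosine and l2, are the ones selected by name in the examples on this page. The sketch below shows their standard definitions in plain Python. The function names are illustrative only and are not part of any ECloud or LangChain API. Note that the two measures rank in opposite directions: a higher cosine means more similar, while a smaller l2 distance means more similar.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The measures can disagree: [1, 0] and [10, 0] point the same way
# (cosine 1.0) but are far apart in L2 terms (distance 9.0).
print(cosine_similarity([1, 0], [10, 0]))  # 1.0
print(l2_distance([1, 0], [10, 0]))        # 9.0
```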

To use this integration, install langchain-community with pip install -qU langchain-community.

This notebook shows how to use the functionality related to the ECloud ElasticSearch VectorStore. To run it, you need a China Mobile ECloud VectorSearch instance up and running.

Read the help documentation to get familiar with and set up a China Mobile ECloud ElasticSearch instance.

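Once the instance is running, follow the steps below to split documents, get embeddings, connect to the China Mobile ECloud ElasticSearch instance, index the documents, and run vector retrieval.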

#!pip install elasticsearch==7.10.1
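The command above pins version 7.10.1 of the Python elasticsearch client. If you want to confirm which client version is actually installed before connecting, a small check like the following works. Treating a matching major version (7) as sufficient is an assumption made for this sketch, not a documented compatibility rule for ECloud, and both helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_elasticsearch_version() -> Optional[str]:
    """Return the installed elasticsearch client version, or None if absent."""
    try:
        return version("elasticsearch")
    except PackageNotFoundError:
        return None


def matches_major(installed: Optional[str], expected_major: int = 7) -> bool:
    """True if a version string such as "7.10.1" has the expected major version."""
    if installed is None:
        return False
    return int(installed.split(".")[0]) == expected_major


# Prints e.g. "7.10.1 True", or "None False" if the client is not installed.
current = installed_elasticsearch_version()
print(current, matches_major(current))
```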

First, we use OpenAIEmbeddings, so we need an OpenAI API key.

import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
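If you only want to test connectivity and indexing without an OpenAI key, a deterministic stand-in embedding can be useful. The class below is a hypothetical sketch: it hashes text into a fixed-size unit vector and exposes the two methods LangChain embedding classes provide (embed_documents and embed_query). The vectors carry no semantic meaning, so search results will be arbitrary. Whether EcloudESVectorStore accepts an object that does not subclass LangChain's Embeddings base class is an assumption; langchain_community also ships its own FakeEmbeddings for this purpose.

```python
import hashlib
import math
from typing import List


class HashEmbeddings:
    """Hypothetical offline stand-in: hashes text into a fixed-size unit vector."""

    def __init__(self, size: int = 16) -> None:
        self.size = size

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map digest bytes (cycling if size > 32) into [-1, 1], then L2-normalise.
        raw = [(digest[i % len(digest)] / 127.5) - 1.0 for i in range(self.size)]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)
```

Under that assumption, embeddings = HashEmbeddings(size=16) would take the place of embeddings = OpenAIEmbeddings() while you check the rest of the pipeline.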

Second, split the documents and get embeddings.

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import EcloudESVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

ES_URL = "https://127.0.0.1:9200"
USER = "your user name"
PASSWORD = "your password"
indexname = "your index name"
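The CharacterTextSplitter call above splits each loaded document on a separator (by default a blank line, "\n\n") and merges the pieces back into chunks of at most chunk_size characters (1000 here); chunk_overlap=0 means consecutive chunks share no text. The split_text function below is a simplified, hypothetical illustration of that split-then-merge idea, not LangChain's implementation: it has no overlap handling and keeps any piece longer than chunk_size intact rather than cutting it.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedy split-then-merge on a single separator, with no overlap."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        # Joining onto a non-empty chunk also costs the separator's length.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10))
```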

Then, index the documents.

docsearch = EcloudESVectorStore.from_documents(
    docs,
    embeddings,
    es_url=ES_URL,
    user=USER,
    password=PASSWORD,
    index_name=indexname,
    refresh_indices=True,
)
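Conceptually, from_documents embeds every chunk and writes it to the index together with its text, so each chunk becomes one bulk-indexing request. The sketch below builds actions in the shape accepted by elasticsearch.helpers.bulk(client, actions). It is hypothetical: the default field names "text" and "vector" and the "metadata" key are assumptions, not taken from the EcloudESVectorStore source, and build_bulk_actions is an illustrative name.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional


def build_bulk_actions(
    texts: List[str],
    embed: Callable[[List[str]], List[List[float]]],
    index_name: str,
    text_field: str = "text",      # assumed default field name
    vector_field: str = "vector",  # assumed default field name
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Pair each text with its embedding as one Elasticsearch bulk 'index' action."""
    vectors = embed(texts)
    metadatas = metadatas or [{} for _ in texts]
    return [
        {
            "_op_type": "index",
            "_index": index_name,
            "_id": str(uuid.uuid4()),
            "_source": {text_field: t, vector_field: v, "metadata": m},
        }
        for t, v, m in zip(texts, vectors, metadatas)
    ]
```

Passing text_field="my_text" and vector_field="my_vec" mirrors the custom field names used in the use cases further down.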

Finally, query and retrieve the data.

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query, k=10)
print(docs[0].page_content)
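similarity_search(query, k=10) returns up to 10 chunks ordered best first, so docs[0] is the top hit. With the exact model used in the examples below, every stored vector is scored against the query embedding. The brute-force sketch below shows that ranking with cosine similarity; the names are hypothetical and the real scoring runs inside Elasticsearch.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_top_k(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Score every stored vector and return the k best (index, score) pairs, best first."""
    scored = ((i, _cosine(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


print(exact_top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
```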

Common use cases

def test_dense_float_vector_lsh_cosine() -> None:
    """
    Test indexing with vector type knn_dense_float_vector, model lsh and similarity cosine.
    This mapping is compatible with the exact model with l2 or cosine similarity,
    and with the lsh model with cosine similarity.
    """
    docsearch = EcloudESVectorStore.from_documents(
        docs,
        embeddings,
        es_url=ES_URL,
        user=USER,
        password=PASSWORD,
        index_name=indexname,
        refresh_indices=True,
        text_field="my_text",
        vector_field="my_vec",
        vector_type="knn_dense_float_vector",
        vector_params={"model": "lsh", "similarity": "cosine", "L": 99, "k": 1},
    )

    # Results go into `results`, not `docs`: assigning to `docs` anywhere in this
    # function would make it a local name, so from_documents(docs, ...) above
    # would raise UnboundLocalError.

    # exact model, default similarity
    results = docsearch.similarity_search(
        query,
        k=10,
        search_params={
            "model": "exact",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)

    # exact model, l2 similarity
    results = docsearch.similarity_search(
        query,
        k=10,
        search_params={
            "model": "exact",
            "similarity": "l2",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)

    # exact model, cosine similarity
    results = docsearch.similarity_search(
        query,
        k=10,
        search_params={
            "model": "exact",
            "similarity": "cosine",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)

    # lsh model, cosine similarity, candidates=10
    results = docsearch.similarity_search(
        query,
        k=10,
        search_params={
            "model": "lsh",
            "similarity": "cosine",
            "candidates": 10,
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)
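The lsh examples above set L and k in vector_params and pass candidates at query time. A common LSH family for cosine similarity is random-hyperplane hashing, and the sketch below assumes L is the number of hash tables, k the number of hyperplanes (bits) per table, and candidates the size of the shortlist that is then re-ranked exactly. That reading of the parameters is an assumption, since this page does not define them, and CosineLSH is a hypothetical class rather than ECloud's implementation.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CosineLSH:
    """Random-hyperplane LSH: L tables, each hashing with k hyperplanes."""

    def __init__(self, dim: int, L: int = 99, k: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(L)
        ]
        self.tables: List[Dict[Tuple[bool, ...], List[int]]] = [
            defaultdict(list) for _ in range(L)
        ]
        self.vectors: List[Sequence[float]] = []

    def _signature(self, table: int, v: Sequence[float]) -> Tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes[table]
        )

    def add(self, v: Sequence[float]) -> None:
        idx = len(self.vectors)
        self.vectors.append(v)
        for t in range(len(self.tables)):
            self.tables[t][self._signature(t, v)].append(idx)

    def search(
        self, q: Sequence[float], k: int = 10, candidates: int = 10
    ) -> List[Tuple[int, float]]:
        # Count how many tables put each stored vector in the query's bucket.
        hits: Dict[int, int] = defaultdict(int)
        for t in range(len(self.tables)):
            for idx in self.tables[t].get(self._signature(t, q), []):
                hits[idx] += 1
        # Keep the most frequent collisions, then re-rank them by exact cosine.
        shortlist = sorted(hits, key=hits.get, reverse=True)[:candidates]
        scored = [(i, _cosine(q, self.vectors[i])) for i in shortlist]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With k=1, as in the example mapping, each table splits the space in half, and a random hyperplane separates two vectors at angle θ with probability θ/π, so the collision count across L=99 tables acts as a rough angle estimate before the exact re-rank.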

Use cases with filters

def test_dense_float_vector_exact_with_filter() -> None:
    """
    Test indexing with vector type knn_dense_float_vector and the default model/similarity.
    This mapping is compatible with the exact model with l2 or cosine similarity.
    """
    docsearch = EcloudESVectorStore.from_documents(
        docs,
        embeddings,
        es_url=ES_URL,
        user=USER,
        password=PASSWORD,
        index_name=indexname,
        refresh_indices=True,
        text_field="my_text",
        vector_field="my_vec",
        vector_type="knn_dense_float_vector",
    )

    # Results go into `results`, not `docs`, so that from_documents(docs, ...)
    # above still sees the module-level `docs` (avoids UnboundLocalError).

    # filter={"match_all": {}} is the default
    results = docsearch.similarity_search(
        query,
        k=10,
        filter={"match_all": {}},
        search_params={
            "model": "exact",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)

    # filter={"term": {"my_text": "Jackson"}}
    results = docsearch.similarity_search(
        query,
        k=10,
        filter={"term": {"my_text": "Jackson"}},
        search_params={
            "model": "exact",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)

    # filter={"term": {"my_text": "president"}}, exact model with l2 similarity
    results = docsearch.similarity_search(
        query,
        k=10,
        filter={"term": {"my_text": "president"}},
        search_params={
            "model": "exact",
            "similarity": "l2",
            "vector_field": "my_vec",
            "text_field": "my_text",
        },
    )
    print(results[0].page_content)
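In the filtered examples, the filter restricts which documents are eligible and the exact model then ranks the remaining ones. The pure-Python sketch below mimics that, assuming the filter is applied before scoring; the helper names are hypothetical. It also illustrates an Elasticsearch detail worth checking against your mapping: term queries are not analyzed. If my_text is an analyzed text field (an assumption here, with the analyzer approximated by lowercase word tokens), a capitalized value such as "Jackson" will not match the lowercased indexed token "jackson", while "president" will, and results[0] would then fail on an empty result list.

```python
import math
import re
from typing import Any, Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def analyze(text: str) -> List[str]:
    """Rough stand-in for a standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def matches(filter_: Dict[str, Any], doc: Dict[str, str]) -> bool:
    """Evaluate the two filter shapes used above: match_all and term."""
    if "match_all" in filter_:
        return True
    if "term" in filter_:
        field, value = next(iter(filter_["term"].items()))
        # The term value is compared as-is (not analyzed) against indexed tokens.
        return value in analyze(doc.get(field, ""))
    raise ValueError(f"unsupported filter: {filter_}")


def filtered_exact_search(
    query_vec: Sequence[float],
    docs: List[Dict[str, str]],
    vectors: List[Sequence[float]],
    filter_: Dict[str, Any],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Drop documents that fail the filter, then rank the rest by cosine, best first."""
    keep = [i for i, d in enumerate(docs) if matches(filter_, d)]
    scored = [(i, _cosine(query_vec, vectors[i])) for i in keep]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```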
