
Elasticsearch

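Elasticsearch is a distributed, RESTful search and analytics engine capable of performing both vector and lexical search. It is built on top of the Apache Lucene library.

This notebook shows how to use functionality related to the Elasticsearch vector store.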

Setup

To use Elasticsearch vector search, you must install the langchain-elasticsearch package.

%pip install -qU langchain-elasticsearch

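Credentials

There are two main ways to set up an Elasticsearch instance:

  1. Elastic Cloud: Elastic Cloud is a managed Elasticsearch service. Sign up for a free trial.

  2. Local installation: get started with Elasticsearch by running it locally. The easiest way is to use the official Elasticsearch Docker image. See the Elasticsearch Docker documentation for more information.

To connect to an Elasticsearch instance that does not require login credentials (for example, a Docker instance started with security disabled), pass the Elasticsearch URL and index name, along with the embedding object, to the constructor.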

Running Elasticsearch via Docker

Example: run a single-node Elasticsearch instance with security disabled. This is not recommended for production use.

docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.12.1
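The container usually needs some time before it accepts requests, so it can help to poll the HTTP port before creating a vector store. The following is a minimal sketch using only the standard library. The function names `is_reachable` and `wait_for_elasticsearch` are hypothetical helpers written for this example, not part of any library, and the URL assumes the port mapping from the command above.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: not ready yet.
        return False


def wait_for_elasticsearch(
    url: str = "http://127.0.0.1:9200",
    attempts: int = 30,
    delay: float = 2.0,
    probe=is_reachable,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

Calling `wait_for_elasticsearch()` before constructing the store returns `True` once the node responds and `False` if it never does within the allotted attempts. The `probe` argument exists only so the polling loop can be exercised without a running server.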

Running with Authentication

For production, we recommend running with security enabled. To connect with login credentials, use either the es_api_key parameter or the es_user and es_password parameters.

%pip install -qU langchain-openai

import getpass
import os

# Prompt for the OpenAI key only if it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_elasticsearch import ElasticsearchStore

elastic_vector_search = ElasticsearchStore(
    es_url="https://127.0.0.1:9200",
    index_name="langchain_index",
    embedding=embeddings,
    es_user="elastic",
    es_password="changeme",
)
API Reference: ElasticsearchStore
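The example above hard-codes a username and password. One way to keep credentials out of the notebook is to read them from environment variables and pass whichever set is available. This is only a sketch: `es_auth_kwargs` is a hypothetical helper, and the variable names `ES_API_KEY`, `ES_USER`, and `ES_PASSWORD` are assumptions made for this example. Only the returned keys (`es_api_key`, `es_user`, `es_password`) come from the parameters named in this section.

```python
import os


def es_auth_kwargs(env=None):
    """Build authentication keyword arguments for ElasticsearchStore.

    Prefers an API key and falls back to username and password.
    ES_API_KEY, ES_USER and ES_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    api_key = env.get("ES_API_KEY")
    if api_key:
        return {"es_api_key": api_key}
    user, password = env.get("ES_USER"), env.get("ES_PASSWORD")
    if user and password:
        return {"es_user": user, "es_password": password}
    raise ValueError("Set ES_API_KEY, or both ES_USER and ES_PASSWORD.")
```

The result can then be unpacked into the constructor, for example `ElasticsearchStore(es_url="https://127.0.0.1:9200", index_name="langchain_index", embedding=embeddings, **es_auth_kwargs())`.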

How to obtain a password for the default "elastic" user?

To obtain your Elastic Cloud password for the default "elastic" user:

  1. Log in to the Elastic Cloud console at https://cloud.elastic.co
  2. Go to "Security" > "Users"
  3. Locate the "elastic" user and click "Edit"
  4. Click "Reset password"
  5. Follow the prompts to reset the password

How to obtain an API key?

To obtain an API key:

  1. Log in to the Elastic Cloud console at https://cloud.elastic.co
  2. Open Kibana and go to Stack Management > API Keys
  3. Click "Create API key"
  4. Enter a name for the API key and click "Create"
  5. Copy the API key and pass it as the es_api_key parameter

Elastic Cloud

To connect to an Elasticsearch instance on Elastic Cloud, you can use either the es_cloud_id parameter or es_url.

elastic_vector_search = ElasticsearchStore(
    es_cloud_id="<cloud_id>",
    index_name="test_index",
    embedding=embeddings,
    es_user="elastic",
    es_password="changeme",
)
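To see how es_cloud_id relates to es_url, it may help to look at what a Cloud ID contains. The sketch below assumes the commonly documented layout, a label followed by a colon and a base64 string that decodes to `host$elasticsearch_id$kibana_id`. `decode_cloud_id` is a hypothetical helper for illustration only. The client resolves the Cloud ID on its own, so this step is not needed to connect.

```python
import base64


def decode_cloud_id(cloud_id: str) -> dict:
    """Split a Cloud ID assumed to be '<label>:<base64 of host$es_id$kibana_id>'."""
    label, _, encoded = cloud_id.rpartition(":")
    decoded = base64.b64decode(encoded).decode("utf-8")
    parts = decoded.split("$")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("Unexpected Cloud ID format")
    host, es_id = parts[0], parts[1]
    # The Elasticsearch endpoint is the deployment ID prefixed to the host.
    return {"label": label, "es_url": f"https://{es_id}.{host}"}
```

If the layout assumption holds for your deployment, the returned `es_url` is the value you could pass to the constructor instead of es_cloud_id.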

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

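Initialization

Elasticsearch is running locally on localhost:9200 with Docker. For more details on how to connect to Elasticsearch on Elastic Cloud, see the "Running with Authentication" section above.

from langchain_elasticsearch import ElasticsearchStore

# The local Docker instance was started with security and TLS disabled,
# so it is reached over plain HTTP on the mapped port 9200.
vector_store = ElasticsearchStore(
    "langchain-demo", embedding=embeddings, es_url="http://127.0.0.1:9200"
)
API Reference: ElasticsearchStore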

Manage vector store

Add items to vector store

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
# One random ID per document; the same IDs are used later to delete items.
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
API Reference: Document
['21cca03c-9089-42d2-b41c-3d156be2b519',
'a6ceb967-b552-4802-bb06-c0e95fce386e',
'3a35fac4-e5f0-493b-bee0-9143b41aedae',
'176da099-66b1-4d6a-811b-dfdfe0808d30',
'ecfa1a30-3c97-408b-80c0-5c43d68bf5ff',
'c0f08baa-e70b-4f83-b387-c6e0a0f36f73',
'489b2c9c-1925-43e1-bcf0-0fa94cf1cbc4',
'408c6503-9ba4-49fd-b1cc-95584cd914c5',
'5248c899-16d5-4377-a9e9-736ca443ad4f',
'ca182769-c4fc-4e25-8f0a-8dd0a525955c']
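The IDs above are random, so running the cell twice stores every document twice under different IDs. If repeatable IDs are preferable, one option is to derive them from the document itself with `uuid.uuid5`. This is a sketch rather than the approach the notebook uses: `stable_id` and the namespace URL are hypothetical choices made for this example.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/langchain-demo")


def stable_id(page_content: str, source: str) -> str:
    """Derive a repeatable ID from a document's content and source."""
    return str(uuid.uuid5(NAMESPACE, f"{source}\n{page_content}"))
```

With the documents defined above, the IDs could be built as `[stable_id(d.page_content, d.metadata["source"]) for d in documents]` and passed as `ids`. Re-adding an unchanged document should then target the same Elasticsearch document ID instead of creating a new entry.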

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])
True

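Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent. These examples also show how to use filtering when searching.

Query directly

A simple similarity search with a filter on metadata can be performed as follows:

results = vector_store.similarity_search(
    query="LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")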
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
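The filter in the query above is a list of Elasticsearch query clauses, here a single `term` clause on `metadata.source.keyword`. When several exact-match conditions are needed, the clauses can be generated instead of written by hand. The helper below is a hypothetical convenience for this example. It assumes every filtered field is a string indexed with a `.keyword` subfield, as in the query above.

```python
def metadata_term_filter(**conditions):
    """Build one exact-match `term` clause per metadata field."""
    return [
        {"term": {f"metadata.{field}.keyword": value}}
        for field, value in conditions.items()
    ]
```

For instance, `metadata_term_filter(source="tweet")` produces the same list that is passed as `filter` in the query above.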

Similarity search with score

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(
    query="Will it be hot tomorrow",
    k=1,
    filter=[{"term": {"metadata.source.keyword": "news"}}],
)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.765887] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]

Query by turning into retriever

You can also transform the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.2}
)
retriever.invoke("Stealing from the bank is a crime")
[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.'),
Document(metadata={'source': 'news'}, page_content='The stock market is down 500 points today due to fears of a recession.'),
Document(metadata={'source': 'website'}, page_content='Is the new iPhone worth the price? Read this review to find out.'),
Document(metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')]
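The `similarity_score_threshold` search type keeps only results whose relevance score reaches `score_threshold`, which is why the number of returned documents varies with the query. The effect can be sketched in plain Python as below. `apply_score_threshold` is a hypothetical stand-in for illustration, not the retriever's actual code.

```python
def apply_score_threshold(scored_docs, score_threshold):
    """Keep (document, score) pairs whose score is at least the threshold."""
    return [(doc, score) for doc, score in scored_docs if score >= score_threshold]
```

Raising the threshold trades recall for precision: a value close to 1 returns only near matches, while a low value such as the 0.2 used above lets loosely related documents through.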

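Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), refer to the RAG sections of the LangChain documentation.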

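FAQ

Question: I'm getting timeout errors when indexing documents into Elasticsearch. How do I fix this?

One possible cause is that your documents take longer to index than the request allows. ElasticsearchStore uses the Elasticsearch bulk API, which has a few defaults that you can adjust to reduce the chance of timeout errors.

Adjusting them is also a good idea when you are using SparseVectorRetrievalStrategy.

The defaults are:

  • chunk_size: 500
  • max_chunk_bytes: 100MB

To adjust these, pass chunk_size and max_chunk_bytes through the bulk_kwargs argument of the ElasticsearchStore add_texts method.

vector_store.add_texts(
    texts,
    bulk_kwargs={
        "chunk_size": 50,
        "max_chunk_bytes": 200000000,
    },
)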
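Instead of guessing a chunk size up front, the call can also be retried with progressively smaller chunks. The sketch below is one possible approach, not something the library provides: `add_texts_with_backoff` is a hypothetical helper, and the default `retry_on=(TimeoutError,)` is an assumption. Pass the timeout exception class that your Elasticsearch client actually raises.

```python
def add_texts_with_backoff(
    store,
    texts,
    chunk_size=500,
    min_chunk_size=10,
    retry_on=(TimeoutError,),
    **kwargs,
):
    """Call store.add_texts, halving chunk_size after each timeout."""
    while True:
        try:
            return store.add_texts(
                texts, bulk_kwargs={"chunk_size": chunk_size}, **kwargs
            )
        except retry_on:
            if chunk_size <= min_chunk_size:
                raise  # already at the smallest size; give up
            chunk_size = max(min_chunk_size, chunk_size // 2)
```

A timed-out bulk request may already have indexed part of the batch, so it is safer to pass explicit `ids` through `**kwargs`. That way a retry writes to the same document IDs rather than adding duplicates.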

Upgrading to ElasticsearchStore

If you are already using Elasticsearch in your langchain-based project, you may be using the old implementations, ElasticVectorSearch and ElasticKNNSearch, which are now deprecated. We have introduced a new implementation called ElasticsearchStore, which is more flexible and easier to use. This notebook will guide you through the process of upgrading to the new implementation.

What's new?

The new implementation is a single class called ElasticsearchStore. Through strategies, it can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25, and hybrid retrieval.

I am using ElasticKNNSearch

Old implementation:

from langchain_community.vectorstores.elastic_vector_search import ElasticKNNSearch

db = ElasticKNNSearch(
    elasticsearch_url="https://127.0.0.1:9200",
    index_name="test_index",
    embedding=embeddings,
)

New implementation:

from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy

db = ElasticsearchStore(
    es_url="https://127.0.0.1:9200",
    index_name="test_index",
    embedding=embeddings,
    # if you use the model_id
    # strategy=DenseVectorStrategy(model_id="test_model")
    # if you use hybrid search
    # strategy=DenseVectorStrategy(hybrid=True)
)

I am using ElasticVectorSearch

Old implementation:

from langchain_community.vectorstores.elastic_vector_search import ElasticVectorSearch

db = ElasticVectorSearch(
    elasticsearch_url="https://127.0.0.1:9200",
    index_name="test_index",
    embedding=embeddings,
)

API Reference: ElasticVectorSearch

New implementation:

from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy

db = ElasticsearchStore(
    es_url="https://127.0.0.1:9200",
    index_name="test_index",
    embedding=embeddings,
    strategy=DenseVectorScriptScoreStrategy(),
)

# Clean up the test indices; missing indices are ignored.
db.client.indices.delete(
    index="test-metadata,test-elser,test-basic",
    ignore_unavailable=True,
    allow_no_indices=True,
)
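The two migrations above follow the same pattern: the old class is replaced by ElasticsearchStore with a matching strategy, and `elasticsearch_url` becomes `es_url`. A small lookup table can make that mapping explicit when auditing a larger codebase. This is only a sketch built from the pairings shown in this section. `migration_plan` is a hypothetical helper, and other constructor arguments may need changes that it does not cover.

```python
# Old class name -> strategy class used with ElasticsearchStore,
# as paired in the upgrade examples in this section.
STRATEGY_FOR_OLD_CLASS = {
    "ElasticKNNSearch": "DenseVectorStrategy",
    "ElasticVectorSearch": "DenseVectorScriptScoreStrategy",
}

# Constructor arguments that were renamed in the examples.
RENAMED_ARGUMENTS = {"elasticsearch_url": "es_url"}


def migration_plan(old_class: str, old_kwargs: dict):
    """Return the strategy name and renamed kwargs for ElasticsearchStore."""
    strategy = STRATEGY_FOR_OLD_CLASS[old_class]
    new_kwargs = {
        RENAMED_ARGUMENTS.get(key, key): value for key, value in old_kwargs.items()
    }
    return strategy, new_kwargs
```

For example, `migration_plan("ElasticVectorSearch", {"elasticsearch_url": "https://127.0.0.1:9200", "index_name": "test_index"})` names `DenseVectorScriptScoreStrategy` and returns the arguments with `es_url` in place of `elasticsearch_url`.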

API reference

For detailed documentation of all ElasticsearchStore features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/elasticsearch/vectorstores/langchain_elasticsearch.vectorstores.ElasticsearchStore.html

