
Amazon MemoryDB

Vector search introduction and LangChain integration guide.

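What is Amazon MemoryDB?

MemoryDB is compatible with Redis OSS, a popular open-source data store, so you can quickly build applications using the flexible and friendly Redis OSS data structures, APIs, and commands you already use. With MemoryDB, all of your data is stored in memory, which enables microsecond read latency, single-digit millisecond write latency, and high throughput. MemoryDB also stores data durably across multiple Availability Zones (AZs) using a Multi-AZ transactional log to enable fast failover, database recovery, and node restarts.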

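Vector search for MemoryDB

Vector search extends the functionality of MemoryDB. It can be used in conjunction with existing MemoryDB functionality, and applications that do not use vector search are unaffected by its presence. Vector search is available in all Regions where MemoryDB is available. You can use your existing MemoryDB data or the Redis OSS API to build machine learning and generative AI use cases such as retrieval-augmented generation, anomaly detection, document retrieval, and real-time recommendations.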

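  • Indexing of multiple fields in Redis hashes and JSON
  • Vector similarity search (with HNSW (ANN) or FLAT (KNN))
  • Vector range search (e.g. find all vectors within a radius of a query vector)
  • Incremental indexing without performance loss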
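
To make the difference between the two query styles concrete, here is a minimal pure-Python sketch of an exact (FLAT-style) nearest-neighbour search and a range search over cosine distance. It is an illustration only: the helper names `cosine_distance`, `knn_search`, and `range_search` and the toy vectors are hypothetical and not part of MemoryDB or any client library. An HNSW index would return approximate rather than exact neighbours.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query, vectors, k):
    """Exact k-nearest-neighbour search: score every vector, keep the k closest."""
    scored = [(cosine_distance(query, vec), key) for key, vec in vectors.items()]
    return sorted(scored)[:k]


def range_search(query, vectors, radius):
    """Range search: every vector whose distance to the query is <= radius."""
    scored = [(cosine_distance(query, vec), key) for key, vec in vectors.items()]
    return sorted(item for item in scored if item[0] <= radius)


# Toy 2-dimensional vectors (hypothetical data).
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
query = [1.0, 0.0]

nearest = knn_search(query, vectors, k=2)
within = range_search(query, vectors, radius=0.001)
print([key for _, key in nearest])
print([key for _, key in within])
```

A KNN query always returns up to k results regardless of how far away they are, whereas a range query returns however many vectors fall inside the radius, which may be none.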

Setting up

Install Redis Python client

redis-py is a Python client that can be used to connect to MemoryDB.

%pip install --upgrade --quiet  redis langchain-aws
from langchain_aws.embeddings import BedrockEmbeddings

embeddings = BedrockEmbeddings()
API Reference: BedrockEmbeddings

MemoryDB Connection

Valid Redis URL schemes are:

  1. redis:// - Connection to Redis cluster, unencrypted
  2. rediss:// - Connection to Redis cluster, with TLS encryption

More information about additional connection parameters can be found in the redis-py documentation.
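
As a small illustration of how the two schemes differ, the sketch below uses only the Python standard library to inspect a connection URL and report whether it requests TLS. The helper `describe_redis_url` and the host name are hypothetical, and the default port of 6379 is an assumption based on the conventional Redis port.

```python
from urllib.parse import urlparse


def describe_redis_url(url):
    """Return (host, port, uses_tls) for a redis:// or rediss:// URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
    # 6379 is the conventional default Redis port (assumed when none is given).
    port = parsed.port if parsed.port is not None else 6379
    return parsed.hostname, port, parsed.scheme == "rediss"


# "my-cluster.example.com" is a placeholder, not a real endpoint.
plain = describe_redis_url("redis://my-cluster.example.com:6379")
secure = describe_redis_url("rediss://my-cluster.example.com:6379")
print(plain)
print(secure)
```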

Sample data

First we will describe some sample data so that the various attributes of the vector store can be demonstrated.

metadata = [
    {
        "user": "john",
        "age": 18,
        "job": "engineer",
        "credit_score": "high",
    },
    {
        "user": "derrick",
        "age": 45,
        "job": "doctor",
        "credit_score": "low",
    },
    {
        "user": "nancy",
        "age": 94,
        "job": "doctor",
        "credit_score": "high",
    },
    {
        "user": "tyler",
        "age": 100,
        "job": "engineer",
        "credit_score": "high",
    },
    {
        "user": "joe",
        "age": 35,
        "job": "dentist",
        "credit_score": "medium",
    },
]
texts = ["foo", "foo", "foo", "bar", "bar"]
index_name = "users"

Create MemoryDB vector store

The InMemoryVectorStore instance can be initialized using the following methods:

  • InMemoryVectorStore.__init__ - Initialize directly
  • InMemoryVectorStore.from_documents - Initialize from a list of Langchain.docstore.Document objects
  • InMemoryVectorStore.from_texts - Initialize from a list of texts (optionally with metadata)
  • InMemoryVectorStore.from_existing_index - Initialize from an existing MemoryDB index
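from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore

# Pass the sample texts, metadata, and index name defined above.
# "cluster_endpoint" is a placeholder for your MemoryDB cluster endpoint.
vds = InMemoryVectorStore.from_texts(
    texts,
    embeddings,
    metadatas=metadata,
    index_name=index_name,
    redis_url="rediss://cluster_endpoint:6379/ssl=True ssl_cert_reqs=none",
)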
API Reference: InMemoryVectorStore
vds.index_name
'users'

Querying

There are multiple ways to query the InMemoryVectorStore implementation, depending on your use case:

  • similarity_search: Find the most similar vectors to a given vector.
  • similarity_search_with_score: Find the most similar vectors to a given vector and return the vector distance.
  • similarity_search_limit_score: Find the most similar vectors to a given vector and limit the number of results to the score_threshold.
  • similarity_search_with_relevance_scores: Find the most similar vectors to a given vector and return the vector similarities.
  • max_marginal_relevance_search: Find the most similar vectors to a given vector while also optimizing for diversity.
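
The diversity trade-off behind max_marginal_relevance_search can be pictured with a greedy maximal marginal relevance (MMR) selection. The sketch below is a generic pure-Python illustration over assumed toy vectors, not the library's implementation; `mmr_select`, the way `lambda_mult` is used here, and the data are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, balancing relevance and diversity.

    Each step picks the candidate c that maximises
        lambda_mult * sim(query, c)
        - (1 - lambda_mult) * max(sim(c, s) for s in already selected)
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in remaining:
            relevance = cosine_similarity(query, candidates[idx])
            # Redundancy is 0.0 while nothing has been selected yet.
            redundancy = max(
                (cosine_similarity(candidates[idx], candidates[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.0],  # identical to the query
    [1.0, 0.0],  # exact duplicate of the first candidate
    [0.6, 0.8],  # less relevant, but different from the others
]

pure_relevance = mmr_select(query, candidates, k=2, lambda_mult=1.0)
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.3)
print(pure_relevance)
print(diverse)
```

With `lambda_mult=1.0` the selection ignores redundancy and keeps the duplicate; with a lower value the duplicate is penalised and the less similar but distinct candidate is chosen instead.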
results = vds.similarity_search("foo")
print(results[0].page_content)
foo
# with scores (distances)
results = vds.similarity_search_with_score("foo", k=5)
for result in results:
    print(f"Content: {result[0].page_content} --- Score: {result[1]}")
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: bar --- Score: 0.1566
Content: bar --- Score: 0.1566
# limit the vector distance that can be returned
results = vds.similarity_search_with_score("foo", k=5, distance_threshold=0.1)
for result in results:
    print(f"Content: {result[0].page_content} --- Score: {result[1]}")
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
Content: foo --- Score: 0.0
# with scores
results = vds.similarity_search_with_relevance_scores("foo", k=5)
for result in results:
    print(f"Content: {result[0].page_content} --- Similarity: {result[1]}")
Content: foo --- Similarity: 1.0
Content: foo --- Similarity: 1.0
Content: foo --- Similarity: 1.0
Content: bar --- Similarity: 0.8434
Content: bar --- Similarity: 0.8434
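
The distance and relevance outputs above are consistent with a relevance score equal to 1 minus the cosine distance (for example, 1 - 0.1566 = 0.8434). The following sketch illustrates that relationship. It assumes a cosine distance metric, and the helper name `relevance_from_distance` is hypothetical rather than part of the library's API.

```python
def relevance_from_distance(distance):
    """Map a cosine distance to a similarity-style score (assumed: 1 - distance)."""
    return 1.0 - distance


# Distances copied from the similarity_search_with_score output above.
distances = [0.0, 0.0, 0.0, 0.1566, 0.1566]
scores = [round(relevance_from_distance(d), 4) for d in distances]
print(scores)
```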
# you can also add new documents as follows
new_document = ["baz"]
new_metadata = [{"user": "sam", "age": 50, "job": "janitor", "credit_score": "high"}]
# both the document and metadata must be lists
vds.add_texts(new_document, new_metadata)
['doc:users:b9c71d62a0a34241a37950b448dafd38']
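
The returned key appears to follow a `doc:<index_name>:<id>` pattern with a 32-character hexadecimal identifier. As a rough illustration, the sketch below builds keys of that shape from a random UUID. The helper `make_key` and the prefix rule are assumptions inferred from the output above, not a documented guarantee.

```python
import uuid


def make_key(index_name, doc_id=None):
    """Build a key shaped like 'doc:<index_name>:<id>' (assumed format)."""
    # Fall back to a random 32-character hex string when no id is supplied.
    doc_id = doc_id or uuid.uuid4().hex
    return f"doc:{index_name}:{doc_id}"


key = make_key("users")
prefix, index, doc_id = key.split(":")
print(key)
```

Keeping track of the returned keys is useful if individual entries need to be addressed later.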

MemoryDB as Retriever

Here we go over different options for using the vector store as a retriever.

There are three different search methods we can use to do retrieval. By default, it will use semantic similarity.

query = "foo"
results = vds.similarity_search_with_score(query, k=3, return_metadata=True)

for result in results:
    print("Content:", result[0].page_content, " --- Score: ", result[1])
Content: foo  --- Score:  0.0
Content: foo  --- Score:  0.0
Content: foo  --- Score:  0.0
retriever = vds.as_retriever(search_type="similarity", search_kwargs={"k": 4})
docs = retriever.invoke(query)
docs
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),
Document(page_content='bar', metadata={'id': 'doc:users_modified:01ef6caac12b42c28ad870aefe574253', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'})]

There is also the similarity_distance_threshold retriever, which allows the user to specify the vector distance.

retriever = vds.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={"k": 4, "distance_threshold": 0.1},
)
docs = retriever.invoke(query)
docs
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]

Lastly, the similarity_score_threshold retriever allows the user to define the minimum score for similar documents.

retriever = vds.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.9, "k": 10},
)
retriever.invoke("foo")
[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),
Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]
retriever.invoke("foo")
[Document(page_content='foo', metadata={'id': 'doc:users:8f6b673b390647809d510112cde01a27', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),
Document(page_content='bar', metadata={'id': 'doc:users:93521560735d42328b48c9c6f6418d6a', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'}),
Document(page_content='foo', metadata={'id': 'doc:users:125ecd39d07845eabf1a699d44134a5b', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),
Document(page_content='foo', metadata={'id': 'doc:users:d6200ab3764c466082fde3eaab972a2a', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'})]
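
Conceptually, the two threshold-based retrievers behave like filters applied to a ranked result list. The sketch below mimics that behaviour using the distances from the earlier query output for "foo". The helpers `filter_by_distance` and `filter_by_score` are hypothetical, and the score is assumed to be 1 minus the cosine distance.

```python
# (content, cosine distance) pairs taken from the earlier query output for "foo".
ranked = [("foo", 0.0), ("foo", 0.0), ("foo", 0.0), ("bar", 0.1566), ("bar", 0.1566)]


def filter_by_distance(results, distance_threshold, k):
    """Keep at most k results whose distance is <= distance_threshold."""
    return [r for r in results if r[1] <= distance_threshold][:k]


def filter_by_score(results, score_threshold, k):
    """Keep at most k results whose assumed score (1 - distance) is >= score_threshold."""
    return [r for r in results if 1.0 - r[1] >= score_threshold][:k]


by_distance = filter_by_distance(ranked, distance_threshold=0.1, k=4)
by_score = filter_by_score(ranked, score_threshold=0.9, k=10)
print([content for content, _ in by_distance])
print([content for content, _ in by_score])
```

In both cases `k` only caps the number of results; the threshold decides which candidates qualify at all.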

Delete index

To delete your entries you have to address them by their keys.

# delete the indices too
InMemoryVectorStore.drop_index(
    index_name="users", delete_documents=True, redis_url="redis://127.0.0.1:6379"
)
InMemoryVectorStore.drop_index(
    index_name="users_modified",
    delete_documents=True,
    redis_url="redis://127.0.0.1:6379",
)
True
