
PGVector

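An implementation of the LangChain vector store abstraction that uses postgres as the backend and relies on the pgvector extension.

The code lives in an integration package called langchain_postgres.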

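Status

This code has been ported over from langchain_community into a dedicated package called langchain-postgres. The following changes have been made:

  • langchain_postgres works only with psycopg3. Update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it uses psycopg3).
  • The schema of the embedding store and the collection has been changed so that add_documents works correctly with user-specified IDs.
  • You now have to pass an explicit connection object.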
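
The connection string change in the first bullet can be sketched as a small helper. This is a minimal illustration, not part of langchain_postgres: the function name to_psycopg3_url is hypothetical, and it assumes the URL uses either the explicit postgresql+psycopg2 scheme or the bare postgresql scheme (which SQLAlchemy resolves to psycopg2 by default).

```python
def to_psycopg3_url(url: str) -> str:
    """Rewrite a psycopg2-style SQLAlchemy URL to use the psycopg (v3) driver.

    Hypothetical helper for illustration; only the scheme is changed.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {url!r}")
    # "postgresql" with no driver suffix defaults to psycopg2 in SQLAlchemy.
    if scheme in ("postgresql+psycopg2", "postgresql"):
        return "postgresql+psycopg://" + rest
    # Anything else (including URLs already on psycopg) is returned unchanged.
    return url


old = "postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"
print(to_psycopg3_url(old))
# postgresql+psycopg://langchain:langchain@localhost:6024/langchain
```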

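Currently, there is **no mechanism** that supports easy data migration on schema changes. Any schema change in the vector store therefore requires you to recreate the tables and re-add the documents. If this is a concern, use a different vector store. If not, this implementation should be fine for your use case.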
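
A rough sketch of what "recreate the tables and re-add the documents" could look like follows. It assumes the store object exposes drop_tables(), create_tables_if_not_exists(), create_collection() and add_documents(); those names are recalled from langchain_postgres and should be checked against the version you have installed. The helper rebuild_store itself is hypothetical. Dropping the tables is destructive and may also affect other collections that share them, so keep the source documents somewhere safe first.

```python
from typing import Any, Sequence


def rebuild_store(store: Any, docs: Sequence[Any], ids: Sequence[str]) -> None:
    """Drop and recreate the vector store tables, then re-add every document.

    ASSUMPTION: `store` provides drop_tables(), create_tables_if_not_exists(),
    create_collection() and add_documents(docs, ids=...). Verify these method
    names against your installed langchain_postgres before using this.
    WARNING: drop_tables() deletes all stored embeddings.
    """
    store.drop_tables()                    # remove the old-schema tables
    store.create_tables_if_not_exists()    # recreate them with the current schema
    store.create_collection()              # re-register this store's collection
    store.add_documents(list(docs), ids=list(ids))  # re-embed and re-insert


# Usage (requires a live database, so it is left commented out):
# rebuild_store(vector_store, docs, [str(d.metadata["id"]) for d in docs])
```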

Setup

First, install the partner package:

pip install -qU langchain_postgres

You can run the following command to start a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
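
Because the container starts in the background, the port may not be reachable immediately. The sketch below polls the mapped port (6024 in the command above) until a TCP connection succeeds. The helper name wait_for_port is hypothetical, and an open port only indicates that something is listening; it does not guarantee that postgres has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet; retry until the deadline
    return False


# Usage, left commented out because it needs the container to be running:
# if not wait_for_port("localhost", 6024):
#     raise RuntimeError("postgres container is not reachable on port 6024")
```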

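Credentials

No credentials are needed to run this notebook. Just make sure you have installed the langchain_postgres package and started the postgres container correctly.

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below: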

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_core.documents import Document
from langchain_postgres import PGVector

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
API Reference: Document
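
If your credentials differ from the langchain/langchain defaults used here, the connection string can be assembled from its parts. The sketch below is one way to do that; build_connection is a hypothetical helper, and it URL-encodes the user name and password so that characters such as @ or / do not break URL parsing.

```python
from urllib.parse import quote_plus


def build_connection(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a psycopg (v3) SQLAlchemy URL, escaping the credentials."""
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


print(build_connection("langchain", "langchain", "localhost", 6024, "langchain"))
# postgresql+psycopg://langchain:langchain@localhost:6024/langchain
```

The printed value matches the connection string used in this guide.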

Manage vector store

Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
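
Since the ID decides which stored document gets overwritten, it can be worth checking a batch for repeated IDs before adding it. The sketch below is a hypothetical pre-check, not something the library provides. It compares IDs as strings, on the assumption that the store keeps them as text (the delete call in this guide passes "3" as a string).

```python
from collections import Counter
from typing import Any, Iterable, List


def find_duplicate_ids(ids: Iterable[Any]) -> List[str]:
    """Return the IDs (as strings) that occur more than once in a batch."""
    counts = Counter(str(i) for i in ids)
    return sorted(key for key, n in counts.items() if n > 1)


print(find_duplicate_ids([1, 2, 3, "3", 4]))  # ['3']
```

An empty result means every ID in the batch is distinct.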

Delete items from vector store

vector_store.delete(ids=["3"])

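Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Filtering Support

The vector store supports a set of filters that can be applied to the metadata fields of the documents.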

Operator   Meaning/Category
$eq        Equality (==)
$ne        Inequality (!=)
$lt        Less than (<)
$lte       Less than or equal (<=)
$gt        Greater than (>)
$gte       Greater than or equal (>=)
$in        Special Cased (in)
$nin       Special Cased (not in)
$between   Special Cased (between)
$like      Text (like)
$ilike     Text (case-insensitive like)
$and       Logical (and)
$or        Logical (or)
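
To make the operator semantics concrete, the sketch below evaluates a filter against a single metadata dict in plain Python. It is only a mental model: the real store translates filters into SQL, and details such as how missing fields or multiple operators per field are handled may differ. The function names are hypothetical, $between is treated as inclusive, and $like / $ilike follow SQL LIKE conventions (% matches any run of characters, _ matches one character).

```python
import re
from typing import Any, Mapping


def _like(pattern: str, text: str, ignore_case: bool = False) -> bool:
    # Translate a SQL LIKE pattern into an anchored regular expression.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, text, flags) is not None


def _compare(value: Any, op: str, operand: Any) -> bool:
    if op == "$eq":
        return value == operand
    if op == "$ne":
        return value != operand
    if op == "$lt":
        return value < operand
    if op == "$lte":
        return value <= operand
    if op == "$gt":
        return value > operand
    if op == "$gte":
        return value >= operand
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    if op == "$between":
        low, high = operand
        return low <= value <= high  # inclusive on both ends
    if op == "$like":
        return _like(operand, str(value))
    if op == "$ilike":
        return _like(operand, str(value), ignore_case=True)
    raise ValueError(f"unsupported operator: {op}")


def matches(metadata: Mapping[str, Any], filter_: Mapping[str, Any]) -> bool:
    """Return True if `metadata` satisfies every top-level entry of `filter_`."""
    for key, condition in filter_.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key not in metadata:
            ok = False  # modelling choice: a missing field never matches
        elif isinstance(condition, Mapping):
            ok = all(_compare(metadata[key], op, arg) for op, arg in condition.items())
        else:
            ok = metadata[key] == condition  # bare value treated as equality
        if not ok:
            return False
    return True


doc_metadata = {"id": 2, "location": "pond", "topic": "animals"}
print(matches(doc_metadata, {"id": {"$in": [1, 5, 2, 9]}}))  # True
```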

Query directly

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

If you provide a dict with multiple fields but no top-level logical operator, the top level is interpreted as a logical **AND** filter:

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
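
The two queries above return the same documents because the multi-field dict is shorthand for the explicit $and form. That rewrite can be sketched as follows; to_explicit_and is a hypothetical helper written for illustration and is not how the library is known to implement it.

```python
from typing import Any, Dict


def to_explicit_and(filter_: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a multi-field filter into the equivalent explicit $and form."""
    if len(filter_) <= 1:
        return filter_  # a single entry needs no wrapping
    if any(key.startswith("$") for key in filter_):
        # Mixing logical operators with plain fields is ambiguous in this sketch.
        raise ValueError("cannot mix operators and fields at the top level")
    return {"$and": [{field: condition} for field, condition in filter_.items()]}


implicit = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
print(to_explicit_and(implicit))
```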

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
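
The SIM label in the print statement suggests a similarity. Assuming the store is configured with cosine distance (believed to be the langchain_postgres default, but worth confirming for your version), the returned score would be a distance, so lower values indicate closer matches. The sketch below shows how cosine distance is computed for two plain vectors; it is an illustration of the metric, not the store's code.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 0 for identical direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```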

For a full list of the searches you can run on a PGVector vector store, refer to the API reference.

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
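
The retriever above uses search_type="mmr" (maximal marginal relevance), which trades relevance to the query against redundancy among the results already picked. The sketch below shows the general idea on plain vectors. It is a simplified illustration with hypothetical names, and LangChain's own implementation may differ in details such as how candidates are pre-fetched.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k documents chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            relevance = _cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (_cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected
```

With k=1, as in the retriever call above, the result is simply the most relevant document, since nothing has been selected yet to be redundant with.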

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all PGVector features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/postgres/vectorstores/langchain_postgres.vectorstores.PGVector.html

