
SAP HANA Cloud Vector Engine

SAP HANA Cloud Vector Engine is a vector store fully integrated into the SAP HANA Cloud database.

To use this integration, install langchain-community with pip install -qU langchain-community.

Setup

Install the HANA database driver.

# Pip install necessary package
%pip install --upgrade --quiet hdbcli

For OpenAIEmbeddings, we use the OpenAI API key from the environment.

import os
# Use OPENAI_API_KEY env variable
# os.environ["OPENAI_API_KEY"] = "Your OpenAI API key"

Create a database connection to a HANA Cloud instance.

from dotenv import load_dotenv
from hdbcli import dbapi

load_dotenv()
# Use connection settings from the environment
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,
)
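
Note that os.environ.get returns strings, or None when a variable is unset. A minimal sketch, assuming a hypothetical helper named hana_connect_kwargs (not part of hdbcli or LangChain), shows how the settings could be validated and the port converted to an integer before they are passed to dbapi.connect:

```python
import os


def hana_connect_kwargs(env=os.environ):
    """Collect HANA connection settings from a mapping of environment variables.

    Hypothetical helper for illustration; not part of hdbcli or LangChain.
    """
    required = ["HANA_DB_ADDRESS", "HANA_DB_PORT", "HANA_DB_USER", "HANA_DB_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),  # environment values are always strings
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
        "autocommit": True,
        "sslValidateCertificate": False,
    }
```

With this helper, the connection could be opened as dbapi.connect(**hana_connect_kwargs()), which fails early with a clear message if a setting is missing.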

Example

Load the sample document "state_of_the_union.txt" and create chunks from it.

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.hanavector import HanaDB
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

text_documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
text_chunks = text_splitter.split_documents(text_documents)
print(f"Number of document chunks: {len(text_chunks)}")

embeddings = OpenAIEmbeddings()
Number of document chunks: 88
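
To give a feel for what the chunking step does, here is a simplified sketch in plain Python. It assumes the text is split on blank lines and that the pieces are greedily merged until chunk_size would be exceeded. The function split_text is hypothetical and is not the CharacterTextSplitter implementation, which handles more cases (such as overlap).

```python
def split_text(text, chunk_size=500, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified illustration only: a single piece longer than chunk_size is kept whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # still fits (or is the first piece of a new chunk)
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks
```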

Create a LangChain VectorStore interface for the HANA database and specify the table (collection) used to access the vector embeddings.

db = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)

Add the loaded document chunks to the table. For this example, we first delete any content left in the table from previous runs.

# Delete already existing documents from the table
db.delete(filter={})

# add the loaded document chunks
db.add_documents(text_chunks)
[]

Run a query to get the two best-matching document chunks from those added in the previous step. By default, cosine similarity is used for the search.

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)

for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

Query the same content with Euclidean distance. The results should be the same as with cosine similarity.

from langchain_community.vectorstores.utils import DistanceStrategy

db = HanaDB(
    embedding=embeddings,
    connection=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    table_name="STATE_OF_THE_UNION",
)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.
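
One way to see why both strategies can return the same ranking: for unit-length vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity, so sorting by ascending distance and by descending similarity give the same order. The sketch below checks this on made-up toy vectors. That the embeddings are unit-normalized is an assumption about the embedding model, not something stated on this page.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors (made up for illustration), normalized to unit length
q_vec = normalize([1.0, 2.0, 3.0])
candidates = [normalize(v) for v in ([1.0, 2.0, 2.5], [3.0, 1.0, 0.5], [-1.0, 0.0, 2.0])]

# Rank candidate indices: highest similarity first vs. smallest distance first
by_cosine = sorted(range(len(candidates)), key=lambda i: -cosine_similarity(q_vec, candidates[i]))
by_l2 = sorted(range(len(candidates)), key=lambda i: euclidean_distance(q_vec, candidates[i]))
print(by_cosine == by_l2)
```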

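Maximal Marginal Relevance Search (MMR)

Maximal marginal relevance optimizes for similarity to the query and for diversity among the selected documents. The first 20 (fetch_k) items are retrieved from the database. The MMR algorithm then selects the best 2 (k) matches.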

docs = db.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.
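
The selection step can be sketched in plain Python. This is a textbook-style MMR loop, not the code LangChain runs. The function name mmr and the trade-off weight lambda_mult=0.5 are assumptions made for illustration. Each round picks the candidate that is most relevant to the query while being least similar to what has already been selected.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, candidate_vecs[i])
            # Highest similarity to anything already selected (0.0 for the first pick)
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 points elsewhere
picked = mmr([1.0, 0.0], [[1.0, 0.1], [1.0, 0.11], [0.9, -0.5]], k=2)
```

In this toy setup, the second pick skips the near-duplicate in favor of the more distinct vector, which mirrors why the MMR result above differs from the plain similarity search.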

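Creating an HNSW Vector Index

A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index with the create_hnsw_index function.

For more information about creating an index at the database level, refer to the official documentation.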

# HanaDB instance uses cosine similarity as default:
db_cosine = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)

# Attempting to create the HNSW index with default parameters
db_cosine.create_hnsw_index() # If no other parameters are specified, the default values will be used
# Default values: m=64, ef_construction=128, ef_search=200
# The default index name will be: STATE_OF_THE_UNION_COSINE_SIMILARITY_IDX (verify this naming pattern in HanaDB class)


# Creating a HanaDB instance with L2 distance as the similarity function and defined values
db_l2 = HanaDB(
    embedding=embeddings,
    connection=connection,
    table_name="STATE_OF_THE_UNION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,  # Specify L2 distance
)

# This will create an index based on L2 distance strategy.
db_l2.create_hnsw_index(
    index_name="STATE_OF_THE_UNION_L2_index",
    m=100,  # Max number of neighbors per graph node (valid range: 4 to 1000)
    ef_construction=200,  # Max number of candidates during graph construction (valid range: 1 to 100000)
    ef_search=500,  # Min number of candidates during the search (valid range: 1 to 100000)
)

# Use L2 index to perform MMR
docs = db_l2.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

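Key points:

  • Similarity function: The similarity function of the index defaults to cosine similarity. To use a different similarity function (for example, L2 distance), specify it when initializing the HanaDB instance.
  • Default parameters: If no custom values are provided for parameters such as m, ef_construction, or ef_search in the create_hnsw_index function, the default values are used automatically (for example, m=64, ef_construction=128, ef_search=200). These values allow the index to be created with reasonable performance without user intervention.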
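
As a rough illustration of how defaults and valid ranges interact, the sketch below merges user overrides with the default values and checks them against the ranges quoted in the code comments above. The helper resolve_hnsw_params is hypothetical and not part of HanaDB. Whether the library performs such checks client-side is not stated here.

```python
# Valid ranges, taken from the comments in the example above
HNSW_PARAM_RANGES = {
    "m": (4, 1000),
    "ef_construction": (1, 100000),
    "ef_search": (1, 100000),
}
# Default values as stated in the example above
HNSW_DEFAULTS = {"m": 64, "ef_construction": 128, "ef_search": 200}


def resolve_hnsw_params(**overrides):
    """Merge overrides with the defaults and check them against the documented ranges.

    Hypothetical helper for illustration; not part of HanaDB.
    """
    params = dict(HNSW_DEFAULTS)
    for name, value in overrides.items():
        if name not in HNSW_PARAM_RANGES:
            raise ValueError(f"Unknown HNSW parameter: {name}")
        low, high = HNSW_PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the valid range {low} to {high}")
        params[name] = value
    return params
```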

Basic Vectorstore Operations

db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_BASIC"
)

# Delete already existing documents from the table
db.delete(filter={})
True

We can add simple text documents to the existing table.

docs = [Document(page_content="Some text"), Document(page_content="Other docs")]
db.add_documents(docs)
[]

Add documents with metadata.

docs = [
    Document(
        page_content="foo",
        metadata={"start": 100, "end": 150, "doc_name": "foo.txt", "quality": "bad"},
    ),
    Document(
        page_content="bar",
        metadata={"start": 200, "end": 250, "doc_name": "bar.txt", "quality": "good"},
    ),
]
db.add_documents(docs)
[]

Query documents with specific metadata.

docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
# With filtering on "quality"=="bad", only one document should be returned
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)
--------------------------------------------------------------------------------
foo
{'start': 100, 'end': 150, 'doc_name': 'foo.txt', 'quality': 'bad'}

Delete documents with specific metadata.

db.delete(filter={"quality": "bad"})

# Now the similarity search with the same filter will return no results
docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
print(len(docs))
0

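Advanced filtering

In addition to basic value-based filtering, more advanced filtering is available. The table below shows the available filter operators.

| Operator | Semantic |
| $eq | Equality (==) |
| $ne | Inequality (!=) |
| $lt | Less than (<) |
| $lte | Less than or equal (<=) |
| $gt | Greater than (>) |
| $gte | Greater than or equal (>=) |
| $in | Contained in a set of given values (in) |
| $nin | Not contained in a set of given values (not in) |
| $between | Between the range of two boundary values |
| $like | Text equality based on the "LIKE" semantics in SQL (using "%" as wildcard) |
| $and | Logical "and", supporting 2 or more operands |
| $or | Logical "or", supporting 2 or more operands |

# Prepare some test documents
docs = [
    Document(
        page_content="First",
        metadata={"name": "adam", "is_active": True, "id": 1, "height": 10.0},
    ),
    Document(
        page_content="Second",
        metadata={"name": "bob", "is_active": False, "id": 2, "height": 5.7},
    ),
    Document(
        page_content="Third",
        metadata={"name": "jane", "is_active": True, "id": 3, "height": 2.4},
    ),
]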

db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_ADVANCED_FILTER",
)

# Delete already existing documents from the table
db.delete(filter={})
db.add_documents(docs)


# Helper function for printing filter results
def print_filter_result(result):
    if len(result) == 0:
        print("<empty result>")
    for doc in result:
        print(doc.metadata)

Filtering with $ne, $gt, $gte, $lt, $lte

advanced_filter = {"id": {"$ne": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'id': {'$ne': 1}}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$gt': 1}}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$gte': 1}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$lt': 1}}
<empty result>
Filter: {'id': {'$lte': 1}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}

Filtering with $between, $in, $nin

advanced_filter = {"id": {"$between": (1, 2)}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$in": ["adam", "bob"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$nin": ["adam", "bob"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'id': {'$between': (1, 2)}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$in': ['adam', 'bob']}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$nin': ['adam', 'bob']}}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}

Text filtering with $like

advanced_filter = {"name": {"$like": "a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$like": "%a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'name': {'$like': 'a%'}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
Filter: {'name': {'$like': '%a%'}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}

Combined filtering with $and, $or

advanced_filter = {"$or": [{"id": 1}, {"name": "bob"}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$and": [{"id": 1}, {"id": 2}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$or": [{"id": 1}, {"id": 2}, {"id": 3}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'$or': [{'id': 1}, {'name': 'bob'}]}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'$and': [{'id': 1}, {'id': 2}]}
<empty result>
Filter: {'$or': [{'id': 1}, {'id': 2}, {'id': 3}]}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
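
The operator semantics listed in the table above can be mirrored in plain Python, which may help when reasoning about a filter before sending it to the database. This is only an in-memory sketch. The function matches is hypothetical, HanaDB evaluates filters in the database rather than like this, and only the "%" wildcard of LIKE is handled.

```python
import re


def _like(value, pattern):
    # Translate SQL LIKE ("%" matches any run of characters) into a regular expression
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return isinstance(value, str) and re.fullmatch(regex, value, flags=re.DOTALL) is not None


OPERATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],  # both bounds inclusive
    "$like": _like,
}


def matches(metadata, flt):
    """Return True if a metadata dict satisfies the filter dict."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            for op, arg in condition.items():
                if key not in metadata or not OPERATORS[op](metadata[key], arg):
                    return False
        elif metadata.get(key) != condition:  # a plain value means equality
            return False
    return True


# Same metadata as the test documents above
rows = [
    {"name": "adam", "is_active": True, "id": 1, "height": 10.0},
    {"name": "bob", "is_active": False, "id": 2, "height": 5.7},
    {"name": "jane", "is_active": True, "id": 3, "height": 2.4},
]
print([m["name"] for m in rows if matches(m, {"name": {"$like": "%a%"}})])
```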

Using a VectorStore as a retriever in chains for retrieval augmented generation (RAG)

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

# Access the vector DB with a new table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_RETRIEVAL_CHAIN",
)

# Delete already existing entries from the table
db.delete(filter={})

# add the loaded document chunks from the "State Of The Union" file
db.add_documents(text_chunks)

# Create a retriever instance of the vector store
retriever = db.as_retriever()

Define the prompt.

from langchain_core.prompts import PromptTemplate

prompt_template = """
You are an expert in state of the union topics. You are provided multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

'''
{context}
'''

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}
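
To make the role of the two placeholders concrete, here is a minimal sketch using plain str.format instead of PromptTemplate. The function build_prompt is hypothetical, and joining the retrieved chunks with blank lines is an assumption about how the context gets assembled, not a description of the chain's internals.

```python
prompt_template = """
You are an expert in state of the union topics. You are provided multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

'''
{context}
'''

Question: {question}
"""


def build_prompt(chunks, question):
    """Fill the template with retrieved text chunks and the user's question."""
    # Assumption: chunks are concatenated with a blank line between them
    context = "\n\n".join(chunks)
    return prompt_template.format(context=context, question=question)


print(build_prompt(["First retrieved chunk.", "Second retrieved chunk."], "What about Mexico and Guatemala?"))
```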

Create the ConversationalRetrievalChain, which handles the chat history and the retrieval of similar document chunks to be added to the prompt.

from langchain.chains import ConversationalRetrievalChain

llm = ChatOpenAI(model="gpt-3.5-turbo")
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key="answer", return_messages=True
)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    memory=memory,
    verbose=False,
    combine_docs_chain_kwargs={"prompt": PROMPT},
)

Ask the first question (and verify how many text chunks have been used).

question = "What about Mexico and Guatemala?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])

source_docs = result["source_documents"]
print("================")
print(f"Number of used source document chunks: {len(source_docs)}")
Answer from LLM:
================
The United States has set up joint patrols with Mexico and Guatemala to catch more human traffickers. This collaboration is part of the efforts to address immigration issues and secure the borders in the region.
================
Number of used source document chunks: 5

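Examine the chunks used by the chain in detail. Check whether the best-ranked chunk contains information about "Mexico and Guatemala" as mentioned in the question.

for doc in source_docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)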

Ask another question in the same conversational chain. The answer should relate to the previous answer.

question = "What about other countries?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])
Answer from LLM:
================
Mexico and Guatemala are involved in joint patrols to catch human traffickers.

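Standard tables vs. "custom" tables with vector data

By default, the table for the embeddings is created with 3 columns:

  • A column VEC_TEXT, which contains the text of the Document
  • A column VEC_META, which contains the metadata of the Document
  • A column VEC_VECTOR, which contains the embedding vector of the Document's text

# Access the vector DB with a new table
db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_NEW_TABLE"
)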

# Delete already existing entries from the table
db.delete(filter={})

# Add a simple document with some metadata
docs = [
    Document(
        page_content="A simple document",
        metadata={"start": 100, "end": 150, "doc_name": "simple.txt"},
    )
]
db.add_documents(docs)
[]

Show the columns in table "LANGCHAIN_DEMO_NEW_TABLE".

cur = connection.cursor()
cur.execute(
    "SELECT COLUMN_NAME, DATA_TYPE_NAME FROM SYS.TABLE_COLUMNS WHERE SCHEMA_NAME = CURRENT_SCHEMA AND TABLE_NAME = 'LANGCHAIN_DEMO_NEW_TABLE'"
)
rows = cur.fetchall()
for row in rows:
    print(row)
cur.close()
('VEC_META', 'NCLOB')
('VEC_TEXT', 'NCLOB')
('VEC_VECTOR', 'REAL_VECTOR')

Show the values of the inserted document in the three columns.

cur = connection.cursor()
cur.execute(
    "SELECT VEC_TEXT, VEC_META, TO_NVARCHAR(VEC_VECTOR) FROM LANGCHAIN_DEMO_NEW_TABLE LIMIT 1"
)
rows = cur.fetchall()
print(rows[0][0]) # The text
print(rows[0][1]) # The metadata
print(rows[0][2]) # The vector
cur.close()

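A custom table must have at least three columns that match the semantics of a standard table:

  • A column of type NCLOB or NVARCHAR for the text/context of the embeddings
  • A column of type NCLOB or NVARCHAR for the metadata
  • A column of type REAL_VECTOR for the embedding vector

The table can contain additional columns. When new documents are inserted into the table, these additional columns must allow NULL values.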
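
The requirements above can be expressed as a small check over a description of the table layout. The sketch below is purely illustrative: check_custom_table is a hypothetical helper, and the column description (a name mapped to a data type name and a nullable flag) is an assumed input format, not something HanaDB provides.

```python
TEXT_TYPES = {"NCLOB", "NVARCHAR"}


def check_custom_table(columns, content_column, metadata_column, vector_column):
    """Return a list of problems found in a custom table layout.

    columns maps each column name to a (data_type_name, nullable) tuple.
    Hypothetical helper for illustration; not part of HanaDB.
    """
    problems = []
    for role, name in (("content", content_column), ("metadata", metadata_column)):
        if name not in columns:
            problems.append(f"missing {role} column {name}")
        elif columns[name][0] not in TEXT_TYPES:
            problems.append(f"{role} column {name} must be NCLOB or NVARCHAR")
    if vector_column not in columns:
        problems.append(f"missing vector column {vector_column}")
    elif columns[vector_column][0] != "REAL_VECTOR":
        problems.append(f"vector column {vector_column} must be REAL_VECTOR")
    # Any column beyond the three standard ones must accept NULL on insert
    standard = {content_column, metadata_column, vector_column}
    for name, (_, nullable) in columns.items():
        if name not in standard and not nullable:
            problems.append(f"additional column {name} must allow NULL values")
    return problems
```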

# Create a new table "MY_OWN_TABLE_ADD" with three "standard" columns and one additional column
my_own_table_name = "MY_OWN_TABLE_ADD"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "SOME_OTHER_COLUMN NVARCHAR(42), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)

# Create a HanaDB instance with the own table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
)

# Add a simple document with some metadata
docs = [
    Document(
        page_content="Some other text",
        metadata={"start": 400, "end": 450, "doc_name": "other.txt"},
    )
]
db.add_documents(docs)

# Check if data has been inserted into our own table
cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(rows[0][0])  # Value of column "SOME_OTHER_COLUMN". Should be NULL/None
print(rows[0][1]) # The text
print(rows[0][2]) # The metadata
print(rows[0][3]) # The vector

cur.close()
None
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt"}
<memory at 0x7f5edcb18d00>

Add another document and perform a similarity search on the custom table.

docs = [
    Document(
        page_content="Some more text",
        metadata={"start": 800, "end": 950, "doc_name": "more.txt"},
    )
]
db.add_documents(docs)

query = "What's up?"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
Some other text
--------------------------------------------------------------------------------
Some more text

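Filter Performance Optimization with Custom Columns

To allow flexible metadata values, all metadata is stored as JSON in the metadata column by default. If some of the metadata keys and value types are known in advance, they can be stored in additional columns: create the target table with the key names as column names and pass them to the HanaDB constructor via the specific_metadata_columns list. During insert, metadata keys that match those names are copied into the special columns. For keys in the specific_metadata_columns list, filters use the special columns instead of the metadata JSON column.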

# Create a new table "PERFORMANT_CUSTOMTEXT_FILTER" with three "standard" columns and one additional column
my_own_table_name = "PERFORMANT_CUSTOMTEXT_FILTER"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "CUSTOMTEXT NVARCHAR(500), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)

# Create a HanaDB instance with the own table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
    specific_metadata_columns=["CUSTOMTEXT"],
)

# Add a simple document with some metadata
docs = [
    Document(
        page_content="Some other text",
        metadata={
            "start": 400,
            "end": 450,
            "doc_name": "other.txt",
            "CUSTOMTEXT": "Filters on this value are very performant",
        },
    )
]
db.add_documents(docs)

# Check if data has been inserted into our own table
cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(rows[0][0])  # Value of column "CUSTOMTEXT". Should be "Filters on this value are very performant"
print(rows[0][1])  # The text
print(rows[0][2])  # The metadata JSON; "CUSTOMTEXT" is also copied into its separate column
print(rows[0][3])  # The vector

cur.close()
Filters on this value are very performant
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt", "CUSTOMTEXT": "Filters on this value are very performant"}
<memory at 0x7f5edcb193c0>
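
The copy-on-insert behavior described above, and visible in this output, can be sketched as follows. This is an illustration of the idea under stated assumptions, not HanaDB's implementation: build_row and filter_target are hypothetical names, and the column names are the ones from the example table.

```python
import json


def build_row(page_content, metadata, specific_metadata_columns):
    """Map a document onto table columns, copying selected metadata keys.

    Illustration only: the full metadata stays in the JSON column, and keys
    listed in specific_metadata_columns are additionally copied to own columns.
    """
    special = {col: metadata.get(col) for col in specific_metadata_columns}
    return {**special, "MY_TEXT": page_content, "MY_METADATA": json.dumps(metadata)}


def filter_target(key, specific_metadata_columns, metadata_column="MY_METADATA"):
    """Return the column a filter on this metadata key would be evaluated against."""
    return key if key in specific_metadata_columns else metadata_column


row = build_row(
    "Some other text",
    {"start": 400, "doc_name": "other.txt", "CUSTOMTEXT": "Filters on this value are very performant"},
    ["CUSTOMTEXT"],
)
```

A filter on CUSTOMTEXT would then target the dedicated column, while a filter on doc_name would still go through the JSON metadata column.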

The special columns are completely transparent to the rest of the langchain interface. Everything works as before, just with better performance.

docs = [
    Document(
        page_content="Some more text",
        metadata={
            "start": 800,
            "end": 950,
            "doc_name": "more.txt",
            "CUSTOMTEXT": "Another customtext value",
        },
    )
]
db.add_documents(docs)

advanced_filter = {"CUSTOMTEXT": {"$like": "%value%"}}
query = "What's up?"
docs = db.similarity_search(query, k=2, filter=advanced_filter)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
Some other text
--------------------------------------------------------------------------------
Some more text
