Intel's Visual Data Management System (VDMS)

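Intel's VDMS is a storage solution for efficient access to large volumes of "visual" data. It aims to reach cloud scale by finding relevant visual data through visual metadata stored as a graph, and by enabling machine-friendly enhancements to visual data for faster access. VDMS is licensed under the MIT license.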

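VDMS supports:

  • K-nearest-neighbor search
  • Euclidean distance (L2) and inner product (IP)
  • Libraries for indexing and computing distances: TileDBDense, TileDBSparse, FaissFlat (default), FaissIVFFlat, Flinng
  • Embeddings for text, images, and video
  • Vector and metadata searches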
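
As a rough illustration of the two distance metrics listed above, the sketch below computes the Euclidean (L2) distance and the inner product for two small vectors in plain Python. The function names are made up for this example and are not part of the VDMS API; how each index library scales or reports these values may differ.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: a smaller value means the vectors are closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product (IP): a larger value means the vectors are more aligned."""
    return sum(x * y for x, y in zip(a, b))


u = [3.0, 4.0]
v = [0.0, 0.0]
print(l2_distance(u, v))                      # 5.0
print(inner_product([1.0, 2.0], [3.0, 4.0]))  # 11.0
```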

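VDMS has server and client components. To set up the server, see the installation instructions or use the Docker image.

This notebook shows how to use VDMS as a vector store using the Docker image.

You need to install langchain-community with pip install -qU langchain-community to use this integration.

First, install the Python packages for the VDMS client and Sentence Transformers: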

# Pip install necessary package
%pip install --upgrade --quiet pip vdms sentence-transformers langchain-huggingface > /dev/null
Note: you may need to restart the kernel to use updated packages.

Start the VDMS server

Here we start the VDMS server on port 55555.

!docker run --rm -d -p 55555:55555 --name vdms_vs_test_nb intellabs/vdms:latest
b26917ffac236673ef1d035ab9c91fe999e29c9eb24aa6c7103d7baa6bf2f72d
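
Because the container is started in the background (-d), the server may need a moment before it accepts connections. The helper below is a minimal sketch of one way to poll the port before connecting; wait_for_port is a hypothetical name introduced here, and an open TCP port is only a rough signal that the server is ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, or False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example call (assumes the container from this notebook is running):
# wait_for_port("localhost", 55555)
```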

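Basic example (using the Docker container)

In this basic example, we demonstrate adding documents to VDMS and using it as a vector database.

You can run the VDMS server separately in a Docker container to use with LangChain, which connects to the server through the VDMS Python client.

VDMS can handle multiple collections of documents, but the LangChain interface expects a single one, so we need to specify the collection name. The default collection name used by LangChain is "langchain".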

import time
import warnings

warnings.filterwarnings("ignore")

from langchain_community.document_loaders.text import TextLoader
from langchain_community.vectorstores import VDMS
from langchain_community.vectorstores.vdms import VDMS_Client
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters.character import CharacterTextSplitter

time.sleep(2)
DELIMITER = "-" * 50

# Connect to VDMS Vector Store
vdms_client = VDMS_Client(host="localhost", port=55555)

Here are some helper functions for printing results.

def print_document_details(doc):
    print(f"Content:\n\t{doc.page_content}\n")
    print("Metadata:")
    for key, value in doc.metadata.items():
        if value != "Missing property":
            print(f"\t{key}:\t{value}")


def print_results(similarity_results, score=True):
    print(f"{DELIMITER}\n")
    if score:
        for doc, score in similarity_results:
            print(f"Score:\t{score}\n")
            print_document_details(doc)
            print(f"{DELIMITER}\n")
    else:
        for doc in similarity_results:
            print_document_details(doc)
            print(f"{DELIMITER}\n")


def print_response(list_of_entities):
    for ent in list_of_entities:
        for key, value in ent.items():
            if value != "Missing property":
                print(f"\n{key}:\n\t{value}")
        print(f"{DELIMITER}\n")

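Load the document and obtain the embedding function

Here we load the most recent State of the Union address and split the document into chunks.

LangChain vector stores use a string/keyword id to keep track of documents. By default, id is a uuid, but here we define it as an integer cast to a string. Additional metadata is also attached to each document, and this example uses HuggingFaceEmbeddings as the embedding function.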

# load the document and split it into chunks
document_path = "../../how_to/state_of_the_union.txt"
raw_documents = TextLoader(document_path).load()

# split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
ids = []
for doc_idx, doc in enumerate(docs):
    ids.append(str(doc_idx + 1))
    docs[doc_idx].metadata["id"] = str(doc_idx + 1)
    docs[doc_idx].metadata["page_number"] = int(doc_idx + 1)
    docs[doc_idx].metadata["president_included"] = (
        "president" in doc.page_content.lower()
    )
print(f"# Documents: {len(docs)}")


# create the open-source embedding function
model_name = "sentence-transformers/all-mpnet-base-v2"
embedding = HuggingFaceEmbeddings(model_name=model_name)
print(
    f"# Embedding Dimensions: {len(embedding.embed_query('This is a test document.'))}"
)
# Documents: 42
# Embedding Dimensions: 768
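
To make the chunking and id assignment more concrete, here is a simplified sketch of splitting text on a separator and greedily packing the pieces into chunks of at most chunk_size characters, with no overlap. It is an approximation written for illustration, not LangChain's actual implementation; split_into_chunks and the blank-line separator are assumptions of this sketch.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_into_chunks(text, chunk_size=40)
# Integer positions cast to strings, mirroring the ids used in this notebook.
ids = [str(i + 1) for i in range(len(chunks))]
print(chunks)
print(ids)
```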

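Similarity search using Faiss Flat and Euclidean distance (default)

In this section, we add the documents to VDMS using the FAISS IndexFlat index (default) and Euclidean distance (default) as the distance metric for similarity search. We search for the three documents (k=3) most related to the query What did the president say about Ketanji Brown Jackson.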

# add data
collection_name = "my_collection_faiss_L2"
db_FaissFlat = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name=collection_name,
    embedding=embedding,
)

# Query (No metadata filtering)
k = 3
query = "What did the president say about Ketanji Brown Jackson"
returned_docs = db_FaissFlat.similarity_search(query, k=k, filter=None)
print_results(returned_docs, score=False)
--------------------------------------------------

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Content:
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.

Third, support our veterans.

Veterans are the best of us.

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.

Our troops in Iraq and Afghanistan faced many dangers.

Metadata:
id: 37
page_number: 37
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Content:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Metadata:
id: 33
page_number: 33
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------
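
Conceptually, a flat index answers a k-nearest-neighbor query by comparing the query vector against every stored vector. The sketch below shows that exhaustive search in plain Python on toy data; knn_search is a hypothetical helper for illustration and says nothing about how VDMS or Faiss implement the search internally.

```python
import math


def knn_search(query, vectors, k=3):
    """Return the k (index, distance) pairs closest to query under L2 distance."""
    scored = [(idx, math.dist(query, vec)) for idx, vec in enumerate(vectors)]
    # Smaller distance means a closer match, so sort ascending.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
print(knn_search([0.0, 0.0], vectors, k=2))  # [(0, 0.0), (1, 1.0)]
```
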
# Query (with filtering)
k = 3
constraints = {"page_number": [">", 30], "president_included": ["==", True]}
query = "What did the president say about Ketanji Brown Jackson"
returned_docs = db_FaissFlat.similarity_search(query, k=k, filter=constraints)
print_results(returned_docs, score=False)
--------------------------------------------------

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Content:
And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong.

As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things.

So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.

First, beat the opioid epidemic.

Metadata:
id: 35
page_number: 35
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Content:
Last month, I announced our plan to supercharge
the Cancer Moonshot that President Obama asked me to lead six years ago.

Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.

More support for patients and families.

To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.

It’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.

ARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more.

A unity agenda for the nation.

We can do this.

My fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy.

In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things.

We have fought for freedom, expanded liberty, defeated totalitarianism and terror.

Metadata:
id: 40
page_number: 40
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------
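
The constraints dictionary used above maps each metadata key to an [operator, value] pair. As a sketch of the semantics only, the snippet below applies such constraints to plain dictionaries in Python. The matches helper and the set of operators it accepts are assumptions for this example; the real filtering is performed by the VDMS server.

```python
import operator

# Map the comparison strings used in the constraints dictionary to Python operators.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(metadata, constraints):
    """Return True if metadata satisfies every [operator, value] constraint."""
    for key, (op, expected) in constraints.items():
        if key not in metadata or not OPERATORS[op](metadata[key], expected):
            return False
    return True


records = [
    {"id": "32", "page_number": 32, "president_included": True},
    {"id": "33", "page_number": 33, "president_included": False},
    {"id": "12", "page_number": 12, "president_included": True},
]
constraints = {"page_number": [">", 30], "president_included": ["==", True]}
print([r["id"] for r in records if matches(r, constraints)])  # ['32']
```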

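Similarity search using Faiss IVFFlat and inner product (IP) distance

In this section, we add the documents to VDMS using the Faiss IndexIVFFlat index and IP as the distance metric for similarity search. We search for the three documents (k=3) most related to the query What did the president say about Ketanji Brown Jackson, and also return the score along with each document.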

db_FaissIVFFlat = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_FaissIVFFlat_IP",
    embedding=embedding,
    engine="FaissIVFFlat",
    distance_strategy="IP",
)
# Query
k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_FaissIVFFlat.similarity_search_with_score(query, k=k, filter=None)
print_results(docs_with_score)
--------------------------------------------------

Score: 1.2032090425

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.4952471256

Content:
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.

Third, support our veterans.

Veterans are the best of us.

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.

Our troops in Iraq and Afghanistan faced many dangers.

Metadata:
id: 37
page_number: 37
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.5008399487

Content:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Metadata:
id: 33
page_number: 33
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------
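
A short derivation may help when choosing between the L2 and IP metrics. It assumes the embedding vectors a and b have unit length, which this page does not state and which depends on the embedding model in use:

```latex
\[
\lVert a - b \rVert_2^2
  = \lVert a \rVert_2^2 + \lVert b \rVert_2^2 - 2\, a \cdot b
  = 2 - 2\, a \cdot b
  \qquad \text{when } \lVert a \rVert_2 = \lVert b \rVert_2 = 1 .
\]
```

Under that assumption, ordering results by smallest L2 distance and ordering them by largest inner product give the same ranking. For vectors that are not normalized, the two metrics can rank results differently.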

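Similarity search using FLINNG and IP distance

In this section, we add the documents to VDMS using the Filters to Identify Near-Neighbor Groups (FLINNG) index and IP as the distance metric for similarity search. We search for the three documents (k=3) most related to the query What did the president say about Ketanji Brown Jackson, and also return the score along with each document.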

db_Flinng = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_Flinng_IP",
    embedding=embedding,
    engine="Flinng",
    distance_strategy="IP",
)
# Query
k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_Flinng.similarity_search_with_score(query, k=k, filter=None)
print_results(docs_with_score)
--------------------------------------------------

Score: 1.2032090425

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.4952471256

Content:
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.

Third, support our veterans.

Veterans are the best of us.

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.

Our troops in Iraq and Afghanistan faced many dangers.

Metadata:
id: 37
page_number: 37
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.5008399487

Content:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Metadata:
id: 33
page_number: 33
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

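Similarity search using TileDBDense and Euclidean distance

In this section, we add the documents to VDMS using the TileDB Dense index and L2 as the distance metric for similarity search. We search for the three documents (k=3) most related to the query What did the president say about Ketanji Brown Jackson, and also return the score along with each document.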

db_tiledbD = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_tiledbD_L2",
    embedding=embedding,
    engine="TileDBDense",
    distance_strategy="L2",
)

k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_tiledbD.similarity_search_with_score(query, k=k, filter=None)
print_results(docs_with_score)
--------------------------------------------------

Score: 1.2032090425

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.4952471256

Content:
As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.

Third, support our veterans.

Veterans are the best of us.

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.

Our troops in Iraq and Afghanistan faced many dangers.

Metadata:
id: 37
page_number: 37
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.5008399487

Content:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Metadata:
id: 33
page_number: 33
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

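Update and delete

While building toward a real application, you will want to go beyond adding data and also update and delete data.

Here is a basic example showing how to do so. First, we update the metadata of the document most relevant to the query by adding a date.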

from datetime import datetime

doc = db_FaissFlat.similarity_search(query)[0]
print(f"Original metadata: \n\t{doc.metadata}")

# Update the metadata for a document by adding last datetime document read
datetime_str = datetime(2024, 5, 1, 14, 30, 0).isoformat()
doc.metadata["last_date_read"] = {"_date": datetime_str}
print(f"new metadata: \n\t{doc.metadata}")
print(f"{DELIMITER}\n")

# Update document in VDMS
id_to_update = doc.metadata["id"]
db_FaissFlat.update_document(collection_name, id_to_update, doc)
response, response_array = db_FaissFlat.get(
    collection_name,
    constraints={
        "id": ["==", id_to_update],
        "last_date_read": [">=", {"_date": "2024-05-01T00:00:00"}],
    },
)

# Display Results
print(f"UPDATED ENTRY (id={id_to_update}):")
print_response([response[0]["FindDescriptor"]["entities"][0]])
Original metadata: 
{'id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}
new metadata:
{'id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt', 'last_date_read': {'_date': '2024-05-01T14:30:00'}}
--------------------------------------------------

UPDATED ENTRY (id=32):

content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

id:
32

last_date_read:
2024-05-01T14:30:00+00:00

page_number:
32

president_included:
True

source:
../../how_to/state_of_the_union.txt
--------------------------------------------------
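
The date in the update example is written as a naive ISO 8601 string, while the entry printed above shows it with a +00:00 offset. The sketch below uses only the Python standard library to show both forms and a timezone-aware comparison like the ">=" constraint; how VDMS itself interprets naive timestamps is not covered here, so treat the UTC threshold as an assumption.

```python
from datetime import datetime, timezone

# Naive timestamp, serialized the same way as in the update example.
stamp = datetime(2024, 5, 1, 14, 30, 0).isoformat()
print(stamp)  # 2024-05-01T14:30:00

# The value printed in the updated entry carries an explicit UTC offset.
stored = datetime.fromisoformat("2024-05-01T14:30:00+00:00")
threshold = datetime(2024, 5, 1, tzinfo=timezone.utc)

# Both sides are timezone-aware, so the comparison is well defined.
print(stored >= threshold)  # True
```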

Next, we delete the last document by ID (id=42).

print("Documents before deletion: ", db_FaissFlat.count(collection_name))

id_to_remove = ids[-1]
db_FaissFlat.delete(collection_name=collection_name, ids=[id_to_remove])
print(
    f"Documents after deletion (id={id_to_remove}): {db_FaissFlat.count(collection_name)}"
)
Documents before deletion:  42
Documents after deletion (id=42): 41

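Other information

VDMS supports various types of visual data and operations. Some of these capabilities are integrated into the LangChain interface, and additional workflow improvements will be added as VDMS continues to be developed.

Below are additional capabilities integrated into LangChain.

Similarity search by vector

Instead of searching by string query, you can also search by embedding/vector.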

embedding_vector = embedding.embed_query(query)
returned_docs = db_FaissFlat.similarity_search_by_vector(embedding_vector)

# Print Results
print_document_details(returned_docs[0])
Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
last_date_read: 2024-05-01T14:30:00+00:00
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt

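Filtering on metadata

It can be helpful to narrow down the collection before working with it.

For example, collections can be filtered on metadata using the get method. A dictionary is used to filter the metadata. Here we retrieve the document where id = 2 and remove it from the vector store.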

response, response_array = db_FaissFlat.get(
    collection_name,
    limit=1,
    include=["metadata", "embeddings"],
    constraints={"id": ["==", "2"]},
)

# Delete id=2
db_FaissFlat.delete(collection_name=collection_name, ids=["2"])

print("Deleted entry:")
print_response([response[0]["FindDescriptor"]["entities"][0]])
Deleted entry:

blob:
True

content:
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.

Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.

They keep moving.

And the costs and the threats to America and the world keep rising.

That’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2.

The United States is a member along with 29 other nations.

It matters. American diplomacy matters. American resolve matters.

id:
2

page_number:
2

president_included:
True

source:
../../how_to/state_of_the_union.txt
--------------------------------------------------

Retriever options

This section goes over different options for using VDMS as a retriever.

Here we use similarity search in the retriever object.

retriever = db_FaissFlat.as_retriever()
relevant_docs = retriever.invoke(query)[0]

print_document_details(relevant_docs)
Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
last_date_read: 2024-05-01T14:30:00+00:00
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt

Maximal Marginal Relevance search (MMR)

In addition to using similarity search in the retriever object, you can also use mmr.

retriever = db_FaissFlat.as_retriever(search_type="mmr")
relevant_docs = retriever.invoke(query)[0]

print_document_details(relevant_docs)
Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
last_date_read: 2024-05-01T14:30:00+00:00
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt

We can also use MMR directly.

mmr_resp = db_FaissFlat.max_marginal_relevance_search_with_score(query, k=2, fetch_k=10)
print_results(mmr_resp)
--------------------------------------------------

Score: 1.2032091618

Content:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

Metadata:
id: 32
last_date_read: 2024-05-01T14:30:00+00:00
page_number: 32
president_included: True
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------

Score: 1.50705266

Content:
But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body.

Danielle says Heath was a fighter to the very end.

He didn’t know how to stop fighting, and neither did she.

Through her pain she found purpose to demand we do better.

Tonight, Danielle—we are.

The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits.

And tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers.

I’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve.

And fourth, let’s end cancer as we know it.

This is personal to me and Jill, to Kamala, and to so many of you.

Cancer is the #2 cause of death in America–second only to heart disease.

Metadata:
id: 39
page_number: 39
president_included: False
source: ../../how_to/state_of_the_union.txt
--------------------------------------------------
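
For intuition about why MMR can return a different second result than plain similarity search, here is a small pure-Python sketch of the selection step. It uses cosine similarity and a lambda_mult trade-off weight as illustrative choices; mmr_select is a hypothetical function, not the code LangChain or VDMS run.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Pick k candidate indices, trading off query relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.5]]
print(mmr_select(query, candidates, k=2))                   # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With lambda_mult=1.0 the redundancy term vanishes and the selection reduces to plain relevance ranking; lower values favor candidates that differ from those already chosen.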

Delete collection

Previously, we removed documents by their id. Here, all documents are removed since no ID is provided.

print("Documents before deletion: ", db_FaissFlat.count(collection_name))

db_FaissFlat.delete(collection_name=collection_name)

print("Documents after deletion: ", db_FaissFlat.count(collection_name))
Documents before deletion:  40
Documents after deletion: 0

Stop the VDMS server

!docker kill vdms_vs_test_nb
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
vdms_vs_test_nb
