SemaDB

SemaDB from SemaFind is a no-fuss vector similarity database for building AI applications. The hosted SemaDB Cloud offers a no-fuss developer experience to get started.

The full API documentation, along with examples and an interactive playground, is available on RapidAPI.

This notebook demonstrates usage of the SemaDB Cloud vector store.

To use this integration, install langchain-community with pip install -qU langchain-community.

Load document embeddings

To run things locally, we use Sentence Transformers, which are commonly used for embedding sentences. You can use any embedding model that LangChain offers.
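To see what an embedding model has to provide without downloading one, here is a minimal sketch assuming only the two methods of LangChain's embedding interface, embed_documents and embed_query. HashingEmbeddings is a hypothetical toy, not part of LangChain: it hashes whitespace tokens into buckets instead of running a neural network, so its vectors capture word overlap only. The 768-dimensional default is an assumption chosen to match all-mpnet-base-v2; in a real project you would subclass langchain_core.embeddings.Embeddings.

```python
import hashlib
from typing import List


class HashingEmbeddings:
    """Hypothetical toy embedder with the same two methods as LangChain's
    Embeddings interface. Each lower-cased whitespace token is hashed into
    one of `dim` buckets and the bucket counts form the vector."""

    def __init__(self, dim: int = 768) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # blake2b gives a hash that is stable across runs (unlike hash()).
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            bucket = int.from_bytes(digest, "little") % self.dim
            vec[bucket] += 1.0
        return vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


emb = HashingEmbeddings()
vector = emb.embed_query("hello world")
print(len(vector), sum(vector))  # 768 2.0
```

Every vector has the fixed length the collection expects, which is the property the 768 parameter further down relies on.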

%pip install --upgrade --quiet sentence_transformers langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

API Reference: HuggingFaceEmbeddings

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
print(len(docs))

API Reference: TextLoader | CharacterTextSplitter
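CharacterTextSplitter(chunk_size=400, chunk_overlap=0) splits the text on a separator (a blank line by default) and then packs the pieces into chunks of roughly 400 characters at most. The function below is a simplified sketch of that packing step, not LangChain's implementation: greedy_chunks is a hypothetical name, length is measured in characters, there is no overlap, and an oversized piece is kept whole as its own chunk rather than cut.

```python
from typing import List


def greedy_chunks(text: str, chunk_size: int = 400, separator: str = "\n\n") -> List[str]:
    """Pack separator-delimited pieces greedily into chunks whose length,
    counting the separators that re-join them, stays within chunk_size."""
    pieces = [p for p in text.split(separator) if p.strip()]  # drop empty pieces
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        # Joining to a non-empty chunk costs the piece plus one separator.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))  # close the full chunk
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "a" * 150 + "\n\n" + "b" * 150 + "\n\n" + "c" * 150
print([len(c) for c in greedy_chunks(sample, chunk_size=400)])  # [302, 150]
```

With zero overlap, each piece lands in exactly one chunk, so the number printed by print(len(docs)) above is simply how many such packed chunks the speech produced.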

Connect to SemaDB

SemaDB Cloud uses RapidAPI keys for authentication. You can get one by creating a free RapidAPI account.

import getpass
import os

if "SEMADB_API_KEY" not in os.environ:
    os.environ["SEMADB_API_KEY"] = getpass.getpass("SemaDB API Key:")

SemaDB API Key: ········

from langchain_community.vectorstores import SemaDB
from langchain_community.vectorstores.utils import DistanceStrategy

API Reference: SemaDB | DistanceStrategy

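The parameters of the SemaDB vector store reflect the API directly:

"mycollection": the name of the collection in which we will store these vectors.
768: the dimension of the vectors. In our case, the sentence transformer embeddings yield 768-dimensional vectors.
API_KEY: your RapidAPI key. It is not passed in the call below because the wrapper falls back to the SEMADB_API_KEY environment variable set above.
embeddings: determines how the embeddings for documents, texts, and queries are generated.
DistanceStrategy: the distance metric to use. If COSINE is used, the wrapper automatically normalises the vectors.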
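Normalising matters for COSINE because cos(a, b) = a·b / (‖a‖ ‖b‖): once both vectors have length 1, cosine similarity is just the dot product. The sketch below checks this numerically for 768-dimensional vectors. The helper names are hypothetical, and the sketch does not describe how SemaDB computes distances internally.

```python
import math
from typing import List, Sequence

DIM = 768  # vector size the collection was created with in this notebook


def check_dim(vec: Sequence[float], dim: int = DIM) -> None:
    # A vector of the wrong size does not fit a fixed-size collection.
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")


def l2_normalise(vec: Sequence[float]) -> List[float]:
    # Divide by the Euclidean norm so the result has length 1.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0] + [0.0] * (DIM - 2)
b = [4.0, 3.0] + [0.0] * (DIM - 2)
check_dim(a)
check_dim(b)
# Both values are approximately 24/25 = 0.96: for unit vectors, dot == cosine.
print(cosine_similarity(a, b), dot(l2_normalise(a), l2_normalise(b)))
```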

db = SemaDB("mycollection", 768, embeddings, DistanceStrategy.COSINE)

# Create collection if running for the first time. If the collection
# already exists this will fail.
db.create_collection()

True

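The SemaDB vector store wrapper adds each document's text as point metadata so it can be retrieved later. Storing large chunks of text is not recommended; if you are indexing a large collection, we recommend storing references to the documents instead, such as external IDs.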

db.add_documents(docs)[:2]

['813c7ef3-9797-466b-8afa-587115592c6c',
 'fc392f7f-082b-4932-bfcc-06800db5e017']

Similarity search

We use the default LangChain similarity search interface to search for the most similar sentences.
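Conceptually, similarity search embeds the query and returns the k stored vectors closest to it; LangChain's similarity_search returns 4 results unless you pass k. The function below is an exact brute-force version in plain Python, included only to illustrate the idea. SemaDB performs the search on its server and its indexing strategy is not covered here; the name top_k_cosine and the (index, similarity) return shape are assumptions of this sketch.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def top_k_cosine(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) pairs for the k most similar vectors,
    most similar first, by scoring every stored vector."""
    q_norm = math.sqrt(sum(x * x for x in query))
    scored: List[Tuple[int, float]] = []
    for i, vec in enumerate(vectors):
        v_norm = math.sqrt(sum(x * x for x in vec))
        if q_norm == 0.0 or v_norm == 0.0:
            sim = 0.0  # undefined for zero vectors; treat as unrelated
        else:
            sim = sum(a * b for a, b in zip(query, vec)) / (q_norm * v_norm)
        scored.append((i, sim))
    # heapq.nlargest keeps the k best scores, returned in descending order.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_cosine([1.0, 0.1], stored, k=2))  # indices 0 and 2 come first
```

Scoring every vector is exact but linear in collection size, which is why hosted vector databases handle this step for you.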

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

docs = db.similarity_search_with_score(query)
docs[0]

(Document(page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt', 'text': 'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'}),
 0.42369342)

Clean up

You can delete the collection to remove all data.

db.delete_collection()

True

Related

Vector store conceptual guide
Vector store how-to guides
