
JaguarDB Vector Database

[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)

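  1. It is a distributed vector database.
  2. JaguarDB's "ZeroMove" feature enables instant horizontal scalability.
  3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial data.
  4. All-masters: allows both parallel reads and writes.
  5. Anomaly detection capabilities.
  6. RAG support: combines LLMs with proprietary and real-time data.
  7. Shared metadata: metadata can be shared across multiple vector indexes.
  8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski.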
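To make the list of distance metrics concrete, here is a minimal sketch of their textbook definitions in plain Python. It is illustrative only: JaguarDB computes these on the server, and its exact conventions (for example, whether cosine is reported as a similarity or a distance, or how Hamming and Jaccard treat non-binary vectors) may differ. The function names and the Minkowski default `p=3.0` are arbitrary choices for this example.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # All metrics below assume vectors of equal length.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def euclidean(a, b):
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means the vectors point in the same direction.
    _check(a, b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def inner_product(a, b):
    # A similarity, not a distance: larger means more similar.
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def manhattan(a, b):
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    _check(a, b)
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3.0):
    # p=1 gives Manhattan, p=2 gives Euclidean.
    _check(a, b)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def hamming(a, b):
    # Number of positions whose components differ.
    _check(a, b)
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a, b):
    # Treat each nonzero component as membership of that index in a set.
    _check(a, b)
    set_a = {i for i, x in enumerate(a) if x}
    set_b = {i for i, y in enumerate(b) if y}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)
```

For `a = [1, 0, 2]` and `b = [0, 1, 2]`, `euclidean` gives about 1.414, `manhattan` 2, `chebyshev` 1 and `hamming` 2.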

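Prerequisites

There are two requirements for running the examples in this file.

  1. You must install and set up the JaguarDB server and its HTTP gateway server. Please refer to the instructions at: www.jaguardb.com

  2. You must install the http client package for JaguarDB: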

        pip install -U jaguardb-http-client
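The examples also call `vectorstore.login()`, which, according to the comments in the code, expects the API key in the `JAGUAR_API_KEY` environment variable or in the file `$HOME/.jagrc`. A hypothetical pre-flight check such as the one below can fail fast before any request reaches the gateway. The lookup order (environment variable first) and the assumption that `.jagrc` holds the bare key on its first non-empty line are guesses, not documented behavior of the Jaguar client.

```python
import os
from pathlib import Path
from typing import Optional


def find_jaguar_api_key() -> Optional[str]:
    """Hypothetical helper: return an API key if one appears to be configured."""
    # 1) Environment variable, ignoring surrounding whitespace.
    key = os.environ.get("JAGUAR_API_KEY", "").strip()
    if key:
        return key
    # 2) Fallback: first non-empty line of $HOME/.jagrc (assumed file format).
    rc_file = Path.home() / ".jagrc"
    if rc_file.is_file():
        for line in rc_file.read_text().splitlines():
            if line.strip():
                return line.strip()
    # Nothing found; login() would most likely fail.
    return None
```

Calling this before `vectorstore.login()` and raising a descriptive error when it returns `None` gives a clearer failure than an authorization error from the server.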

RAG With Langchain

This section demonstrates chatting with an LLM together with Jaguar in the langchain software stack.
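The pipeline starts by splitting the source text with `CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)`. `CharacterTextSplitter` actually splits on a separator (by default `"\n\n"`) and then merges pieces up to `chunk_size`, so its chunks follow paragraph boundaries. The simplified fixed-width sliding window below is not LangChain's algorithm; it only illustrates what "size" and "overlap" mean for the resulting chunks.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With `chunk_size=1000` and `chunk_overlap=300`, a 2,500-character text yields four windows starting at offsets 0, 700, 1400 and 2100, so neighboring chunks share 300 characters of context.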

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.jaguar import Jaguar
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

"""
Load a text file into a set of documents
"""
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)
docs = text_splitter.split_documents(documents)

"""
Instantiate a Jaguar vector store
"""
### Jaguar HTTP endpoint
url = "http://192.168.5.88:8080/fwww/"

### Use OpenAI embedding model
embeddings = OpenAIEmbeddings()

### Pod is a database for vectors
pod = "vdb"

### Vector store name
store = "langchain_rag_store"

### Vector index name
vector_index = "v"

### Type of the vector index
# cosine: distance metric
# fraction: embedding vectors are decimal numbers
# float: values stored with floating-point numbers
vector_type = "cosine_fraction_float"

### Dimension of each embedding vector
vector_dimension = 1536

### Instantiate a Jaguar store object
vectorstore = Jaguar(
    pod, store, vector_index, vector_type, vector_dimension, url, embeddings
)

"""
Login must be performed to authorize the client.
The environment variable JAGUAR_API_KEY or file $HOME/.jagrc
should contain the API key for accessing JaguarDB servers.
"""
vectorstore.login()


"""
Create vector store on the JaguarDB database server.
This should be done only once.
"""
# Extra metadata fields for the vector store
metadata = "category char(16)"

# Number of characters for the text field of the store
text_size = 4096

# Create a vector store on the server
vectorstore.create(metadata, text_size)

"""
Add the texts from the text splitter to our vectorstore
"""
vectorstore.add_documents(docs)

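""" Get the retriever object """
retriever = vectorstore.as_retriever()
# To restrict retrieval with a where clause on metadata fields
# (m1 and m2 are placeholder field names), use instead:
# retriever = vectorstore.as_retriever(search_kwargs={"where": "m1='123' and m2='abc'"})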

""" The retriever object can be used with LangChain and LLM """
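In a RAG pipeline, the chunks the retriever returns are placed into the LLM prompt next to the user's question. The framework-free sketch below shows that step. The `Doc` dataclass is a stand-in for LangChain's `Document` (only `page_content` is used), and the prompt wording and the `max_chars` budget are assumptions for illustration, not part of the Jaguar or LangChain APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    # Stand-in for langchain_core.documents.Document: only the fields used here.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_rag_prompt(question: str, docs: List[Doc], max_chars: int = 4000) -> str:
    """Number the retrieved chunks, join them as context, then append the question."""
    context_parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        snippet = doc.page_content.strip()
        if used + len(snippet) > max_chars:
            break  # stop once the rough context budget would be exceeded
        context_parts.append(f"[{i}] {snippet}")
        used += len(snippet)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

With the store above, `retriever.invoke(question)` returns LangChain `Document` objects, whose `page_content` can be fed into a prompt in the same way before calling the LLM.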

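The commented-out retriever line in the example above passes the filter as a plain string such as `"m1='123' and m2='abc'"`. When filter values come from user input, a small builder avoids broken quoting. The helper below is hypothetical; doubling embedded single quotes follows the SQL convention and is an assumption about JaguarDB's where syntax, not something this page documents.

```python
from typing import Mapping


def build_where(filters: Mapping[str, str]) -> str:
    """Hypothetical helper: join equality filters into a where string."""
    clauses = []
    for field_name, value in filters.items():
        # Only accept plain identifiers as field names.
        if not field_name.isidentifier():
            raise ValueError(f"unexpected field name: {field_name!r}")
        # Assumed escaping rule: a single quote inside a value becomes ''.
        escaped = str(value).replace("'", "''")
        clauses.append(f"{field_name}='{escaped}'")
    return " and ".join(clauses)
```

For example, `vectorstore.as_retriever(search_kwargs={"where": build_where({"category": "Music"})})` would pass `"category='Music'"` to the store.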
Interacting With the Jaguar Vector Store

Users can interact directly with the Jaguar vector store for similarity search and anomaly detection.
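Conceptually, a filtered similarity search ranks stored vectors by similarity to the query vector, applies the metadata filter, and keeps the top `k`. The brute-force, in-memory sketch below mirrors the `k`, `fetch_k` and `where` arguments used in the example that follows; it is not how JaguarDB executes queries. Two assumptions: `where` is a Python predicate here instead of a string, and `fetch_k` candidates are taken before filtering, following the common "fetch more, then filter" pattern.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Tuple[str, Sequence[float], Dict[str, str]]  # (text, vector, metadata)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(
    records: List[Record],
    query_vec: Sequence[float],
    k: int = 3,
    fetch_k: int = 9,
    where: Optional[Callable[[Dict[str, str]], bool]] = None,
) -> List[Tuple[str, float]]:
    """Rank all records, keep fetch_k candidates, filter them, return the top k."""
    ranked = sorted(
        records, key=lambda r: cosine_similarity(query_vec, r[1]), reverse=True
    )
    candidates = ranked[:fetch_k]
    if where is not None:
        candidates = [r for r in candidates if where(r[2])]
    return [(text, cosine_similarity(query_vec, vec)) for text, vec, _ in candidates[:k]]
```

The sketch also shows why `fetch_k` matters when filtering: if it is too small, every candidate may be removed by the filter and the search returns fewer than `k` results.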

from langchain_community.vectorstores.jaguar import Jaguar
from langchain_openai import OpenAIEmbeddings

# Instantiate a Jaguar vector store object
url = "http://192.168.3.88:8080/fwww/"
pod = "vdb"
store = "langchain_test_store"
vector_index = "v"
vector_type = "cosine_fraction_float"
# Must match the embedding size (1536 for OpenAIEmbeddings' default model)
vector_dimension = 1536
embeddings = OpenAIEmbeddings()
vectorstore = Jaguar(
    pod, store, vector_index, vector_type, vector_dimension, url, embeddings
)

# Login for authorization
vectorstore.login()

# Create the vector store with two metadata fields
# This needs to be run only once.
metadata_str = "author char(32), category char(16)"
vectorstore.create(metadata_str, 1024)

# Add a list of texts
texts = ["foo", "bar", "baz"]
metadatas = [
    {"author": "Adam", "category": "Music"},
    {"author": "Eve", "category": "Music"},
    {"author": "John", "category": "History"},
]
ids = vectorstore.add_texts(texts=texts, metadatas=metadatas)

# Search similar text
output = vectorstore.similarity_search(
    query="foo",
    k=1,
    metadatas=["author", "category"],
)
assert output[0].page_content == "foo"
assert output[0].metadata["author"] == "Adam"
assert output[0].metadata["category"] == "Music"
assert len(output) == 1

# Search with filtering (where)
where = "author='Eve'"
output = vectorstore.similarity_search(
    query="foo",
    k=3,
    fetch_k=9,
    where=where,
    metadatas=["author", "category"],
)
assert output[0].page_content == "bar"
assert output[0].metadata["author"] == "Eve"
assert output[0].metadata["category"] == "Music"
assert len(output) == 1

# Anomaly detection
result = vectorstore.is_anomalous(
    query="dogs can jump high",
)
assert result is False

# Remove all data in the store
vectorstore.clear()
assert vectorstore.count() == 0

# Remove the store completely
vectorstore.drop()

# Logout
vectorstore.logout()
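The `is_anomalous` call in the example above returns only a boolean; this page does not describe how JaguarDB decides. One common, simple approach is a nearest-neighbor threshold: a query is anomalous when even its most similar stored vector is less similar than some cutoff. The sketch below implements that idea in memory; the function name, the 0.5 threshold, and treating an empty store as anomalous are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_anomalous(
    query_vec: Sequence[float], stored: List[Sequence[float]], threshold: float = 0.5
) -> bool:
    """Flag the query when its nearest stored vector is less similar than threshold."""
    if not stored:
        return True  # nothing to compare against, so treat everything as unusual
    best = max(cosine_similarity(query_vec, v) for v in stored)
    return best < threshold
```

The threshold trades false alarms against missed anomalies; in practice it would be tuned on the similarity distribution of known-normal data.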
API Reference: Jaguar | OpenAIEmbeddings
