Astra DB (Cassandra)

DataStax Astra DB is a serverless, vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.

In this walkthrough, we demonstrate the SelfQueryRetriever with an Astra DB vector store.

Creating an Astra DB vector store

First, we create an Astra DB VectorStore and seed it with some data. We have prepared a small demo set of documents containing movie summaries.

Note: The self-query retriever requires the lark package (pip install lark). We also need the astrapy package.

%pip install --upgrade --quiet lark astrapy langchain langchain-community langchain-openai

We want to use OpenAIEmbeddings, so we need an OpenAI API key.

import os
from getpass import getpass

from langchain_openai.embeddings import OpenAIEmbeddings

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key:")

embeddings = OpenAIEmbeddings()

API Reference: OpenAIEmbeddings

Create the Astra DB VectorStore:

The API endpoint looks like https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com
The token looks like AstraCS:6gBhNmsk135....

ASTRA_DB_API_ENDPOINT = input("ASTRA_DB_API_ENDPOINT = ")
ASTRA_DB_APPLICATION_TOKEN = getpass("ASTRA_DB_APPLICATION_TOKEN = ")
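
Before these values are passed to the vector store, a quick format check can catch copy-paste mistakes. The sketch below assumes only the shapes described above: a lowercase UUID followed by a region in the endpoint host, and an AstraCS: prefix on the token. The helpers looks_like_endpoint and looks_like_token are hypothetical, not part of astrapy or LangChain, and passing the check does not mean the credentials are valid.

```python
import re

# Assumed shape: https://<database-uuid>-<region>.apps.astra.datastax.com
_ENDPOINT_RE = re.compile(
    r"^https://[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com/?$"
)


def looks_like_endpoint(value: str) -> bool:
    """Hypothetical helper: True if the value has the endpoint shape above."""
    return _ENDPOINT_RE.match(value.strip()) is not None


def looks_like_token(value: str) -> bool:
    """Hypothetical helper: True if the value starts with 'AstraCS:' and has a body."""
    stripped = value.strip()
    return stripped.startswith("AstraCS:") and len(stripped) > len("AstraCS:")
```

A failed check is a hint to re-copy the value from the Astra DB dashboard, nothing more.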

from langchain_community.vectorstores import AstraDB
from langchain_core.documents import Document

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]

vectorstore = AstraDB.from_documents(
    docs,
    embeddings,
    collection_name="astra_self_query_demo",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)

API Reference: AstraDB | Document

Creating our self-querying retriever

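Now we can instantiate our retriever. To do this, we need to provide some information up front about the metadata fields that our documents support, along with a short description of the document contents.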

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import OpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
llm = OpenAI(temperature=0)

retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)

API Reference: AttributeInfo | SelfQueryRetriever | OpenAI
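
The LLM relies on the field descriptions when it writes filters, so they should match the metadata actually stored. Here is a minimal sketch of such a check, using plain dictionaries in place of Document objects. The helpers undeclared_fields and field_coverage are hypothetical, not LangChain functions.

```python
# Plain-dict stand-ins for the metadata of the six sample documents.
sample_metadata = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {
        "year": 1979,
        "director": "Andrei Tarkovsky",
        "genre": "science fiction",
        "rating": 9.9,
    },
]

# Field names declared in the metadata schema for the retriever.
declared_fields = {"genre", "year", "director", "rating"}


def undeclared_fields(metadata_list, declared):
    """Return metadata keys that appear in the data but not in the schema."""
    seen = set()
    for metadata in metadata_list:
        seen.update(metadata)
    return seen - declared


def field_coverage(metadata_list, declared):
    """Count how many documents actually carry each declared field."""
    return {
        name: sum(1 for metadata in metadata_list if name in metadata)
        for name in sorted(declared)
    }


print(undeclared_fields(sample_metadata, declared_fields))
print(field_coverage(sample_metadata, declared_fields))
```

Partial coverage is worth knowing about: a filter on a field such as director can generally only match documents that carry that field.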

Testing it out

Now we can try actually using our retriever!

# This example only specifies a relevant query
retriever.invoke("What are some movies about dinosaurs?")

# This example specifies a filter
retriever.invoke("I want to watch a movie rated higher than 8.5")

# This example specifies a query and a filter
retriever.invoke("Has Greta Gerwig directed any movies about women")

# This example specifies a composite filter
retriever.invoke("What's a highly rated (above 8.5), science fiction movie ?")

# This example specifies a query and composite filter
retriever.invoke(
    "What's a movie about toys after 1990 but before 2005, and is animated"
)
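
Conceptually, the retriever asks the LLM to split each question into a search string and a metadata filter, then applies both against the vector store. The sketch below illustrates only the filtering half, on plain dictionaries. The Mongo-style operator syntax ($gt, $lt, $eq, $and) and the matches helper are assumptions chosen for illustration; the filter that LangChain actually sends to Astra DB may be shaped differently.

```python
import operator

# Comparison operators supported by this toy evaluator.
_COMPARATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if `metadata` satisfies the filter `flt`."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # a missing field never matches
            for op_name, expected in condition.items():
                if not _COMPARATORS[op_name](metadata[key], expected):
                    return False
    return True


# Plain-dict stand-ins for the sample documents' metadata.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "director": "Andrei Tarkovsky",
     "genre": "science fiction", "rating": 9.9},
]

# "I want to watch a movie rated higher than 8.5"
high_rated = [m for m in movies if matches(m, {"rating": {"$gt": 8.5}})]

# "What's a movie about toys after 1990 but before 2005, and is animated"
toys_filter = {
    "$and": [
        {"year": {"$gt": 1990}},
        {"year": {"$lt": 2005}},
        {"genre": {"$eq": "animated"}},
    ]
}
animated_90s = [m for m in movies if matches(m, toys_filter)]
```

In this toy version, the rating filter keeps the 2006 and 1979 entries, and the composite filter keeps only the 1995 entry. The document without a rating is excluded from the rating filter because a missing field never matches.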

Filter k

We can also use the self-query retriever to specify k: the number of documents to fetch.

We can do this by passing enable_limit=True to the constructor.

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
    enable_limit=True,
)

# This example specifies a relevant query and a limit of two documents
retriever.invoke("What are two movies about dinosaurs?")
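
With enable_limit=True, a number mentioned in the question (here "two") can be extracted as a limit alongside the search string. The rough sketch below shows how such a limit narrows a ranked result list. ToyQuery, overlap_score, search, and the default_k value of 4 are all stand-ins invented for illustration; real retrieval ranks by embedding similarity, not word overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyQuery:
    """Hypothetical stand-in for what the LLM extracts from a question."""

    query: str
    limit: Optional[int] = None  # None means "fall back to the default k"


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(summaries, toy_query: ToyQuery, default_k: int = 4):
    """Rank summaries by the toy score and keep at most k of them."""
    k = toy_query.limit if toy_query.limit is not None else default_k
    ranked = sorted(
        summaries,
        key=lambda text: overlap_score(toy_query.query, text),
        reverse=True,
    )
    return ranked[:k]


summaries = [
    "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
    "Toys come alive and have a blast doing so",
    "Three men walk into the Zone, three men walk out of the Zone",
]

# "What are two movies about dinosaurs?" -> query "dinosaurs", limit 2
top_two = search(summaries, ToyQuery(query="dinosaurs", limit=2))
```

A limit only caps how many results come back: the second entry in top_two is returned even though it has nothing to do with dinosaurs.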

Cleanup

If you want to completely delete the collection from your Astra DB instance, run this.

(You will lose the data you stored in it.)

vectorstore.delete_collection()
