Tencent Cloud VectorDB
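Tencent Cloud VectorDB is a fully managed, self-developed, enterprise-grade distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data.
In this walkthrough, we demonstrate how to use the SelfQueryRetriever with Tencent Cloud VectorDB.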
Creating a TencentVectorDB instance
First, we need to create a TencentVectorDB and seed it with some data. We've created a small demo set of documents that contain summaries of movies.
**Note:** The self-query retriever requires you to have lark installed (pip install lark), along with the integration-specific requirements.
%pip install --upgrade --quiet tcvectordb langchain-openai tiktoken lark
Note: you may need to restart the kernel to use updated packages.
We want to use OpenAIEmbeddings, so we have to get an OpenAI API key.
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Create a TencentVectorDB instance and seed it with some data:
from langchain_community.vectorstores.tencentvectordb import (
    ConnectionParams,
    MetaField,
    TencentVectorDB,
)
from langchain_core.documents import Document
from tcvectordb.model.enum import FieldType

meta_fields = [
    MetaField(name="year", data_type="uint64", index=True),
    MetaField(name="rating", data_type="string", index=False),
    MetaField(name="genre", data_type=FieldType.String, index=True),
    MetaField(name="director", data_type=FieldType.String, index=True),
]
docs = [
    Document(
        page_content="The Shawshank Redemption is a 1994 American drama film written and directed by Frank Darabont.",
        metadata={
            "year": 1994,
            "rating": "9.3",
            "genre": "drama",
            "director": "Frank Darabont",
        },
    ),
    Document(
        page_content="The Godfather is a 1972 American crime film directed by Francis Ford Coppola.",
        metadata={
            "year": 1972,
            "rating": "9.2",
            "genre": "crime",
            "director": "Francis Ford Coppola",
        },
    ),
    Document(
        page_content="The Dark Knight is a 2008 superhero film directed by Christopher Nolan.",
        metadata={
            "year": 2008,
            "rating": "9.0",
            "genre": "science fiction",
            "director": "Christopher Nolan",
        },
    ),
    Document(
        page_content="Inception is a 2010 science fiction action film written and directed by Christopher Nolan.",
        metadata={
            "year": 2010,
            "rating": "8.8",
            "genre": "science fiction",
            "director": "Christopher Nolan",
        },
    ),
    Document(
        page_content="The Avengers is a 2012 American superhero film based on the Marvel Comics superhero team of the same name.",
        metadata={
            "year": 2012,
            "rating": "8.0",
            "genre": "science fiction",
            "director": "Joss Whedon",
        },
    ),
    Document(
        page_content="Black Panther is a 2018 American superhero film based on the Marvel Comics character of the same name.",
        metadata={
            "year": 2018,
            "rating": "7.3",
            "genre": "science fiction",
            "director": "Ryan Coogler",
        },
    ),
]
vector_db = TencentVectorDB.from_documents(
    docs,
    None,
    connection_params=ConnectionParams(
        url="http://10.0.X.X",
        key="eC4bLRy2va******************************",
        username="root",
        timeout=20,
    ),
    collection_name="self_query_movies",
    meta_fields=meta_fields,
    drop_old=True,
)
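Because each metadata field is declared with a type up front, it can help to check the documents against that declaration before uploading them. The following is a minimal, library-free sketch: `EXPECTED_TYPES` and `check_metadata` are hypothetical names introduced here for illustration and are not part of tcvectordb or LangChain. The Python types are an assumed mapping from the declared field types (`uint64` to `int`, `string` to `str`).

```python
from typing import Dict, List

# Assumed mapping from the declared meta fields to Python types (illustration only).
EXPECTED_TYPES: Dict[str, type] = {
    "year": int,
    "rating": str,
    "genre": str,
    "director": str,
}


def check_metadata(metadata: dict) -> List[str]:
    """Return a list of problems found in one document's metadata dict."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in metadata:
            problems.append(f"missing field: {name}")
        elif not isinstance(metadata[name], expected):
            problems.append(
                f"{name} should be {expected.__name__}, "
                f"got {type(metadata[name]).__name__}"
            )
    return problems


good = {"year": 1994, "rating": "9.3", "genre": "drama", "director": "Frank Darabont"}
bad = {"year": "1994", "rating": 9.3, "genre": "drama"}

print(check_metadata(good))  # []
for problem in check_metadata(bad):
    print(problem)
```

Note that `rating` is declared as a string field, which is why the demo documents store values such as "9.3" rather than a float.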
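Creating our self-querying retriever
Now we can instantiate our retriever. To do this, we need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.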
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="string"
    ),
]
document_content_description = "Brief summary of a movie"

llm = ChatOpenAI(temperature=0, model="gpt-4", max_tokens=4069)
retriever = SelfQueryRetriever.from_llm(
    llm, vector_db, document_content_description, metadata_field_info, verbose=True
)
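One way to picture why the field names, types, and descriptions matter: the language model can only build a filter over attributes it has been told about. The sketch below renders the same information as JSON using plain dictionaries. The `render_schema` helper and its output layout are assumptions made for illustration; this is not LangChain's actual prompt format.

```python
import json

# Plain-dict stand-ins for the AttributeInfo entries above (illustration only).
fields = [
    {"name": "genre", "description": "The genre of the movie", "type": "string"},
    {"name": "year", "description": "The year the movie was released", "type": "integer"},
    {"name": "director", "description": "The name of the movie director", "type": "string"},
    {"name": "rating", "description": "A 1-10 rating for the movie", "type": "string"},
]


def render_schema(content_description, fields):
    """Build a JSON description of the filterable attributes (hypothetical layout)."""
    schema = {
        "content": content_description,
        "attributes": {
            f["name"]: {"description": f["description"], "type": f["type"]}
            for f in fields
        },
    }
    return json.dumps(schema, indent=2)


schema_text = render_schema("Brief summary of a movie", fields)
print(schema_text)
```

A vague description (for example, "rating" with no scale) gives the model less to work with when it decides whether a question implies a filter on that field.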
Testing it out
And now we can try actually using our retriever!
# This example only specifies a relevant query
retriever.invoke("movies about a superhero")
[Document(page_content='The Dark Knight is a 2008 superhero film directed by Christopher Nolan.', metadata={'year': 2008, 'rating': '9.0', 'genre': 'science fiction', 'director': 'Christopher Nolan'}),
Document(page_content='The Avengers is a 2012 American superhero film based on the Marvel Comics superhero team of the same name.', metadata={'year': 2012, 'rating': '8.0', 'genre': 'science fiction', 'director': 'Joss Whedon'}),
Document(page_content='Black Panther is a 2018 American superhero film based on the Marvel Comics character of the same name.', metadata={'year': 2018, 'rating': '7.3', 'genre': 'science fiction', 'director': 'Ryan Coogler'}),
Document(page_content='The Godfather is a 1972 American crime film directed by Francis Ford Coppola.', metadata={'year': 1972, 'rating': '9.2', 'genre': 'crime', 'director': 'Francis Ford Coppola'})]
# This example only specifies a filter
retriever.invoke("movies that were released after 2010")
[Document(page_content='The Avengers is a 2012 American superhero film based on the Marvel Comics superhero team of the same name.', metadata={'year': 2012, 'rating': '8.0', 'genre': 'science fiction', 'director': 'Joss Whedon'}),
Document(page_content='Black Panther is a 2018 American superhero film based on the Marvel Comics character of the same name.', metadata={'year': 2018, 'rating': '7.3', 'genre': 'science fiction', 'director': 'Ryan Coogler'})]
# This example specifies both a relevant query and a filter
retriever.invoke("movies about a superhero which were released after 2010")
[Document(page_content='The Avengers is a 2012 American superhero film based on the Marvel Comics superhero team of the same name.', metadata={'year': 2012, 'rating': '8.0', 'genre': 'science fiction', 'director': 'Joss Whedon'}),
Document(page_content='Black Panther is a 2018 American superhero film based on the Marvel Comics character of the same name.', metadata={'year': 2018, 'rating': '7.3', 'genre': 'science fiction', 'director': 'Ryan Coogler'})]
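Conceptually, each question above is split into a semantic search string and an optional metadata filter. The sketch below mimics only the filter half in plain Python, using the titles, years, and genres of the demo documents. `Movie`, `ParsedQuery`, and `apply_filter` are hypothetical names for illustration; the real retriever builds its structured query with an LLM and has the vector store evaluate the filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Movie:
    title: str
    year: int
    genre: str


# Titles, years, and genres copied from the demo documents.
MOVIES = [
    Movie("The Shawshank Redemption", 1994, "drama"),
    Movie("The Godfather", 1972, "crime"),
    Movie("The Dark Knight", 2008, "science fiction"),
    Movie("Inception", 2010, "science fiction"),
    Movie("The Avengers", 2012, "science fiction"),
    Movie("Black Panther", 2018, "science fiction"),
]


@dataclass
class ParsedQuery:
    """Hypothetical stand-in for the structured query an LLM might produce."""

    text: str  # semantic part; empty when the question is filter-only
    metadata_filter: Optional[Callable[[Movie], bool]] = None


def apply_filter(parsed: ParsedQuery, movies: List[Movie]) -> List[Movie]:
    """Keep only the movies that satisfy the metadata filter, if there is one."""
    if parsed.metadata_filter is None:
        return list(movies)
    return [m for m in movies if parsed.metadata_filter(m)]


# "movies that were released after 2010" carries a filter but no semantic text.
after_2010 = ParsedQuery(text="", metadata_filter=lambda m: m.year > 2010)
titles = [m.title for m in apply_filter(after_2010, MOVIES)]
print(titles)  # ['The Avengers', 'Black Panther']
```

Inception (2010) is excluded because "after 2010" is modeled here as a strict greater-than comparison, which matches the output shown for the filter-only example.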
Filter k
We can also use the self-query retriever to specify k: the number of documents to fetch.
We can do this by passing enable_limit=True to the constructor.
retriever = SelfQueryRetriever.from_llm(
    llm,
    vector_db,
    document_content_description,
    metadata_field_info,
    verbose=True,
    enable_limit=True,
)
retriever.invoke("what are two movies about a superhero")
[Document(page_content='The Dark Knight is a 2008 superhero film directed by Christopher Nolan.', metadata={'year': 2008, 'rating': '9.0', 'genre': 'science fiction', 'director': 'Christopher Nolan'}),
Document(page_content='The Avengers is a 2012 American superhero film based on the Marvel Comics superhero team of the same name.', metadata={'year': 2012, 'rating': '8.0', 'genre': 'science fiction', 'director': 'Joss Whedon'})]
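With the limit enabled, a count mentioned in the question ("two movies") can be treated as a cap on the number of results. A minimal sketch of that last step, assuming the ranked list below (taken from the output of the earlier "movies about a superhero" query) and a hypothetical `take` helper that is not part of LangChain:

```python
from typing import List, Optional

# Ranked titles as returned for "movies about a superhero" without a limit.
ranked = ["The Dark Knight", "The Avengers", "Black Panther", "The Godfather"]


def take(results: List[str], limit: Optional[int] = None) -> List[str]:
    """Return the first `limit` results, or all of them when no limit was parsed."""
    return list(results) if limit is None else results[:limit]


print(take(ranked, limit=2))  # ['The Dark Knight', 'The Avengers']
print(len(take(ranked)))      # 4
```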