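How to handle multiple queries when doing query analysis
Sometimes a query analysis technique may generate multiple queries. In these cases, we need to remember to run all of the queries and then combine the results. We will show a simple example (using mock data) of how to do that.
Setup
Install dependencies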
%pip install -qU langchain langchain-community langchain-openai langchain-chroma
Note: you may need to restart the kernel to use updated packages.
Set environment variables
We will use OpenAI in this example:
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass()
# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
Create Index
We will create a vector store over fake information.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
texts = ["Harrison worked at Kensho", "Ankush worked at Facebook"]
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_texts(
    texts,
    embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
API Reference: OpenAIEmbeddings | RecursiveCharacterTextSplitter
Query analysis
We will use function calling to structure the output, and we will let it return multiple queries.
from typing import List, Optional
from pydantic import BaseModel, Field
class Search(BaseModel):
    """Search over a database of job records."""

    queries: List[str] = Field(
        ...,
        description="Distinct queries to search for",
    )
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
output_parser = PydanticToolsParser(tools=[Search])
system = """You have the ability to issue search queries to get information to help answer user information.
If you need to look up two distinct pieces of information, you are allowed to do that!"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm
We can see that this allows the model to create multiple queries:
query_analyzer.invoke("where did Harrison Work")
Search(queries=['Harrison Work', 'Harrison employment history'])
query_analyzer.invoke("where did Harrison and ankush Work")
Search(queries=['Harrison work history', 'Ankush work history'])
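Retrieval with query analysis
So how do we include this in a chain? Calling the retriever asynchronously makes this much easier: we can loop over the queries and await each retrieval without blocking the event loop while a response is pending.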
from langchain_core.runnables import chain
API Reference: chain
@chain
async def custom_chain(question):
    response = await query_analyzer.ainvoke(question)
    docs = []
    for query in response.queries:
        new_docs = await retriever.ainvoke(query)
        docs.extend(new_docs)
    # You probably want to think about reranking or deduplicating documents here
    # But that is a separate topic
    return docs
await custom_chain.ainvoke("where did Harrison Work")
[Document(page_content='Harrison worked at Kensho'),
Document(page_content='Harrison worked at Kensho')]
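The call above returns the same document twice, because both generated queries match the same record. One possible way to handle this is to drop repeats after merging. The following is a minimal sketch, assuming two documents count as duplicates when their text is identical. `Doc` is a hypothetical stand-in for LangChain's `Document` so the snippet runs without any dependencies, and `dedupe_by_content` is an illustrative helper, not a LangChain API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a retrieved document (text only)."""

    page_content: str


def dedupe_by_content(docs):
    """Drop documents whose text was already seen, keeping first-seen order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


# Same shape as the duplicated result shown above.
retrieved = [Doc("Harrison worked at Kensho"), Doc("Harrison worked at Kensho")]
unique_docs = dedupe_by_content(retrieved)
print(unique_docs)
```

Inside `custom_chain`, a helper like this could be applied to `docs` just before the `return`. Exact-text matching is only one choice; reranking or matching on a metadata ID may suit real data better.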
await custom_chain.ainvoke("where did Harrison and ankush Work")
[Document(page_content='Harrison worked at Kensho'),
Document(page_content='Ankush worked at Facebook')]
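Note that the loop in `custom_chain` awaits each retrieval before starting the next one, so the queries still run one after another. When the retrievals are independent, they could instead be issued concurrently with `asyncio.gather`. The sketch below illustrates that pattern under stated assumptions: `fake_retrieve` is a hypothetical coroutine standing in for `retriever.ainvoke`, and `RECORDS` is an invented in-memory lookup used in place of the vector store.

```python
import asyncio

# Hypothetical in-memory "index": lowercase name -> matching record.
RECORDS = {
    "harrison": "Harrison worked at Kensho",
    "ankush": "Ankush worked at Facebook",
}


async def fake_retrieve(query):
    """Stand-in for `retriever.ainvoke`: return records whose key appears in the query."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [text for name, text in RECORDS.items() if name in query.lower()]


async def retrieve_all(queries):
    """Run every query concurrently and flatten the results in query order."""
    # gather returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(fake_retrieve(q) for q in queries))
    return [doc for docs in results for doc in docs]


merged = asyncio.run(retrieve_all(["Harrison work history", "Ankush work history"]))
print(merged)
```

In a notebook, where an event loop is already running, `await retrieve_all(...)` would be used in place of `asyncio.run(...)`. In the real chain, the same idea would amount to gathering `retriever.ainvoke(query)` for each entry in `response.queries`.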