
Migrating from MapRerankDocumentsChain

MapRerankDocumentsChain implements a strategy for analyzing long texts. The strategy is as follows:

  • Split a text into smaller documents;
  • Map a process to the set of documents, where the process includes generating a score;
  • Rank the results by score and return the maximum.

A common process in this context is question-answering using pieces of context from the documents. Forcing the model to generate a score along with its answer helps to select for answers generated only by relevant context.
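To make this concrete, below is a minimal, model-free sketch of the map-rerank strategy. Note that split and answer_with_score are hypothetical stand-ins for a real text splitter and a scored LLM call.

def split(text: str, size: int = 100) -> list[str]:
    # 1. Split the text into fixed-size chunks (stand-in for a real text splitter).
    return [text[i : i + size] for i in range(0, len(text), size)]


def answer_with_score(chunk: str) -> dict:
    # 2. Stand-in for an LLM call that returns an answer plus a confidence score.
    return {"answer": f"answer based on: {chunk[:20]}", "score": len(chunk) % 10 + 1}


def map_rerank(text: str) -> dict:
    results = [answer_with_score(chunk) for chunk in split(text)]
    # 3. Rank by score and return the top result.
    return max(results, key=lambda r: r["score"])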

A LangGraph implementation allows for the incorporation of tool calling and other features into this problem. Below we will go through both MapRerankDocumentsChain and a corresponding LangGraph implementation via a simple example.

Example

Let's go through an example where we analyze a set of documents. We will use the following 3 documents:

from langchain_core.documents import Document

documents = [
    Document(page_content="Alice has blue eyes", metadata={"title": "book_chapter_2"}),
    Document(page_content="Bob has brown eyes", metadata={"title": "book_chapter_1"}),
    Document(
        page_content="Charlie has green eyes", metadata={"title": "book_chapter_3"}
    ),
]
API Reference: Document

Legacy


Below we show an implementation with MapRerankDocumentsChain. We define the prompt template for our question-answering task and instantiate a LLMChain object for this purpose. We define how documents are formatted into the prompt and ensure consistency among the keys in the various prompts.

from langchain.chains import LLMChain, MapRerankDocumentsChain
from langchain.output_parsers.regex import RegexParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

document_variable_name = "context"
llm = OpenAI()
# The prompt here should take as an input variable the
# `document_variable_name`
# The actual prompt will need to be a lot more complex, this is just
# an example.
prompt_template = (
    "What color are Bob's eyes? "
    "Output both your answer and a score (1-10) of how confident "
    "you are in the format: <Answer>\nScore: <Score>.\n\n"
    "Provide no other commentary.\n\n"
    "Context: {context}"
)
output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context"],
    output_parser=output_parser,
)
llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = MapRerankDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name=document_variable_name,
    rank_key="score",
    answer_key="answer",
)
response = chain.invoke(documents)
response["output_text"]
/langchain/libs/langchain/langchain/chains/llm.py:369: UserWarning: The apply_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
  warnings.warn(
'Brown'
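
For reference, here is a small sketch of how the RegexParser above splits a raw completion into the answer and score keys (the sample completion string is illustrative):

from langchain.output_parsers.regex import RegexParser

parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
# A raw completion in the expected "<Answer>\nScore: <Score>" format:
parser.parse("Brown\nScore: 10")
# -> {'answer': 'Brown', 'score': '10'}

Note that the parsed score is still a string; the structured-output approach below yields a typed integer instead.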

Inspecting the LangSmith trace for the above run, we can see three LLM calls (one for each document), and that the scoring mechanism mitigated against hallucinations.

LangGraph


Below we show a LangGraph implementation of this process. Note that our template is simplified, as we delegate the formatting instructions to the chat model's tool-calling features via the .with_structured_output method.
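
In isolation, .with_structured_output binds a schema (such as a TypedDict) to the model and returns a matching dict directly, with no output parser required. A minimal sketch, assuming an OpenAI API key is configured (the schema name and prompt here are illustrative):

from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI


class Grade(TypedDict):
    answer: str
    score: Annotated[int, ..., "Score from 1-10."]


structured_llm = ChatOpenAI(model="gpt-4o-mini").with_structured_output(Grade)
structured_llm.invoke("What color are Bob's eyes? Context: Bob has brown eyes.")
# -> e.g. {'answer': 'brown', 'score': 10}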

Here we follow a basic map-reduce workflow to execute the LLM calls in parallel.

We will need to install langgraph:

pip install -qU langgraph
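
Before the full implementation, here is a minimal, LLM-free sketch of the Send-based fan-out that drives the parallelism (node and key names are illustrative):

import operator
from typing import Annotated, TypedDict

from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph


class OverallState(TypedDict):
    items: list[str]
    # operator.add merges the results returned by the parallel branches
    results: Annotated[list, operator.add]


class WorkState(TypedDict):
    item: str


def fan_out(state: OverallState):
    # One Send per item: each runs the "work" node with its own private state
    return [Send("work", {"item": item}) for item in state["items"]]


def work(state: WorkState):
    return {"results": [state["item"].upper()]}


builder = StateGraph(OverallState)
builder.add_node("work", work)
builder.add_conditional_edges(START, fan_out, ["work"])
builder.add_edge("work", END)
demo = builder.compile()

demo.invoke({"items": ["a", "b"]})
# -> e.g. {'items': ['a', 'b'], 'results': ['A', 'B']}

The full implementation below applies the same pattern, replacing the work node with a scored LLM call.
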
import operator
from typing import Annotated, List, TypedDict

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph


class AnswerWithScore(TypedDict):
    answer: str
    score: Annotated[int, ..., "Score from 1-10."]


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt_template = "What color are Bob's eyes?\n\n" "Context: {context}"
prompt = ChatPromptTemplate.from_template(prompt_template)

# The below chain formats context from a document into a prompt, then
# generates a response structured according to the AnswerWithScore schema.
map_chain = prompt | llm.with_structured_output(AnswerWithScore)

# Below we define the components that will make up the graph


# This will be the overall state of the graph.
# It will contain the input document contents, corresponding
# answers with scores, and a final answer.
class State(TypedDict):
    contents: List[str]
    answers_with_scores: Annotated[list, operator.add]
    answer: str


# This will be the state of the node that we will "map" all
# documents to in order to generate answers with scores
class MapState(TypedDict):
    content: str


# Here we define the logic to map out over the documents
# We will use this as an edge in the graph
def map_analyses(state: State):
    # We will return a list of `Send` objects
    # Each `Send` object consists of the name of a node in the graph
    # as well as the state to send to that node
    return [
        Send("generate_analysis", {"content": content})
        for content in state["contents"]
    ]


# Here we generate an answer with score, given a document
async def generate_analysis(state: MapState):
    response = await map_chain.ainvoke(state["content"])
    return {"answers_with_scores": [response]}


# Here we will select the top answer
def pick_top_ranked(state: State):
    ranked_answers = sorted(
        state["answers_with_scores"], key=lambda x: -int(x["score"])
    )
    return {"answer": ranked_answers[0]}


# Construct the graph: here we put everything together to construct our graph
graph = StateGraph(State)
graph.add_node("generate_analysis", generate_analysis)
graph.add_node("pick_top_ranked", pick_top_ranked)
graph.add_conditional_edges(START, map_analyses, ["generate_analysis"])
graph.add_edge("generate_analysis", "pick_top_ranked")
graph.add_edge("pick_top_ranked", END)
app = graph.compile()
from IPython.display import Image

Image(app.get_graph().draw_mermaid_png())

result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
result["answer"]
{'answer': 'Bob has brown eyes.', 'score': 10}
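
Since every scored answer is accumulated in the final state, you can also inspect the full ranking (keys as defined in State above):

for item in sorted(result["answers_with_scores"], key=lambda x: -int(x["score"])):
    print(item)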

Inspecting the LangSmith trace for the above run, we can see three LLM calls as before. Using the model's tool-calling features has also enabled us to remove the parsing step.

Next steps

See these how-to guides for more on question-answering tasks with RAG.

Check out the LangGraph documentation for detail on building with LangGraph, including this guide on the details of map-reduce in LangGraph.

