Migrating from RefineDocumentsChain
RefineDocumentsChain implements a strategy for analyzing long texts. The strategy is as follows:
- Split a text into smaller documents;
- Apply a process to the first document;
- Refine or update the result based on the next document;
- Repeat through the sequence of documents until finished.
A common process applied in this context is summarization, in which a running summary is modified as we proceed through chunks of a long text. This is particularly useful for texts that are large compared to the context window of a given LLM. Conceptually, the strategy reduces to a sequential loop, as sketched below.
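Here is a minimal sketch of that loop, with hypothetical `summarize` and `refine` helpers standing in for the LLM calls (these are not LangChain APIs):

```python
# Hypothetical helpers: `summarize` produces an initial result from one
# document; `refine` updates that result given the next document.
def refine_loop(docs, summarize, refine):
    result = summarize(docs[0])  # apply a process to the first document
    for doc in docs[1:]:  # refine the result with each subsequent document
        result = refine(result, doc)
    return result
```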
A LangGraph implementation brings a number of advantages to this problem:
- RefineDocumentsChain refines the summary via a for loop inside the class, whereas a LangGraph implementation lets you step through the execution to monitor or otherwise steer it if needed.
- The LangGraph implementation supports streaming of both execution steps and individual tokens.
- Because it is assembled from modular components, it is also easy to extend or modify (e.g., to incorporate tool calling or other behavior).
Below we will go through both RefineDocumentsChain and the corresponding LangGraph implementation on a simple example.
Let's first load a chat model:
pip install -qU "langchain[openai]"
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")
Example
Let's go through an example where we summarize a sequence of documents. We first generate some simple documents for illustrative purposes:
from langchain_core.documents import Document
documents = [
Document(page_content="Apples are red", metadata={"title": "apple_book"}),
Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
Document(page_content="Bananas are yelow", metadata={"title": "banana_book"}),
]
API Reference: Document
Legacy
Below we show an implementation with RefineDocumentsChain. We define the prompt templates for the initial summarization and successive refinements, instantiate separate LLMChain objects for these two purposes, and instantiate RefineDocumentsChain with these components.
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI
# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"

# The prompt here should take as an input variable the
# `document_variable_name`
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
initial_response_name = "existing_answer"
# The prompt here should take as an input variable the
# `document_variable_name` as well as `initial_response_name`
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)
We can now invoke our chain:
result = chain.invoke(documents)
result["output_text"]
'Apples are typically red in color, blueberries are blue, and bananas are yellow.'
The LangSmith trace consists of three LLM calls: one for the initial summary, and two more to update that summary. The process completes when we update the summary with the content of the final document.
LangGraph
Below we show a LangGraph implementation of this process:
- We use the same two templates as before.
- We generate a simple chain for the initial summary that plucks out the first document, formats it into a prompt, and runs inference with our LLM.
- We generate a second refine_summary_chain that operates on each successive document, refining the initial summary.
We will need to install langgraph:
pip install -qU langgraph
import operator
from typing import List, Literal, TypedDict
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Initial summary
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_summary_chain = summarize_prompt | llm | StrOutputParser()
# Refining the summary with new docs
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()
# For LangGraph, we will define the state of the graph to hold the
# document contents and running summary, along with an index to track
# our position in the sequence of documents.
class State(TypedDict):
    contents: List[str]
    index: int
    summary: str
# We define functions for each node, including a node that generates
# the initial summary:
async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(
        state["contents"][0],
        config,
    )
    return {"summary": summary, "index": 1}
# And a node that refines the summary based on the next document
async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content},
        config,
    )
    return {"summary": summary, "index": state["index"] + 1}
# Here we implement logic to either exit the application or refine
# the summary.
def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    else:
        return "refine_summary"
graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()
from IPython.display import Image
Image(app.get_graph().draw_mermaid_png())
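If only the final summary is needed, the compiled graph can also be invoked directly rather than streamed (a sketch; in a plain script, wrap the `await` in `asyncio.run`):

```python
# Run the graph end-to-end and read the final state.
result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
print(result["summary"])
```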
We can step through the execution as follows, printing out the summary as it is refined:
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
Apples are typically red in color.
Apples are typically red in color, while blueberries are blue.
Apples are typically red in color, blueberries are blue, and bananas are yellow.
In the LangSmith trace, we again recover three LLM calls performing the same functions as before.
Note that we can stream tokens from the application, including from intermediate steps:
async for event in app.astream_events(
    {"contents": [doc.page_content for doc in documents]}, version="v2"
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="|")
    elif kind == "on_chat_model_end":
        print("\n\n")
Ap|ples| are| characterized| by| their| red| color|.|
Ap|ples| are| characterized| by| their| red| color|,| while| blueberries| are| known| for| their| blue| hue|.|
Ap|ples| are| characterized| by| their| red| color|,| blueberries| are| known| for| their| blue| hue|,| and| bananas| are| recognized| for| their| yellow| color|.|
Next steps
See this tutorial for more LLM-based summarization strategies.
Check out the LangGraph documentation for detail on building with LangGraph.