
Migrating from RefineDocumentsChain

RefineDocumentsChain implements a strategy for analyzing long texts. The strategy is as follows:

  • Split the text into smaller documents;
  • Apply a process to the first document;
  • Refine or update the result based on the next document;
  • Repeat through the sequence of documents until finished.
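The steps above can be sketched as a plain Python loop; `summarize` and `refine` here are hypothetical stand-ins for the LLM calls, included only to illustrate the control flow:

```python
from typing import Callable, List


def refine_documents(
    docs: List[str],
    summarize: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Sequentially fold a list of document chunks into one summary."""
    # Apply the initial process to the first document...
    summary = summarize(docs[0])
    # ...then refine the result with each subsequent document.
    for doc in docs[1:]:
        summary = refine(summary, doc)
    return summary


# Toy stand-ins for the LLM calls, just to exercise the loop:
toy_summarize = lambda doc: doc
toy_refine = lambda summary, doc: f"{summary}; {doc}"

print(refine_documents(["a", "b", "c"], toy_summarize, toy_refine))
# a; b; c
```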

A common process applied in this context is summarization, in which a running summary is modified as we proceed through chunks of a long text. This is particularly useful for texts that are large compared to the context window of a given LLM.
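The first step, splitting the text into documents that fit the context window, can be sketched naively as below. The fixed-size splitter and chunk size are illustrative only; in practice you would use one of LangChain's text splitters, which respect sentence and paragraph boundaries:

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Naive fixed-size splitter (illustrative; real splitters are
    boundary-aware). Yields consecutive slices of at most chunk_size."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_into_chunks("x" * 2500, chunk_size=1000)
print([len(c) for c in chunks])
# [1000, 1000, 500]
```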

A LangGraph implementation brings a number of advantages to this problem:

  • Where RefineDocumentsChain refines the summary via a for loop inside the class, a LangGraph implementation lets you step through the execution to monitor or otherwise steer it if needed.
  • The LangGraph implementation supports streaming of both execution steps and individual tokens.
  • Because it is assembled from modular components, it is also easy to extend or modify (e.g., to incorporate tool calls or other behaviors).

Below we will go through a simple example to illustrate both RefineDocumentsChain and the corresponding LangGraph implementation.

Let's first load a chat model:

pip install -qU "langchain[openai]"
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Example

Let's go through an example in which we summarize a sequence of documents. We first generate some simple documents for illustrative purposes:

from langchain_core.documents import Document

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]
API Reference: Document

Legacy

Details

Below we show an implementation using RefineDocumentsChain. We define prompt templates for the initial summarization and subsequent refinements, instantiate separate LLMChain objects for these two purposes, and instantiate RefineDocumentsChain with these components:

from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
# The prompt here should take as an input variable the
# `document_variable_name`
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
initial_response_name = "existing_answer"
# The prompt here should take as an input variable the
# `document_variable_name` as well as `initial_response_name`
refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)

We can now invoke our chain:

result = chain.invoke(documents)
result["output_text"]
'Apples are typically red in color, blueberries are blue, and bananas are yellow.'

The LangSmith trace is composed of three LLM calls: one for the initial summary, and two more to update that summary. The process completes when we update the summary with the content of the last document.

LangGraph

Details

Below we show a LangGraph implementation of this process:

  • We use the same two templates as before.
  • We generate a simple chain for the initial summary that plucks out the first document, formats it into a prompt, and runs inference with our LLM.
  • We generate a second refine_summary_chain that operates on each successive document, refining the initial summary.

We will need to install langgraph:

pip install -qU langgraph
import operator
from typing import List, Literal, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Initial summary
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_summary_chain = summarize_prompt | llm | StrOutputParser()

# Refining the summary with new docs
refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])

refine_summary_chain = refine_prompt | llm | StrOutputParser()


# For LangGraph, we will define the state of the graph to hold the
# document contents, the index of the next document to process, and
# the running summary.
class State(TypedDict):
    contents: List[str]
    index: int
    summary: str


# We define functions for each node, including a node that generates
# the initial summary:
async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(
        state["contents"][0],
        config,
    )
    return {"summary": summary, "index": 1}


# And a node that refines the summary based on the next document
async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content},
        config,
    )

    return {"summary": summary, "index": state["index"] + 1}


# Here we implement logic to either exit the application or refine
# the summary.
def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    else:
        return "refine_summary"


graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)

graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()
from IPython.display import Image

Image(app.get_graph().draw_mermaid_png())

We can step through the execution as follows, printing out the summary as it is refined:

async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
Apples are typically red in color.
Apples are typically red in color, while blueberries are blue.
Apples are typically red in color, blueberries are blue, and bananas are yellow.
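As a plain-Python illustration of the control flow the graph implements, the sketch below mimics the node and routing logic with a dict-based state and a toy string concatenation standing in for the refine chain (no LLM calls):

```python
from typing import Dict, List


def run_refine_loop(contents: List[str]) -> Dict:
    """Mimic of the graph's control flow with a toy refine step."""
    # generate_initial_summary: summarize the first document
    state = {"contents": contents, "index": 1, "summary": contents[0]}
    # should_refine routes back into refine_summary until every
    # document has been folded into the summary, then to END.
    while state["index"] < len(state["contents"]):
        nxt = state["contents"][state["index"]]
        state["summary"] = f"{state['summary']}, {nxt}"  # toy refine
        state["index"] += 1
    return state


print(run_refine_loop(["apples are red", "blueberries are blue"])["summary"])
# apples are red, blueberries are blue
```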

In the LangSmith trace, we again recover three LLM calls performing the same functions as before.

Note that we can stream tokens from the application, including from intermediate steps:

async for event in app.astream_events(
    {"contents": [doc.page_content for doc in documents]}, version="v2"
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="|")
    elif kind == "on_chat_model_end":
        print("\n\n")
Ap|ples| are| characterized| by| their| red| color|.|


Ap|ples| are| characterized| by| their| red| color|,| while| blueberries| are| known| for| their| blue| hue|.|


Ap|ples| are| characterized| by| their| red| color|,| blueberries| are| known| for| their| blue| hue|,| and| bananas| are| recognized| for| their| yellow| color|.|

Next steps

See this tutorial for more LLM-based summarization strategies.

Check out the LangGraph documentation for detail on building with LangGraph.
