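Build a Retrieval Augmented Generation (RAG) App: Part 2

In many Q&A applications we want to let the user hold a back-and-forth conversation. This means the application needs some form of "memory" of past questions and answers, plus some logic for incorporating that memory into its current reasoning.

This is the second part of a multi-part tutorial:

Part 1 introduces RAG and walks through a minimal implementation.
Part 2 (this guide) extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.

Here we focus on adding logic for incorporating historical messages. This involves managing a chat history.

We will cover two approaches:

Chains, in which we execute at most one retrieval step;
Agents, in which we give an LLM discretion to execute multiple retrieval steps.

Note

The methods presented here rely on the tool-calling capabilities of modern chat models. See this page for a table of models that support tool calling.

For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng that we used in Part 1 of the RAG tutorial.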

Setup

Components

We will need to select three components from LangChain's suite of integrations.

Select chat model

pip install -qU "langchain[openai]"

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Select embeddings model

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Select vector store

pip install -qU langchain-core

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

Dependencies

In addition, we will use the following packages:

%%capture --no-stderr
%pip install --upgrade --quiet langgraph langchain-community beautifulsoup4

LangSmith

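Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect exactly what is going on inside your chain or agent. The best way to do this is with LangSmith.

Note that LangSmith is not required, but it is helpful. If you do want to use LangSmith, sign up and then set your environment variables to start logging traces: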

os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Chains

Let's first revisit the vector store we built in Part 1, which indexes the LLM Powered Autonomous Agents blog post by Lilian Weng.

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

API Reference: hub | WebBaseLoader | Document | RecursiveCharacterTextSplitter

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

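In Part 1 of the RAG tutorial, we represented the user input, retrieved context, and generated answer as separate keys in the state. A conversational experience is more naturally represented as a sequence of messages. In addition to messages from the user and the assistant, retrieved documents and other artifacts can be incorporated into the sequence via tool messages. This motivates us to represent the state of our RAG application as a sequence of messages. Specifically, we will have:

User input as a HumanMessage;
Vector store query as an AIMessage with tool calls;
Retrieved documents as a ToolMessage;
Final response as an AIMessage.

This state model is so versatile that LangGraph offers a built-in version for convenience: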

from langgraph.graph import MessagesState, StateGraph

graph_builder = StateGraph(MessagesState)

API Reference: StateGraph
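As a rough illustration of this message-based state, the plain-Python sketch below walks one retrieval turn through the four message roles. It does not use LangGraph: `append_messages` is a hypothetical stand-in for the append behavior of `MessagesState`, and the dicts are simplified placeholders for the real message classes.

```python
from typing import Any

Message = dict[str, Any]


def append_messages(existing: list[Message], update: list[Message]) -> list[Message]:
    """Hypothetical reducer: return a new list with the update appended."""
    return existing + update


# One full retrieval turn, expressed as four messages.
state: list[Message] = []
state = append_messages(
    state, [{"type": "human", "content": "What is Task Decomposition?"}]
)
state = append_messages(
    state,
    [
        {
            "type": "ai",
            "content": "",
            "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}],
        }
    ],
)
state = append_messages(state, [{"type": "tool", "content": "(retrieved text)"}])
state = append_messages(
    state, [{"type": "ai", "content": "(final answer)", "tool_calls": []}]
)

message_types = [m["type"] for m in state]
print(message_types)
```

Each node returns only its new messages, and the state accumulates them in order. This is the property the rest of the guide relies on.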

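Using tool-calling to interact with the retrieval step has another benefit: the retrieval query is generated by our model. This is especially important in a conversational setting, where user queries may need to be contextualized based on the chat history. For instance, consider the following exchange:

H: "What is Task Decomposition?"

AI: "Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model."

H: "What are common ways of doing it?"

In this scenario, a model could generate a query such as "common approaches to task decomposition". Tool-calling facilitates this naturally. As in the query analysis section of the RAG tutorial, this allows a model to rewrite user queries into more effective search queries. It also supports direct responses that do not involve a retrieval step (e.g., in response to a generic greeting from the user).

Let's turn our retrieval step into a tool: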

from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

API Reference: tool

See this guide for more detail on creating tools.
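The tool above returns two values because of `response_format="content_and_artifact"`: the first becomes the text content of the resulting ToolMessage, and the second is attached as its artifact. The sketch below reproduces only the serialization step in plain Python, assuming a hypothetical `Doc` dataclass in place of LangChain's `Document`, to show the text the model ends up reading.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize_docs(docs: list[Doc]) -> str:
    """Join documents into the single text block the model will read."""
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


docs = [
    Doc("Chain of thought ...", {"source": "post-a"}),
    Doc("Tree of Thoughts ...", {"source": "post-b"}),
]

# Mirror the (content, artifact) pair returned by the tool.
content, artifact = serialize_docs(docs), docs
print(content)
```

Keeping the raw documents as an artifact means downstream code can still reach their metadata without parsing the serialized string.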

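Our graph will consist of three nodes:

A node that fields the user input, either generating a query for the retriever or responding directly;
A node for the retriever tool that executes the retrieval step;
A node that generates the final response using the retrieved context.

We build them below. Note that we use another pre-built LangGraph component, ToolNode, which executes the tool and adds the result to the state as a ToolMessage.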

from langchain_core.messages import SystemMessage
from langgraph.prebuilt import ToolNode


# Step 1: Generate an AIMessage that may include a tool-call to be sent.
def query_or_respond(state: MessagesState):
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    # MessagesState appends messages to state instead of overwriting
    return {"messages": [response]}


# Step 2: Execute the retrieval.
tools = ToolNode([retrieve])


# Step 3: Generate a response using the retrieved content.
def generate(state: MessagesState):
    """Generate answer."""
    # Get generated ToolMessages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Format into prompt
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    # Run
    response = llm.invoke(prompt)
    return {"messages": [response]}

API Reference: SystemMessage | ToolNode
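The backward scan in `generate` is worth a closer look: it collects only the tool messages at the very end of the history, so context retrieved in earlier turns is not re-injected into the prompt. Here is a standalone sketch of that selection, using `SimpleNamespace` objects as hypothetical stand-ins for real messages.

```python
from types import SimpleNamespace


def trailing_tool_messages(messages):
    """Return the tool messages at the end of the list, in original order."""
    recent = []
    for message in reversed(messages):
        if message.type != "tool":
            break  # stop at the first non-tool message from the end
        recent.append(message)
    return recent[::-1]


history = [
    SimpleNamespace(type="human", content="first question"),
    SimpleNamespace(type="ai", content=""),
    SimpleNamespace(type="tool", content="old context"),
    SimpleNamespace(type="ai", content="first answer"),
    SimpleNamespace(type="human", content="follow-up"),
    SimpleNamespace(type="ai", content=""),
    SimpleNamespace(type="tool", content="new context A"),
    SimpleNamespace(type="tool", content="new context B"),
]

selected = [m.content for m in trailing_tool_messages(history)]
print(selected)
```

Only the two most recent tool results are selected; "old context" from the first turn is skipped because an AI message separates it from the end of the list.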

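Finally, we compile our application into a single graph object. In this case, we are just connecting the steps into a sequence. We also allow the first query_or_respond step to "short-circuit" and respond directly to the user if it does not generate a tool call. This lets our application support conversational experiences, e.g., responding to generic greetings that may not require a retrieval step: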

from langgraph.graph import END
from langgraph.prebuilt import ToolNode, tools_condition

graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

API Reference: ToolNode | tools_condition
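The routing decision can be pictured as a small function over the last message. The sketch below is an approximation of what a condition like `tools_condition` decides, not its actual source; `route_after_model` and `END_LABEL` are hypothetical names, and the dict-based messages are simplified placeholders.

```python
END_LABEL = "__end__"  # placeholder label for "stop here" in this sketch


def route_after_model(last_message: dict) -> str:
    """Go to the tool node if the model asked for a tool, otherwise stop."""
    return "tools" if last_message.get("tool_calls") else END_LABEL


greeting_reply = {"type": "ai", "content": "Hello! How can I help?", "tool_calls": []}
search_request = {
    "type": "ai",
    "content": "",
    "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}],
}

print(route_after_model(greeting_reply))
print(route_after_model(search_request))
```

A reply with no tool calls ends the run immediately, which is the "short-circuit" path; a reply with a tool call is sent on to retrieval and then to generation.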

from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

Let's test our application.

Note that it responds appropriately to messages that do not require an additional retrieval step:

input_message = "Hello"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Hello
================================== Ai Message ==================================

Hello! How can I assist you today?

When executing a search, we can stream the steps to observe the query generation, retrieval, and answer generation:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_dLjB3rkMoxZZxwUGXi33UBeh)
 Call ID: call_dLjB3rkMoxZZxwUGXi33UBeh
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is the process of breaking down a complicated task into smaller, manageable steps. It often involves techniques like Chain of Thought (CoT), which encourages models to think step by step, enhancing performance on complex tasks. This approach allows for a clearer understanding of the task and aids in structuring the problem-solving process.

Check out the LangSmith trace here.

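Stateful management of chat history

Note

This section of the tutorial previously used the RunnableWithMessageHistory abstraction. You can access that version of the documentation in the v0.2 docs.

As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan to deprecate this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

See How to migrate to LangGraph Memory for more details.

In production, a Q&A application will usually persist the chat history to a database and be able to read and update it appropriately.

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

To manage multiple conversational turns and threads, all we have to do is specify a checkpointer when compiling our application. Because the nodes in our graph append messages to the state, we retain a consistent chat history across invocations.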
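To see why a thread ID is enough to keep conversations apart, consider this simplified sketch. `InMemoryThreads` is a hypothetical class, not LangGraph's checkpointer; it only mimics the idea of appending to a per-thread message list keyed by `config["configurable"]["thread_id"]`.

```python
from collections import defaultdict


class InMemoryThreads:
    """Hypothetical stand-in for a checkpointer: one message list per thread."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def invoke(self, new_messages: list[dict], config: dict) -> list[dict]:
        """Append new messages to the thread's history and return a copy of it."""
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id].extend(new_messages)
        return list(self._threads[thread_id])


store = InMemoryThreads()
config_a = {"configurable": {"thread_id": "abc123"}}
config_b = {"configurable": {"thread_id": "other"}}

store.invoke([{"role": "user", "content": "What is Task Decomposition?"}], config_a)
history_a = store.invoke(
    [{"role": "user", "content": "What are common ways of doing it?"}], config_a
)
history_b = store.invoke([{"role": "user", "content": "Hello"}], config_b)

print(len(history_a), len(history_b))
```

The second call on thread "abc123" sees the first question, while the other thread starts from an empty history. A real checkpointer stores full graph state rather than a bare list, but the keying idea is the same.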

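LangGraph comes with a simple in-memory checkpointer, which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

For a detailed walkthrough of how to manage message history, head to the How to add message history (memory) guide.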

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

# Specify an ID for the thread
config = {"configurable": {"thread_id": "abc123"}}

API Reference: MemorySaver

We can now invoke the graph as before, this time passing the config:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_JZb6GLD812bW2mQsJ5EJQDnN)
 Call ID: call_JZb6GLD812bW2mQsJ5EJQDnN
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is a technique used to break down complicated tasks into smaller, manageable steps. It involves using methods like Chain of Thought (CoT) prompting, which encourages the model to think step by step, enhancing performance on complex tasks. This process helps to clarify the model's reasoning and makes it easier to tackle difficult problems.

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Can you look up some common ways of doing it?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_kjRI4Y5cJOiB73yvd7dmb6ux)
 Call ID: call_kjRI4Y5cJOiB73yvd7dmb6ux
  Args:
    query: common methods of task decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

Common ways of performing Task Decomposition include: (1) using Large Language Models (LLMs) with simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?", (2) employing task-specific instructions such as "Write a story outline" for specific tasks, and (3) incorporating human inputs to guide the decomposition process.

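Note that the query generated by the model for the second question incorporates the conversational context: "it" has been resolved to task decomposition.

The LangSmith trace is particularly informative here, as we can see exactly which messages are visible to our chat model at each step.

Agents

Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to hand over additional discretion over the retrieval process. Although their behavior is less predictable than that of the "chain" above, they can execute multiple retrieval steps in service of a query, or iterate on a single search.

Below we assemble a minimal RAG agent. Using LangGraph's pre-built ReAct agent constructor, we can do this in one line.

Tip

Check out LangGraph's Agentic RAG tutorial for more advanced formulations.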

from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

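API Reference: create_react_agent

Let's inspect the graph:

display(Image(agent_executor.get_graph().draw_mermaid_png()))

The key difference from our earlier implementation is that instead of a final generation step that ends the run, here the tool invocation loops back to the original LLM call. The model can then either answer the question using the retrieved context or generate another tool call to obtain more information.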
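This loop can be sketched in plain Python. Everything here is hypothetical: `fake_model` is scripted to request two lookups before answering, and `fake_retrieve` returns canned text. The point is only the control flow of "call the model, run any requested tools, call the model again", not how `create_react_agent` is implemented.

```python
def fake_model(messages):
    """Hypothetical model: requests two lookups, then answers."""
    tool_results = [m for m in messages if m["type"] == "tool"]
    if len(tool_results) == 0:
        query = "standard method"
    elif len(tool_results) == 1:
        query = "extensions of that method"
    else:
        return {
            "type": "ai",
            "content": f"Answer based on {len(tool_results)} lookups.",
            "tool_calls": [],
        }
    return {
        "type": "ai",
        "content": "",
        "tool_calls": [{"name": "retrieve", "args": {"query": query}}],
    }


def fake_retrieve(query):
    """Hypothetical retriever returning canned text."""
    return f"context for: {query}"


def run_agent(question, max_steps=5):
    """Loop back to the model after every tool call until it answers."""
    messages = [{"type": "human", "content": question}]
    for _ in range(max_steps):  # cap iterations so the loop always terminates
        response = fake_model(messages)
        messages.append(response)
        if not response["tool_calls"]:
            break  # no tool requested: this is the final answer
        for call in response["tool_calls"]:
            result = fake_retrieve(call["args"]["query"])
            messages.append({"type": "tool", "content": result})
    return messages


trace = run_agent("What is the standard method, and what extends it?")
print([m["type"] for m in trace])
```

Unlike the chain, nothing fixes the number of retrievals in advance; the model decides when it has enough context. The `max_steps` cap is an assumption of this sketch to guard against a model that never stops calling tools.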

Let's test this out. We construct a question that would typically require an iterative sequence of retrieval steps to answer:

config = {"configurable": {"thread_id": "def234"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()

================================ Human Message =================================

What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_Y3YaIzL71B83Cjqa8d2G0O8N)
 Call ID: call_Y3YaIzL71B83Cjqa8d2G0O8N
  Args:
    query: standard method for Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_2JntP1x4XQMWwgVpYurE12ff)
 Call ID: call_2JntP1x4XQMWwgVpYurE12ff
  Args:
    query: common extensions of Task Decomposition methods
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

The standard method for task decomposition involves using techniques such as Chain of Thought (CoT), where a model is instructed to "think step by step" to break down complex tasks into smaller, more manageable components. This approach enhances model performance by allowing for more thorough reasoning and planning. Task decomposition can be accomplished through various means, including:

1. Simple prompting (e.g., asking for steps to achieve a goal).
2. Task-specific instructions (e.g., asking for a story outline).
3. Human inputs to guide the decomposition process.

### Common Extensions of Task Decomposition Methods:

1. **Tree of Thoughts**: This extension builds on CoT by not only decomposing the problem into thought steps but also generating multiple thoughts at each step, creating a tree structure. The search process can employ breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or through majority voting.

These extensions aim to enhance reasoning capabilities and improve the effectiveness of task decomposition in various contexts.

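Note that the agent:

Generates a query to search for a standard method for task decomposition;
Upon receiving the answer, generates a second query to search for common extensions of it;
Having received all necessary context, answers the question.

We can see the full sequence of steps, along with latency and other metadata, in the LangSmith trace.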

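Next steps

We have covered the steps to build a basic conversational Q&A application:

We used chains to build a predictable application that generates at most one query per user input;
We used agents to build an application that can iterate on a sequence of queries.

To explore different types of retrievers and retrieval strategies, visit the retrievers section of the how-to guides.

For a detailed walkthrough of LangChain's conversation memory abstractions, visit the How to add message history (memory) guide.

To learn more about agents, check out the conceptual guide and the LangGraph agent architectures page.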
