How to add memory to chatbots
A key feature of chatbots is their ability to use the content of previous conversation turns as context. This state management can take several forms, including:
- Simply stuffing previous messages into a chat model prompt.
- The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.
- More complex modifications, like synthesizing summaries for long-running conversations.
We'll go into more detail on a few techniques below!
This how-to guide previously built a chatbot using RunnableWithMessageHistory. You can access this version of the guide in the v0.2 docs.
As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.
If your code is already relying on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan on deprecating this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.
Please see How to migrate to LangGraph memory for more details.
Setup
You'll need to install a few packages, and set your OpenAI API key as an environment variable named OPENAI_API_KEY:
%pip install --upgrade --quiet langchain langchain-openai langgraph
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
Let's also set up a chat model that we'll use for the examples below.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o-mini")
Message passing
The simplest form of memory is simply passing chat history messages into a chain. Here's an example:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a helpful assistant. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
chain = prompt | model
ai_msg = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
print(ai_msg.content)
I said, "I love programming" in French: "J'adore la programmation."
We can see that by passing the previous conversation into a chain, it can use it as context to answer questions. This is the basic concept underpinning chatbot memory - the rest of this guide will demonstrate convenient techniques for passing or reformatting messages.
Automatic history management
The previous examples pass messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also offers a way to build applications that have memory, using LangGraph's persistence. You can enable persistence in LangGraph applications by providing a checkpointer when compiling the graph.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = model.invoke(messages)
    return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
We'll pass the latest input to the conversation here and let LangGraph keep track of the conversation history using the checkpointer:
app.invoke(
    {"messages": [HumanMessage(content="Translate to French: I love programming.")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39})]}
app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39}),
HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='c667529b-7c41-4cc0-9326-0af47328b816'),
AIMessage(content='You asked me to translate "I love programming" into French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 54, 'total_tokens': 67, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-134a7ea0-d3a4-4923-bd58-25e5a43f6a1f-0', usage_metadata={'input_tokens': 54, 'output_tokens': 13, 'total_tokens': 67})]}
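The checkpointer keys saved state by the thread_id in the config, so conversations on different threads stay isolated. As a rough mental model only (the real MemorySaver stores full graph checkpoints, not plain lists, and this ThreadStore class is purely hypothetical), per-thread history behaves like a dict of message lists keyed by thread id:

```python
from collections import defaultdict


class ThreadStore:
    """Illustrative sketch of per-thread conversation state -- not the real MemorySaver."""

    def __init__(self):
        # One independent message list per thread id.
        self._threads = defaultdict(list)

    def append(self, thread_id, messages):
        # New messages are appended to the history of that thread only.
        self._threads[thread_id].extend(messages)

    def history(self, thread_id):
        return list(self._threads[thread_id])


store = ThreadStore()
store.append("1", ["Translate to French: I love programming.", "J'aime programmer."])
store.append("2", ["Hi!"])
# Thread "1" keeps its own two-message history; thread "2" is unaffected.
```

This is why invoking the app with thread_id "1" a second time sees the earlier exchange, while a fresh thread id starts with an empty history.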
Modifying chat history
Modifying stored chat messages can help your chatbot handle a variety of situations. Here are some examples:
Trimming messages
LLMs and chat models have limited context windows, and even if you're not hitting the limit directly, you may want to limit the amount of distraction the model has to deal with. One solution is to trim the history messages before passing them to the model. Let's use an example history with the app we declared above:
demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]
app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What's my name?")]
    },
    config={"configurable": {"thread_id": "2"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}, id='c933eca3-5fd8-4651-af16-20fe2d49c216'),
AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-a0b21acc-9dbb-4fb6-a953-392020f37d88-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}
We can see the app remembers the preloaded name.
But let's say we have a very small context window, and we want to trim the number of messages passed to the model to only the 2 most recent ones. We can use the built-in trim_messages utility to trim messages based on their token count before they reach our prompt. In this case, we'll count each message as 1 "token" and keep only the last two messages:
from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
# Define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
    trimmed_messages = trimmer.invoke(state["messages"])
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + trimmed_messages
    response = model.invoke(messages)
    return {"messages": response}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's call this new app and check the response:
app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What is my name?")]
    },
    config={"configurable": {"thread_id": "3"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='a22ab7c5-8617-4821-b3e9-a9e7dca1ff78'),
AIMessage(content="I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 39, 'total_tokens': 66, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-f7b32d72-9f57-4705-be7e-43bf1c3d293b-0', usage_metadata={'input_tokens': 39, 'output_tokens': 27, 'total_tokens': 66})]}
We can see that trim_messages was called, and only the two most recent messages will be passed to the model. In this case, this means that the model forgot the name we gave it.
Check out our how-to guide on trimming messages for more information.
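For intuition, trim_messages with strategy="last", max_tokens=2, and token_counter=len is roughly equivalent to keeping the last two messages. Here is a simplified pure-Python sketch of that keep-last strategy (hypothetical helper, for illustration only; the real utility also handles details like preserving a leading system message and message boundaries):

```python
def trim_last(messages, max_tokens, token_counter=len):
    # Sketch of the "last" strategy: drop the oldest messages until the
    # total "token" count fits within max_tokens.
    # With token_counter=len, the whole list counts one "token" per message.
    trimmed = list(messages)
    while trimmed and token_counter(trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed


history = ["Hey there! I'm Nemo.", "Hello!", "How are you today?", "Fine thanks!"]
trim_last(history, max_tokens=2)  # -> ["How are you today?", "Fine thanks!"]
```

Because "Hey there! I'm Nemo." is among the dropped messages, the model above no longer sees the user's name.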
Summary memory
We can use this same pattern in other ways, too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history:
demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]
And now, let's update the model-calling function to distill previous interactions into a summary:
from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
workflow = StateGraph(state_schema=MessagesState)
# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability. "
        "The provided chat history includes a summary of the earlier conversation."
    )
    system_message = SystemMessage(content=system_prompt)
    message_history = state["messages"][:-1]  # exclude the most recent user input
    # Summarize the messages if the chat history reaches a certain size
    if len(message_history) >= 4:
        last_human_message = state["messages"][-1]
        # Invoke the model to generate conversation summary
        summary_prompt = (
            "Distill the above chat messages into a single summary message. "
            "Include as many specific details as you can."
        )
        summary_message = model.invoke(
            message_history + [HumanMessage(content=summary_prompt)]
        )

        # Delete messages that we no longer want to show up
        delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
        # Re-add user message
        human_message = HumanMessage(content=last_human_message.content)
        # Call the model with summary & response
        response = model.invoke([system_message, summary_message, human_message])
        message_updates = [summary_message, human_message, response] + delete_messages
    else:
        message_updates = model.invoke([system_message] + state["messages"])

    return {"messages": message_updates}
# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
Let's see if it remembers the name we gave it:
app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage("What did I say my name was?")]
    },
    config={"configurable": {"thread_id": "4"}},
)
{'messages': [AIMessage(content="Nemo greeted me, and I responded positively, indicating that I'm doing well.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 60, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ee42f98d-907d-4bad-8f16-af2db789701d-0', usage_metadata={'input_tokens': 60, 'output_tokens': 16, 'total_tokens': 76}),
HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='788555ea-5b1f-4c29-a2f2-a92f15d147be'),
AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 67, 'total_tokens': 75, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-099a43bd-a284-4969-bb6f-0be486614cd8-0', usage_metadata={'input_tokens': 67, 'output_tokens': 8, 'total_tokens': 75})]}
Note that invoking the app again will keep accumulating the history until it reaches the specified number of messages (four in our case). At that point, we will generate another summary from the initial summary plus the new messages, and so on.
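The threshold behavior in call_model above can be sketched in plain Python. In this hypothetical helper, summarize stands in for the extra LLM call that distills old messages into one summary message:

```python
def update_history(history, new_user_msg, summarize, threshold=4):
    # Sketch of the logic in call_model: once the prior history reaches the
    # threshold, collapse it into a single summary message and keep only the
    # summary plus the latest user turn.
    if len(history) >= threshold:
        summary = summarize(history)
        return [summary, new_user_msg]
    # Below the threshold, keep accumulating messages as-is.
    return history + [new_user_msg]


history = ["msg1", "msg2", "msg3", "msg4"]
new_history = update_history(
    history,
    "What did I say my name was?",
    summarize=lambda msgs: f"summary of {len(msgs)} messages",
)
# The four old messages collapse into one summary entry plus the new turn.
```

In the real graph, the collapse is effected with RemoveMessage updates against the checkpointed state rather than by rebuilding a list, but the control flow is the same.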