How to trim messages
All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages or a chain/agent that accumulates a long message history, you'll need to manage the length of the messages you're passing in to the model.
trim_messages can be used to reduce the size of a chat history to a specified token count or a specified message count.
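For example, here is a minimal sketch of deciding when to trim, using the count_tokens_approximately helper that also appears later in this guide (the MAX_HISTORY_TOKENS budget is a made-up number; pick one based on your model's context window):
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.messages.utils import count_tokens_approximately

history = [
    SystemMessage("you're a good assistant"),
    HumanMessage("i wonder why it's called langchain"),
]

# Hypothetical budget -- adjust to your model's context window.
MAX_HISTORY_TOKENS = 45
if count_tokens_approximately(history) > MAX_HISTORY_TOKENS:
    # The history has outgrown the budget; trim it,
    # e.g. with trim_messages as shown below.
    ...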
If the trimmed chat history is to be passed directly back into a chat model, it should satisfy the following properties (a combined configuration sketch follows the list):
- The resulting chat history should be valid. Usually this means that the following properties should be satisfied:
  - The chat history starts with either (1) a HumanMessage or (2) a SystemMessage followed by a HumanMessage.
  - The chat history ends with either a HumanMessage or a ToolMessage.
  - A ToolMessage can only appear after an AIMessage that involved a tool call.

  This can be achieved by setting start_on="human" and end_on=("human", "tool").
- It includes the most recent messages and drops old messages in the chat history. This can be achieved by setting strategy="last".
- Usually, the new chat history should include the SystemMessage if it was present in the original chat history, since the SystemMessage contains special instructions for the chat model. The SystemMessage is almost always the first message in the history when present. This can be achieved by setting include_system=True.
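Putting these properties together, a typical call looks like the sketch below (assuming a messages list and the count_tokens_approximately counter introduced in the next section, where the fully runnable version appears):
trim_messages(
    messages,
    # Keep the most recent messages.
    strategy="last",
    # Start the trimmed history on a HumanMessage.
    start_on="human",
    # End it on a HumanMessage or a ToolMessage.
    end_on=("human", "tool"),
    # Preserve the SystemMessage if present.
    include_system=True,
    token_counter=count_tokens_approximately,
    max_tokens=45,  # adjust for your model / use case
)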
Trimming based on token count
Here, we'll trim the chat history based on token count. The trimmed result will be a valid chat history that includes the SystemMessage.
To keep the most recent messages, we set strategy="last". We'll also set include_system=True to include the SystemMessage, and start_on="human" to make sure the resulting chat history is valid.
This is a good default configuration when using trim_messages based on token count. Remember to adjust token_counter and max_tokens for your use case.
Notice that for our token_counter we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:
pip install -qU langchain-openai
from langchain_core.messages import (
AIMessage,
HumanMessage,
SystemMessage,
ToolMessage,
trim_messages,
)
from langchain_core.messages.utils import count_tokens_approximately
messages = [
SystemMessage("you're a good assistant, you always respond with a joke."),
HumanMessage("i wonder why it's called langchain"),
AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
HumanMessage("and who is harrison chasing anyways"),
AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
HumanMessage("what do you call a speechless parrot"),
]
trim_messages(
messages,
# Keep the last <= n_count tokens of the messages.
strategy="last",
# Remember to adjust based on your model
# or else pass a custom token_counter
token_counter=count_tokens_approximately,
    # Remember to adjust based on the desired conversation
    # length
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
allow_partial=False,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Trimming based on message count
Alternatively, we can trim the chat history based on message count by setting token_counter=len. In that case, each message counts as a single token, and max_tokens controls the maximum number of messages.
This is a good default configuration when using trim_messages based on message count. Remember to adjust max_tokens for your use case.
trim_messages(
messages,
# Keep the last <= n_count tokens of the messages.
strategy="last",
token_counter=len,
# When token_counter=len, each message
# will be counted as a single token.
# Remember to adjust for your use case
max_tokens=5,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Advanced usage
You can use trim_messages as a building block to create more complex processing logic.
If we want to allow splitting up the contents of a message, we can specify allow_partial=True:
trim_messages(
messages,
max_tokens=56,
strategy="last",
token_counter=count_tokens_approximately,
include_system=True,
allow_partial=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
AIMessage(content="\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
By default, the SystemMessage will not be included, so you can drop it by either setting include_system=False or by dropping the include_system argument.
trim_messages(
messages,
max_tokens=45,
strategy="last",
token_counter=count_tokens_approximately,
)
[AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
We can perform the flipped operation of getting the first max_tokens by specifying strategy="first":
trim_messages(
messages,
max_tokens=45,
strategy="first",
token_counter=count_tokens_approximately,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]
Using ChatModel as a token counter
You can pass a ChatModel as a token counter. This will use ChatModel.get_num_tokens_from_messages. Let's demonstrate how to use it with OpenAI:
from langchain_openai import ChatOpenAI
trim_messages(
messages,
max_tokens=45,
strategy="first",
token_counter=ChatOpenAI(model="gpt-4o"),
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]
Writing a custom token counter
We can write a custom token counter function that takes in a list of messages and returns an int:
pip install -qU tiktoken
from typing import List
import tiktoken
from langchain_core.messages import BaseMessage, ToolMessage
def str_token_counter(text: str) -> int:
enc = tiktoken.get_encoding("o200k_base")
return len(enc.encode(text))
def tiktoken_counter(messages: List[BaseMessage]) -> int:
"""Approximately reproduce https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
For simplicity only supports str Message.contents.
"""
num_tokens = 3 # every reply is primed with <|start|>assistant<|message|>
tokens_per_message = 3
tokens_per_name = 1
for msg in messages:
if isinstance(msg, HumanMessage):
role = "user"
elif isinstance(msg, AIMessage):
role = "assistant"
elif isinstance(msg, ToolMessage):
role = "tool"
elif isinstance(msg, SystemMessage):
role = "system"
else:
raise ValueError(f"Unsupported messages type {msg.__class__}")
num_tokens += (
tokens_per_message
+ str_token_counter(role)
+ str_token_counter(msg.content)
)
if msg.name:
num_tokens += tokens_per_name + str_token_counter(msg.name)
return num_tokens
trim_messages(
messages,
token_counter=tiktoken_counter,
# Keep the last <= n_count tokens of the messages.
strategy="last",
    # Remember to adjust based on the desired conversation
    # length
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
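The counter above only supports string Message.contents, as its docstring notes. As a hedged sketch, here is one way to extend the counting to messages whose content is a list of content blocks, assuming text blocks carry a "text" key (non-text blocks such as images are skipped, since their token accounting is model-specific):
def content_token_counter(content) -> int:
    # Message.content can be a plain string or a list of content blocks.
    if isinstance(content, str):
        return str_token_counter(content)
    total = 0
    for block in content:
        if isinstance(block, str):
            total += str_token_counter(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            total += str_token_counter(block.get("text", ""))
        # Other block types (images, tool calls, ...) are ignored here.
    return total
To use it, replace the str_token_counter(msg.content) call in tiktoken_counter with content_token_counter(msg.content).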
Chaining
trim_messages can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain:
llm = ChatOpenAI(model="gpt-4o")
# Notice we don't pass in messages. This creates
# a RunnableLambda that takes messages as input
trimmer = trim_messages(
token_counter=llm,
# Keep the last <= n_count tokens of the messages.
strategy="last",
    # Remember to adjust based on the desired conversation
    # length
max_tokens=45,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Most chat models expect that chat history ends with either:
# (1) a HumanMessage or
# (2) a ToolMessage
end_on=("human", "tool"),
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
)
chain = trimmer | llm
chain.invoke(messages)
AIMessage(content='A "polly-no-wanna-cracker"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 32, 'total_tokens': 43, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_90d33c15d4', 'finish_reason': 'stop', 'logprobs': None}, id='run-b1f8b63b-6bc2-4df4-b3b9-dfc4e3e675fe-0', usage_metadata={'input_tokens': 32, 'output_tokens': 11, 'total_tokens': 43, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
Looking at the LangSmith trace, we can see that before the messages are passed to the model they are first trimmed: https://smith.langchain.com/public/65af12c4-c24d-4824-90f0-6547566e59bb/r
Looking at just the trimmer, we can see that it's a Runnable object that can be invoked like all Runnables:
trimmer.invoke(messages)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
Using with ChatMessageHistory
Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
chat_history = InMemoryChatMessageHistory(messages=messages[:-1])
def dummy_get_session_history(session_id):
if session_id != "1":
return InMemoryChatMessageHistory()
return chat_history
trimmer = trim_messages(
max_tokens=45,
strategy="last",
token_counter=llm,
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
# start_on="human" makes sure we produce a valid chat history
start_on="human",
)
chain = trimmer | llm
chain_with_history = RunnableWithMessageHistory(chain, dummy_get_session_history)
chain_with_history.invoke(
[HumanMessage("what do you call a speechless parrot")],
config={"configurable": {"session_id": "1"}},
)
AIMessage(content='A "polygon"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_c17d3befe7', 'finish_reason': 'stop', 'logprobs': None}, id='run-71d9fce6-bb0c-4bb3-acc8-d5eaee6ae7bc-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})
Looking at the LangSmith trace, we can see that we retrieve all of our messages, but before they are passed to the model they are trimmed down to just the system message and the last human message: https://smith.langchain.com/public/17dd700b-9994-44ca-930c-116e00997315/r
API reference
For a complete description of all arguments, head to the API reference: https://langchain-python.dev.org.tw/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html