
How to trim messages

Prerequisites

This guide assumes familiarity with the following concepts: messages, chat models, and chaining.

The methods in this guide also require langchain-core>=0.2.9.

All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages or a chain/agent that accumulates a long message history, you'll need to manage the length of the messages you're passing in to the model.

trim_messages can be used to reduce the size of a chat history to a specified token count or a specified message count.

If passing the trimmed chat history back into a chat model directly, the trimmed chat history should satisfy the following properties:

  1. The resulting chat history should be valid. Usually this means that the following properties should be satisfied:
    • The chat history starts with either (1) a HumanMessage or (2) a SystemMessage followed by a HumanMessage.
    • The chat history ends with either a HumanMessage or a ToolMessage.
    • A ToolMessage can only appear after an AIMessage that involved a tool call. This can be achieved by setting start_on="human" and end_on=("human", "tool"); see the sketch after this list.
  2. It includes the most recent messages and drops old messages in the chat history. This can be achieved by setting strategy="last".
  3. Usually, the new chat history should include the SystemMessage if it was present in the original chat history, since the SystemMessage contains special instructions for the chat model. The SystemMessage is almost always the first message in the history when present. This can be achieved by setting include_system=True.
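For illustration, here is a minimal sketch of a trimmed history that satisfies all three properties; the message contents are placeholders of our own, not taken from the examples below:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# A valid trimmed history: it starts with the SystemMessage followed by a
# HumanMessage, ends on a HumanMessage, and keeps only the most recent turns.
trimmed = [
    SystemMessage("you're a helpful assistant."),
    HumanMessage("what's the capital of France?"),
    AIMessage("Paris."),
    HumanMessage("and of Germany?"),
]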

Trimming based on token count

Here we'll trim the chat history based on token count. The trimmed chat history will produce a valid chat history that includes the SystemMessage.

To keep the most recent messages, we set strategy="last". We'll also set include_system=True to include the SystemMessage, and start_on="human" to make sure the resulting chat history is valid.

This is a good default configuration when using trim_messages based on token count. Remember to adjust token_counter and max_tokens for your use case.

Notice that for our token_counter we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:

pip install -qU langchain-openai
Note: you may need to restart the kernel to use updated packages.
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
    trim_messages,
)
from langchain_openai import ChatOpenAI

messages = [
    SystemMessage("you're a good assistant, you always respond with a joke."),
    HumanMessage("i wonder why it's called langchain"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage("what do you call a speechless parrot"),
]


trim_messages(
    messages,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust based on your model
    # or else pass a custom token_encoder
    token_counter=ChatOpenAI(model="gpt-4o"),
    # Remember to adjust based on the desired conversation
    # length
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    allow_partial=False,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

Trimming based on message count

Alternatively, we can trim the chat history based on message count by setting token_counter=len. In this case, each message will count as a single token, and max_tokens will control the maximum number of messages.

This is a good default configuration when using trim_messages based on message count. Remember to adjust max_tokens for your use case.

trim_messages(
    messages,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    token_counter=len,
    # When token_counter=len, each message
    # will be counted as a single token.
    # Remember to adjust for your use case
    max_tokens=5,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

Advanced usage

You can use trim_messages as a building block to create more complex processing logic.
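For example, here is a minimal sketch of such composition; the helper name two_stage_trim and its default values are our own invention, not part of LangChain. It first caps the number of messages and then enforces a token budget:

def two_stage_trim(messages, llm, max_messages=20, max_tokens=1000):
    # Hypothetical helper: first cap the number of messages
    # (each message counts as one "token" when token_counter=len)...
    capped = trim_messages(
        messages,
        strategy="last",
        token_counter=len,
        max_tokens=max_messages,
        start_on="human",
        include_system=True,
    )
    # ...then make sure the surviving messages fit the model's token budget.
    return trim_messages(
        capped,
        strategy="last",
        token_counter=llm,
        max_tokens=max_tokens,
        start_on="human",
        end_on=("human", "tool"),
        include_system=True,
    )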

If we want to allow splitting up the contents of a message, we can specify allow_partial=True:

trim_messages(
    messages,
    max_tokens=56,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o"),
    include_system=True,
    allow_partial=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
AIMessage(content="\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

By default, the SystemMessage will not be included, so you can drop it by either setting include_system=False or by omitting the include_system argument.

trim_messages(
    messages,
    max_tokens=45,
    strategy="last",
    token_counter=ChatOpenAI(model="gpt-4o"),
)
[AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

We can perform the flipped operation of getting the first max_tokens by specifying strategy="first":

trim_messages(
    messages,
    max_tokens=45,
    strategy="first",
    token_counter=ChatOpenAI(model="gpt-4o"),
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]

Writing a custom token counter

We can write a custom token counter function that takes in a list of messages and returns an int:

pip install -qU tiktoken
Note: you may need to restart the kernel to use updated packages.
from typing import List

import tiktoken
from langchain_core.messages import BaseMessage, ToolMessage


def str_token_counter(text: str) -> int:
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))


def tiktoken_counter(messages: List[BaseMessage]) -> int:
    """Approximately reproduce https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

    For simplicity only supports str Message.contents.
    """
    num_tokens = 3  # every reply is primed with <|start|>assistant<|message|>
    tokens_per_message = 3
    tokens_per_name = 1
    for msg in messages:
        if isinstance(msg, HumanMessage):
            role = "user"
        elif isinstance(msg, AIMessage):
            role = "assistant"
        elif isinstance(msg, ToolMessage):
            role = "tool"
        elif isinstance(msg, SystemMessage):
            role = "system"
        else:
            raise ValueError(f"Unsupported messages type {msg.__class__}")
        num_tokens += (
            tokens_per_message
            + str_token_counter(role)
            + str_token_counter(msg.content)
        )
        if msg.name:
            num_tokens += tokens_per_name + str_token_counter(msg.name)
    return num_tokens


trim_messages(
    messages,
    token_counter=tiktoken_counter,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust for your use case
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)
API Reference: BaseMessage | ToolMessage
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]

Chaining

trim_messages can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain:

llm = ChatOpenAI(model="gpt-4o")

# Notice we don't pass in messages. This creates
# a RunnableLambda that takes messages as input
trimmer = trim_messages(
    token_counter=llm,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust for your use case
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)

chain = trimmer | llm
chain.invoke(messages)
AIMessage(content='A polygon! Because it\'s a "poly-gone" quiet!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 32, 'total_tokens': 45, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_057232b607', 'finish_reason': 'stop', 'logprobs': None}, id='run-4fa026e7-9137-4fef-b596-54243615e3b3-0', usage_metadata={'input_tokens': 32, 'output_tokens': 13, 'total_tokens': 45})

Looking at the LangSmith trace, we can see that the messages are trimmed before they are passed to the model: https://smith.langchain.com/public/65af12c4-c24d-4824-90f0-6547566e59bb/r

Looking at just the trimmer, we can see that it's a Runnable object that can be invoked like all Runnables:

trimmer.invoke(messages)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
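Because the trimmer is a standard Runnable, the other Runnable methods are available too. For instance, as a small sketch (output not shown), we could trim several histories in one call with batch:

# Hypothetical usage: trim two histories at once via the standard
# Runnable batch method.
trimmer.batch([messages, messages[:2]])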

Using with ChatMessageHistory

Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_history = InMemoryChatMessageHistory(messages=messages[:-1])


def dummy_get_session_history(session_id):
    if session_id != "1":
        return InMemoryChatMessageHistory()
    return chat_history


llm = ChatOpenAI(model="gpt-4o")

trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=llm,
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    # start_on="human" makes sure we produce a valid chat history
    start_on="human",
)

chain = trimmer | llm
chain_with_history = RunnableWithMessageHistory(chain, dummy_get_session_history)
chain_with_history.invoke(
    [HumanMessage("what do you call a speechless parrot")],
    config={"configurable": {"session_id": "1"}},
)
AIMessage(content='A "polygon"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_c17d3befe7', 'finish_reason': 'stop', 'logprobs': None}, id='run-71d9fce6-bb0c-4bb3-acc8-d5eaee6ae7bc-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})

Looking at the LangSmith trace, we can see that we retrieve all of our messages, but before they are passed to the model they are trimmed down to just the system message and the last human message: https://smith.langchain.com/public/17dd700b-9994-44ca-930c-116e00997315/r

API reference

For a complete description of all arguments, head to the API reference: https://langchain-python.dev.org.tw/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html

