
How to track token usage in ChatModels

Prerequisites

This guide assumes familiarity with LangChain chat models.

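Tracking token usage to calculate cost is an important part of putting your application into production. This guide shows how to obtain this information from your LangChain model calls.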

This guide requires langchain-anthropic and langchain-openai >= 0.1.9.

%pip install -qU langchain-anthropic langchain-openai

A note on streaming with OpenAI

OpenAI's Chat Completions API does not stream token usage statistics by default (see the OpenAI API reference). To recover token counts when streaming with ChatOpenAI, set stream_usage=True as demonstrated in this guide.

For AzureChatOpenAI, set stream_options={"include_usage": True} when calling .(a)stream, or initialize the model with:

AzureChatOpenAI(
    ...,
    model_kwargs={"stream_options": {"include_usage": True}},
)

Using LangSmith

You can use LangSmith to help track token usage in your LLM application. See the LangSmith quick start guide.

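Using AIMessage.usage_metadata

A number of model providers return token usage information as part of the chat generation response. When available, this information is included on the AIMessage objects produced by the corresponding model.

LangChain AIMessage objects include a usage_metadata attribute. When populated, this attribute is a UsageMetadata dictionary with standard keys (for example, "input_tokens" and "output_tokens").

Examples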

OpenAI:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
openai_response = llm.invoke("hello")
openai_response.usage_metadata
API Reference: ChatOpenAI
{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}

Anthropic:

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-haiku-20240307")
anthropic_response = llm.invoke("hello")
anthropic_response.usage_metadata
API Reference: ChatAnthropic
{'input_tokens': 8, 'output_tokens': 12, 'total_tokens': 20}
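
Because usage_metadata is only populated when the provider reports usage, code that totals tokens across several responses has to tolerate a missing value. The following is a minimal sketch using plain dictionaries shaped like the outputs above. The helper name sum_usage is hypothetical and not part of LangChain, and None stands in for a response whose usage_metadata was not populated.

```python
from typing import Optional


def sum_usage(usages: list[Optional[dict]]) -> dict:
    """Add up usage dicts, skipping responses that reported no usage."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in usages:
        if usage is None:  # the provider did not report usage for this response
            continue
        for key in total:
            total[key] += usage.get(key, 0)
    return total


# Values copied from the OpenAI and Anthropic examples above.
openai_usage = {"input_tokens": 8, "output_tokens": 9, "total_tokens": 17}
anthropic_usage = {"input_tokens": 8, "output_tokens": 12, "total_tokens": 20}

combined = sum_usage([openai_usage, None, anthropic_usage])
print(combined)
# {'input_tokens': 16, 'output_tokens': 21, 'total_tokens': 37}
```

With real responses, the list would be built from each message's usage_metadata attribute instead of hand-written dictionaries.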

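Using AIMessage.response_metadata

Metadata from the model response is also included in the AIMessage response_metadata attribute. This data is typically not standardized. Note that different providers adopt different conventions for representing token counts: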

print(f'OpenAI: {openai_response.response_metadata["token_usage"]}\n')
print(f'Anthropic: {anthropic_response.response_metadata["usage"]}')
OpenAI: {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}

Anthropic: {'input_tokens': 8, 'output_tokens': 12}
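
To make the difference in conventions concrete, here is a small sketch that maps both shapes onto one set of keys. It assumes only the key names visible in the printed output above; other providers, or other versions of these integrations, may use different names. The function normalize_usage is a hypothetical helper, and in practice usage_metadata already provides a standardized view.

```python
def normalize_usage(provider: str, response_metadata: dict) -> dict:
    """Map provider-specific token counts onto input/output/total keys."""
    if provider == "openai":
        raw = response_metadata["token_usage"]
        input_tokens = raw["prompt_tokens"]
        output_tokens = raw["completion_tokens"]
    elif provider == "anthropic":
        raw = response_metadata["usage"]
        input_tokens = raw["input_tokens"]
        output_tokens = raw["output_tokens"]
    else:
        raise ValueError(f"no mapping defined for provider {provider!r}")
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


# Dictionaries copied from the printed response_metadata above.
openai_meta = {
    "token_usage": {"completion_tokens": 9, "prompt_tokens": 8, "total_tokens": 17}
}
anthropic_meta = {"usage": {"input_tokens": 8, "output_tokens": 12}}

openai_normalized = normalize_usage("openai", openai_meta)
anthropic_normalized = normalize_usage("anthropic", anthropic_meta)
print(openai_normalized)
print(anthropic_normalized)
```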

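Streaming

Some providers support token count metadata in a streaming context.

OpenAI

For example, OpenAI returns a message chunk at the end of a stream that contains token usage information. This behavior is supported by langchain-openai >= 0.1.9 and can be enabled by setting stream_usage=True. This attribute can also be set when ChatOpenAI is instantiated.

Note

By default, the last message chunk in a stream includes a "finish_reason" in the message's response_metadata attribute. If token usage is included in streaming mode, an additional chunk containing usage metadata is appended to the end of the stream, so "finish_reason" appears on the second-to-last message chunk.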

llm = ChatOpenAI(model="gpt-4o-mini")

aggregate = None
for chunk in llm.stream("hello", stream_usage=True):
    print(chunk)
    aggregate = chunk if aggregate is None else aggregate + chunk
content='' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='Hello' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='!' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' How' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' can' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' I' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' assist' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' you' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' today' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='?' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini'} id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623' usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}

Note that the usage metadata is included in the sum of the individual message chunks:

print(aggregate.content)
print(aggregate.usage_metadata)
Hello! How can I assist you today?
{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}
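
The chunk layout described in the note above can be illustrated without calling the API. The sketch below uses plain dictionaries as stand-ins for the streamed message chunks printed earlier; merge_chunks is a hypothetical helper that only loosely imitates what adding chunks together does, and it is not LangChain's implementation.

```python
def merge_chunks(chunks: list[dict]) -> dict:
    """Concatenate content, merge response metadata, and add up usage counts."""
    merged = {"content": "", "response_metadata": {}, "usage_metadata": None}
    for chunk in chunks:
        merged["content"] += chunk.get("content", "")
        merged["response_metadata"].update(chunk.get("response_metadata", {}))
        usage = chunk.get("usage_metadata")
        if usage is not None:
            previous = merged["usage_metadata"] or {}
            merged["usage_metadata"] = {
                key: previous.get(key, 0) + value for key, value in usage.items()
            }
    return merged


# Stand-ins for the chunks printed above (ids omitted).
pieces = ["", "Hello", "!", " How", " can", " I", " assist", " you", " today", "?"]
chunks = [{"content": piece} for piece in pieces]
chunks.append({"content": "", "response_metadata": {"finish_reason": "stop"}})
chunks.append(
    {
        "content": "",
        "usage_metadata": {"input_tokens": 8, "output_tokens": 9, "total_tokens": 17},
    }
)

merged = merge_chunks(chunks)
# Position of the chunk that carries "finish_reason".
finish_index = next(
    i for i, c in enumerate(chunks) if "finish_reason" in c.get("response_metadata", {})
)

print(merged["content"])
print(merged["usage_metadata"])
print(finish_index == len(chunks) - 2)  # finish_reason sits on the second-to-last chunk
```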

To disable streaming token counts for OpenAI, set stream_usage to False, or omit it from the parameters:

aggregate = None
for chunk in llm.stream("hello"):
    print(chunk)
content='' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='Hello' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='!' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' How' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' can' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' I' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' assist' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' you' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' today' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='?' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini'} id='run-8e758550-94b0-4cca-a298-57482793c25d'

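You can also enable streaming token usage by setting stream_usage when instantiating the chat model. This is useful when incorporating chat models into LangChain chains: usage metadata can then be monitored when streaming intermediate steps or when using tracing software such as LangSmith.

In the example below, we return output structured to a desired schema, but can still observe token usage streamed from intermediate steps.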

from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


llm = ChatOpenAI(
model="gpt-4o-mini",
stream_usage=True,
)
# Under the hood, .with_structured_output binds tools to the
# chat model and appends a parser.
structured_llm = llm.with_structured_output(Joke)

# Top-level `async for` works in a notebook; in a script, wrap it in an async function.
async for event in structured_llm.astream_events("Tell me a joke", version="v2"):
    if event["event"] == "on_chat_model_end":
        print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
    elif event["event"] == "on_chain_end":
        print(event["data"]["output"])
    else:
        pass
Token usage: {'input_tokens': 79, 'output_tokens': 23, 'total_tokens': 102}

setup='Why was the math book sad?' punchline='Because it had too many problems.'

Token usage is also visible in the corresponding LangSmith trace, in the payload from the chat model.
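
Outside a notebook, the event loop shown earlier has to run inside a coroutine. The sketch below shows one way to do that with asyncio.run. To stay runnable without network access, it replaces structured_llm.astream_events(...) with a hypothetical generator, fake_events, whose events only imitate the fields used above; the real event payloads contain more than this.

```python
import asyncio
from types import SimpleNamespace


async def fake_events():
    """Hypothetical stand-in for structured_llm.astream_events(...)."""
    yield {"event": "on_chat_model_start", "data": {}}
    yield {
        "event": "on_chat_model_end",
        "data": {
            # SimpleNamespace imitates a message object with a usage_metadata attribute.
            "output": SimpleNamespace(
                usage_metadata={"input_tokens": 79, "output_tokens": 23, "total_tokens": 102}
            )
        },
    }
    yield {"event": "on_chain_end", "data": {"output": "parsed result"}}


async def collect_usage(events) -> list[dict]:
    """Gather the usage metadata reported at the end of each chat model call."""
    usages = []
    async for event in events:
        if event["event"] == "on_chat_model_end":
            usages.append(event["data"]["output"].usage_metadata)
    return usages


usages = asyncio.run(collect_usage(fake_events()))
print(usages)
```

With a real model, collect_usage would receive the async iterator returned by astream_events instead of fake_events().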

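Using callbacks

There are also some API-specific callback context managers that let you track token usage across multiple calls. They are currently implemented only for the OpenAI API and the Bedrock Anthropic API, and are available in langchain-community: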

%pip install -qU langchain-community

OpenAI

Let's first look at a very simple example of tracking token usage for a single chat model call.

from langchain_community.callbacks.manager import get_openai_callback

llm = ChatOpenAI(
model="gpt-4o-mini",
temperature=0,
stream_usage=True,
)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)
API Reference: get_openai_callback
Tokens Used: 27
Prompt Tokens: 11
Completion Tokens: 16
Successful Requests: 1
Total Cost (USD): $2.95e-05

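Anything inside the context manager is tracked. Here is an example that uses it to track multiple calls in sequence, followed by an example that tracks a streamed call.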

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb.total_tokens)
54

with get_openai_callback() as cb:
    for chunk in llm.stream("Tell me a joke"):
        pass
    print(cb)
Tokens Used: 27
Prompt Tokens: 11
Completion Tokens: 16
Successful Requests: 1
Total Cost (USD): $2.95e-05

If a chain or agent with multiple steps is used, the callback tracks all of those steps.

%pip install -qU langchain langchain-aws wikipedia
from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a helpful assistant"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
tools = load_tools(["wikipedia"])
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
with get_openai_callback() as cb:
    response = agent_executor.invoke(
        {
            "input": "What's a hummingbird's scientific name and what's the fastest bird species?"
        }
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")


> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'hummingbird scientific name'}`


Page: Hummingbird
Summary: Hummingbirds are birds native to the Americas and comprise the biological family Trochilidae. With approximately 366 species and 113 genera, they occur from Alaska to Tierra del Fuego, but most species are found in Central and South America. As of 2024, 21 hummingbird species are listed as endangered or critically endangered, with numerous species declining in population.
Hummingbirds have varied specialized characteristics to enable rapid, maneuverable flight: exceptional metabolic capacity, adaptations to high altitude, sensitive visual and communication abilities, and long-distance migration in some species. Among all birds, male hummingbirds have the widest diversity of plumage color, particularly in blues, greens, and purples. Hummingbirds are the smallest mature birds, measuring 7.5–13 cm (3–5 in) in length. The smallest is the 5 cm (2.0 in) bee hummingbird, which weighs less than 2.0 g (0.07 oz), and the largest is the 23 cm (9 in) giant hummingbird, weighing 18–24 grams (0.63–0.85 oz). Noted for long beaks, hummingbirds are specialized for feeding on flower nectar, but all species also consume small insects.
They are known as hummingbirds because of the humming sound created by their beating wings, which flap at high frequencies audible to other birds and humans. They hover at rapid wing-flapping rates, which vary from around 12 beats per second in the largest species to 80 per second in small hummingbirds.
Hummingbirds have the highest mass-specific metabolic rate of any homeothermic animal. To conserve energy when food is scarce and at night when not foraging, they can enter torpor, a state similar to hibernation, and slow their metabolic rate to 1⁄15 of its normal rate. While most hummingbirds do not migrate, the rufous hummingbird has one of the longest migrations among birds, traveling twice per year between Alaska and Mexico, a distance of about 3,900 miles (6,300 km).
Hummingbirds split from their sister group, the swifts and treeswifts, around 42 million years ago. The oldest known fossil hummingbird is Eurotrochilus, from the Rupelian Stage of Early Oligocene Europe.

Page: Rufous hummingbird
Summary: The rufous hummingbird (Selasphorus rufus) is a small hummingbird, about 8 cm (3.1 in) long with a long, straight and slender bill. These birds are known for their extraordinary flight skills, flying 2,000 mi (3,200 km) during their migratory transits. It is one of nine species in the genus Selasphorus.



Page: Allen's hummingbird
Summary: Allen's hummingbird (Selasphorus sasin) is a species of hummingbird that breeds in the western United States. It is one of seven species in the genus Selasphorus.
Invoking: `wikipedia` with `{'query': 'fastest bird species'}`


Page: List of birds by flight speed
Summary: This is a list of the fastest flying birds in the world. A bird's velocity is necessarily variable; a hunting bird will reach much greater speeds while diving to catch prey than when flying horizontally. The bird that can achieve the greatest airspeed is the peregrine falcon (Falco peregrinus), able to exceed 320 km/h (200 mph) in its dives. A close relative of the common swift, the white-throated needletail (Hirundapus caudacutus), is commonly reported as the fastest bird in level flight with a reported top speed of 169 km/h (105 mph). This record remains unconfirmed as the measurement methods have never been published or verified. The record for the fastest confirmed level flight by a bird is 111.5 km/h (69.3 mph) held by the common swift.

Page: Fastest animals
Summary: This is a list of the fastest animals in the world, by types of animal.

Page: Falcon
Summary: Falcons () are birds of prey in the genus Falco, which includes about 40 species. Falcons are widely distributed on all continents of the world except Antarctica, though closely related raptors did occur there in the Eocene.
Adult falcons have thin, tapered wings, which enable them to fly at high speed and change direction rapidly. Fledgling falcons, in their first year of flying, have longer flight feathers, which make their configuration more like that of a general-purpose bird such as a broad wing. This makes flying easier while learning the exceptional skills required to be effective hunters as adults.
The falcons are the largest genus in the Falconinae subfamily of Falconidae, which itself also includes another subfamily comprising caracaras and a few other species. All these birds kill with their beaks, using a tomial "tooth" on the side of their beaks—unlike the hawks, eagles, and other birds of prey in the Accipitridae, which use their feet.
The largest falcon is the gyrfalcon at up to 65 cm in length. The smallest falcon species is the pygmy falcon, which measures just 20 cm. As with hawks and owls, falcons exhibit sexual dimorphism, with the females typically larger than the males, thus allowing a wider range of prey species.
Some small falcons with long, narrow wings are called "hobbies" and some which hover while hunting are called "kestrels".
As is the case with many birds of prey, falcons have exceptional powers of vision; the visual acuity of one species has been measured at 2.6 times that of a normal human. Peregrine falcons have been recorded diving at speeds of 320 km/h (200 mph), making them the fastest-moving creatures on Earth; the fastest recorded dive attained a vertical speed of 390 km/h (240 mph).The scientific name for a hummingbird is Trochilidae. The fastest bird species in level flight is the common swift, which holds the record for the fastest confirmed level flight by a bird at 111.5 km/h (69.3 mph). The peregrine falcon is known to exceed speeds of 320 km/h (200 mph) in its dives, making it the fastest bird in terms of diving speed.

> Finished chain.
Total Tokens: 1675
Prompt Tokens: 1538
Completion Tokens: 137
Total Cost (USD): $0.0009745000000000001

Bedrock Anthropic

The get_bedrock_anthropic_callback context manager works very similarly:

from langchain_aws import ChatBedrock
from langchain_community.callbacks.manager import get_bedrock_anthropic_callback

llm = ChatBedrock(model_id="anthropic.claude-v2")

with get_bedrock_anthropic_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb)
Tokens Used: 96
Prompt Tokens: 26
Completion Tokens: 70
Successful Requests: 2
Total Cost (USD): $0.001888
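
The cost lines printed by the callbacks are plain arithmetic over the prompt and completion counts. The sketch below reproduces two of the totals shown above. The per-million-token rates are assumptions chosen because they reproduce those printed totals; they are not quoted prices, the callbacks rely on their own price tables, and current prices may differ. The function cost_usd is a hypothetical helper.

```python
def cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate: float,
    output_rate: float,
) -> float:
    """Cost in USD, given rates expressed in USD per one million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


# Agent run above: 1538 prompt tokens, 137 completion tokens.
# Assumed rates: 0.50 (input) and 1.50 (output) USD per million tokens.
agent_cost = cost_usd(1538, 137, input_rate=0.50, output_rate=1.50)

# Bedrock run above: 26 prompt tokens, 70 completion tokens.
# Assumed rates: 8.00 (input) and 24.00 (output) USD per million tokens.
bedrock_cost = cost_usd(26, 70, input_rate=8.00, output_rate=24.00)

print(round(agent_cost, 7))
print(round(bedrock_cost, 6))
```

Keeping input and output rates separate matters because completion tokens are billed at a different rate than prompt tokens in both cases.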

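Next steps

You've now seen a few examples of how to track token usage for supported providers.

Next, check out the other how-to guides on chat models in this section, such as how to get a model to return structured output or how to add caching to your chat models.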

