How to add fallbacks to a runnable

When working with language models, you may often encounter issues from the underlying APIs, whether those are rate limiting or downtime. Therefore, as you move your LLM applications into production, it becomes more and more important to safeguard against these. That's why we've introduced the concept of fallbacks.

A fallback is an alternative plan that may be used in an emergency.

Crucially, fallbacks can be applied not only at the LLM level but at the whole runnable level. This is important because different models often require different prompts. So if your call to OpenAI fails, you don't just want to send the same prompt to Anthropic - you probably want to use a different prompt template and send a different version there.

Fallback for LLM API errors

This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, or any number of other things. Therefore, using fallbacks can help protect against these types of issues.

IMPORTANT: By default, many of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying rather than failing.

%pip install --upgrade --quiet langchain langchain-openai langchain-anthropic
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
API Reference: ChatAnthropic | ChatOpenAI

First, let's mock out what happens if we hit a RateLimitError from OpenAI.

from unittest.mock import patch

import httpx
from openai import RateLimitError

request = httpx.Request("GET", "/")
response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")
# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc
openai_llm = ChatOpenAI(model="gpt-4o-mini", max_retries=0)
anthropic_llm = ChatAnthropic(model="claude-3-haiku-20240307")
llm = openai_llm.with_fallbacks([anthropic_llm])
# Let's use just the OpenAI LLM first, to show that we run into an error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(openai_llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")
Hit error
# Now let's try with fallbacks to Anthropic
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")
content=' I don\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\n\n- To get to the other side!\n\n- It was too chicken to just stand there. \n\n- It wanted a change of scenery.\n\n- It wanted to show the possum it could be done.\n\n- It was on its way to a poultry farmers\' convention.\n\nThe joke plays on the double meaning of "the other side" - literally crossing the road to the other side, or the "other side" meaning the afterlife. So it\'s an anti-joke, with a silly or unexpected pun as the answer.' additional_kwargs={} example=False
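By default, with_fallbacks triggers on any exception. It also accepts an exceptions_to_handle tuple that narrows this, so you can fall back only on specific errors. A minimal sketch, assuming the current langchain-core signature:

# A minimal sketch, assuming with_fallbacks' exceptions_to_handle parameter:
# only fall back to Anthropic on RateLimitError; any other error still raises.
llm_rate_limits_only = openai_llm.with_fallbacks(
    [anthropic_llm], exceptions_to_handle=(RateLimitError,)
)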

We can use our "LLM with Fallbacks" as we would a normal LLM.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(chain.invoke({"animal": "kangaroo"}))
    except RateLimitError:
        print("Hit error")
API Reference: ChatPromptTemplate
content=" I don't actually know why the kangaroo crossed the road, but I can take a guess! Here are some possible reasons:\n\n- To get to the other side (the classic joke answer!)\n\n- It was trying to find some food or water \n\n- It was trying to find a mate during mating season\n\n- It was fleeing from a predator or perceived threat\n\n- It was disoriented and crossed accidentally \n\n- It was following a herd of other kangaroos who were crossing\n\n- It wanted a change of scenery or environment \n\n- It was trying to reach a new habitat or territory\n\nThe real reason is unknown without more context, but hopefully one of those potential explanations does the joke justice! Let me know if you have any other animal jokes I can try to decipher." additional_kwargs={} example=False

Fallback for sequences

We can also create fallbacks for sequences, which are sequences themselves. Here we do that with two different models: ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt.

# First let's create a chain with a ChatModel
# We add in a string output parser here so the outputs between the two are the same type
from langchain_core.output_parsers import StrOutputParser

chat_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()
API Reference: StrOutputParser
# Now let's create a chain with the normal OpenAI model
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""
prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
good_chain = prompt | llm
API Reference: PromptTemplate | OpenAI
# We can now create a final chain which combines the two
chain = bad_chain.with_fallbacks([good_chain])
chain.invoke({"animal": "turtle"})
'\n\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'
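The list passed to with_fallbacks can hold more than one runnable, and the fallbacks are tried in order until one succeeds. A minimal sketch, where second_chain is an extra fallback introduced here for illustration (it reuses the completion-style prompt with OpenAI's instruct model):

# Fallbacks run in order: good_chain is tried first, and second_chain only
# if good_chain also fails. second_chain is illustrative, not from the guide.
second_chain = prompt | OpenAI(model="gpt-3.5-turbo-instruct")
chain = bad_chain.with_fallbacks([good_chain, second_chain])
chain.invoke({"animal": "turtle"})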

Fallback for long inputs

One of the big limiting factors of LLMs is their context window. Usually you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard or complicated, you can fall back to a model with a longer context length.

short_llm = ChatOpenAI()
long_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
llm = short_llm.with_fallbacks([long_llm])
inputs = "What is the next number: " + ", ".join(["one", "two"] * 3000)
try:
    print(short_llm.invoke(inputs))
except Exception as e:
    print(e)
This model's maximum context length is 4097 tokens. However, your messages resulted in 12012 tokens. Please reduce the length of the messages.
try:
    print(llm.invoke(inputs))
except Exception as e:
    print(e)
content='The next number in the sequence is two.' additional_kwargs={} example=False
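As noted above, you can also count tokens yourself before deciding which model to call. A minimal sketch using the get_num_tokens helper available on LangChain models; the 3,500-token threshold is an assumption that leaves headroom under the 4,097-token window from the error message:

# Route long prompts to the 16k model ourselves instead of waiting for the
# short model to error. The 3500 cutoff is an assumed headroom value.
if short_llm.get_num_tokens(inputs) <= 3500:
    print(short_llm.invoke(inputs))
else:
    print(long_llm.invoke(inputs))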

Fallback to a better model

Often we ask models to output in a specific format (like JSON). Models like GPT-3.5 can do this OK, but sometimes struggle. This naturally points to fallbacks - we can try GPT-3.5 (faster, cheaper) first, and then fall back to GPT-4 if parsing fails.

from langchain.output_parsers import DatetimeOutputParser

prompt = ChatPromptTemplate.from_template(
    "what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)"
)
# In this case we are going to do the fallbacks on the LLM + output parser level
# Because the error will get raised in the OutputParser
openai_35 = ChatOpenAI() | DatetimeOutputParser()
openai_4 = ChatOpenAI(model="gpt-4") | DatetimeOutputParser()
only_35 = prompt | openai_35
fallback_4 = prompt | openai_35.with_fallbacks([openai_4])
try:
    print(only_35.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")
Error: Could not parse datetime string: The Super Bowl in 1994 took place on January 30th at 3:30 PM local time. Converting this to the specified format (%Y-%m-%dT%H:%M:%S.%fZ) results in: 1994-01-30T15:30:00.000Z
try:
    print(fallback_4.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")
1994-01-30 15:30:00
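with_fallbacks also takes an exception_key argument; when set, the error from the failed run is passed to the fallback as part of its input under that key (both the base runnable and the fallbacks must then accept dict input). A hedged sketch in which retry_prompt is a hypothetical variant that shows the fallback model what went wrong:

# With exception_key="exception", the fallback receives the parsing error
# under the "exception" input key, so its prompt can ask GPT-4 to fix it.
# retry_prompt and the chain below are illustrative, not from the guide.
retry_prompt = ChatPromptTemplate.from_template(
    "The previous attempt failed with: {exception}\n"
    "what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)"
)
retry_chain = retry_prompt | ChatOpenAI(model="gpt-4") | DatetimeOutputParser()
self_correcting = only_35.with_fallbacks([retry_chain], exception_key="exception")
print(self_correcting.invoke({"event": "the superbowl in 1994"}))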
