How to use prompting alone (no tool calling) to do extraction

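Tool calling features are not required for generating structured output from LLMs. LLMs that follow prompt instructions well can be tasked with outputting information in a given format.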

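This approach relies on a well-designed prompt that tells the model which format to use, followed by a parser that turns the model's text output into structured data.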

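To extract data without tool-calling features:

Instruct the LLM to generate text following an expected format (e.g., JSON with a certain schema);
Use output parsers to structure the model response into a desired Python object.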
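
The two steps above can be sketched without any LLM or LangChain dependency. This is a minimal illustration under assumptions: `Person`, `FORMAT_INSTRUCTIONS`, `build_prompt`, and `parse_people` are hypothetical names, and `fake_reply` is a canned string standing in for a real model response.

```python
import json
import re
from dataclasses import dataclass


# Hypothetical target type for this sketch (mirrors the Person model used later).
@dataclass
class Person:
    name: str
    height_in_meters: float


# Step 1: tell the model exactly which format to produce.
FORMAT_INSTRUCTIONS = (
    "Return only JSON of the form "
    '{"people": [{"name": <string>, "height_in_meters": <number>}]}, '
    "wrapped in ```json and ``` tags."
)


def build_prompt(text: str) -> str:
    """Combine the format instructions with the text to extract from."""
    return f"{FORMAT_INSTRUCTIONS}\n\nText: {text}"


def parse_people(model_text: str) -> list[Person]:
    """Step 2: turn the model's raw text into Python objects."""
    # Prefer the fenced JSON block; fall back to the whole reply if there is none.
    match = re.search(r"```json(.*?)```", model_text, re.DOTALL)
    payload = match.group(1) if match else model_text
    data = json.loads(payload.strip())
    return [Person(**item) for item in data["people"]]


# A canned reply standing in for a real LLM call.
fake_reply = '```json\n{"people": [{"name": "Anna", "height_in_meters": 1.83}]}\n```'
print(parse_people(fake_reply))  # → [Person(name='Anna', height_in_meters=1.83)]
```

In a real pipeline, the output of `build_prompt` would be sent to the model and its reply passed to `parse_people`.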

First, we select an LLM:

pip install -qU "langchain[openai]"

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

model = init_chat_model("gpt-4o-mini", model_provider="openai")

Tip

This tutorial is meant to be simple, but in practice you should include reference examples to get the best performance!
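
A minimal sketch of what reference examples can look like, assuming a simple list of `(role, content)` pairs; `EXAMPLES` and `build_messages` are hypothetical names, not LangChain APIs. The roles mirror the `"system"`, `"human"`, and `"ai"` shorthands that `ChatPromptTemplate.from_messages` accepts; if you pass these strings in as templates, literal braces in the JSON would need to be doubled (`{{` and `}}`) so they are not read as template variables.

```python
import json

# Hypothetical reference examples: (input text, expected extraction).
EXAMPLES = [
    ("Bob is 1.75 meters tall.", {"people": [{"name": "Bob", "height_in_meters": 1.75}]}),
    ("The weather is nice today.", {"people": []}),
]


def build_messages(query: str) -> list[tuple[str, str]]:
    """Assemble (role, content) pairs: instructions, worked examples, then the real query."""
    messages = [("system", "Extract every person as JSON in the format shown by the examples.")]
    for text, expected in EXAMPLES:
        messages.append(("human", text))
        # The "ai" turn shows the model the exact output we expect back.
        messages.append(("ai", json.dumps(expected)))
    messages.append(("human", query))
    return messages


for role, content in build_messages("Anna is 6 feet tall."):
    print(f"{role}: {content}")
```

Including an example with no people in it shows the model what an empty result should look like, which can discourage it from inventing entities.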

Using PydanticOutputParser

The following example uses the built-in PydanticOutputParser to parse the output of a chat model.

from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Set up a parser
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ),
        ("human", "{query}"),
    ]
).partial(format_instructions=parser.get_format_instructions())

API Reference: PydanticOutputParser | ChatPromptTemplate

Let's look at what information is sent to the model:

query = "Anna is 23 years old and she is 6 feet tall"

print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Wrap the output in `json` tags
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"Person": {"description": "Information about a person.", "properties": {"name": {"description": "The name of the person", "title": "Name", "type": "string"}, "height_in_meters": {"description": "The height of the person expressed in meters.", "title": "Height In Meters", "type": "number"}}, "required": ["name", "height_in_meters"], "title": "Person", "type": "object"}}, "description": "Identifying information about all people in a text.", "properties": {"people": {"items": {"$ref": "#/$defs/Person"}, "title": "People", "type": "array"}}, "required": ["people"]}
```
Human: Anna is 23 years old and she is 6 feet tall

Now that we've defined a prompt, we simply chain together the prompt, the model, and the output parser:

chain = prompt | model | parser
chain.invoke({"query": query})

People(people=[Person(name='Anna', height_in_meters=1.83)])
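
Note that the model converted 6 feet to meters on its own; nothing in the parser does this. Because such conversions depend on the model, it can be worth spot-checking them. A quick check of the arithmetic, using the exact definition 1 ft = 0.3048 m:

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot

height_m = 6 * FEET_TO_METERS
print(round(height_m, 2))  # → 1.83
```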

See the associated LangSmith trace.

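Note that the schema shows up in two places:

In the prompt, via parser.get_format_instructions();
In the chain, where the parser receives the formatted output and structures it into a Python object (in this case, the Pydantic object People).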
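
To see why it helps to define the schema only once, here is a stdlib-only sketch in which a single class drives both uses: generating format instructions for the prompt and building the result object from the reply. `format_instructions`, `parse`, and the `_TYPE_NAMES` mapping are hypothetical helpers standing in for what `PydanticOutputParser` does with a Pydantic model; the real parser emits a full JSON Schema (as printed above) rather than this simplified one.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    height_in_meters: float


# Simplified mapping from Python types to JSON Schema type names (sketch only).
_TYPE_NAMES = {str: "string", float: "number", int: "integer"}


def format_instructions(cls) -> str:
    """Use 1: describe the class's fields to the model inside the prompt."""
    props = {f.name: {"type": _TYPE_NAMES[f.type]} for f in fields(cls)}
    schema = {"type": "object", "properties": props, "required": list(props)}
    return "Return JSON matching this schema:\n" + json.dumps(schema)


def parse(cls, text: str):
    """Use 2: structure the model's JSON reply into an instance of the same class."""
    return cls(**json.loads(text))


print(format_instructions(Person))
print(parse(Person, '{"name": "Anna", "height_in_meters": 1.83}'))
```

Adding or renaming a field on `Person` changes both the prompt text and the parsing, so the two cannot drift apart.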

Custom Parsing

If desired, it's easy to create a custom prompt and parser with LangChain and LCEL.
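
Conceptually, LCEL's `|` operator composes steps so that each step's output becomes the next step's input, and a plain Python function on the right-hand side is accepted too (which is why `extract_json` can be piped into the chain later in this section). The toy `Step` class below is a hypothetical illustration of that idea, not LangChain's `Runnable` implementation; `toy_model` returns a fixed string instead of calling an LLM.

```python
import json
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other):
        # Wrap plain callables on the right-hand side, then compose left-to-right.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


toy_prompt = Step(lambda d: f"Extract people as JSON: {d['query']}")
# Hypothetical model that ignores its input and returns a fixed JSON string.
toy_model = Step(lambda _prompt: '{"people": [{"name": "Anna", "height_in_meters": 1.83}]}')
toy_chain = toy_prompt | toy_model | json.loads

print(toy_chain.invoke({"query": "Anna is 6 feet tall"}))
```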

To create a custom parser, define a function that parses the output of the model (typically an AIMessage) into an object of your choice.

See below for a simple implementation of a JSON parser.

import json
import re
from typing import List

from langchain_core.messages import AIMessage
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Output your answer as JSON that  "
            "matches the given schema: ```json\n{schema}\n```. "
            "Make sure to wrap the answer in ```json and ``` tags",
        ),
        ("human", "{query}"),
    ]
).partial(schema=People.model_json_schema())


# Custom parser
def extract_json(message: AIMessage) -> List[dict]:
    """Extracts JSON content from a message where JSON is embedded between ```json and ``` tags.

    Parameters:
        message (AIMessage): The model message whose text contains the JSON content.

    Returns:
        list: A list of parsed JSON objects, one per fenced block.
    """
    text = message.content
    # Define the regular expression pattern to match JSON blocks
    pattern = r"```json(.*?)```"

    # Find all non-overlapping matches of the pattern in the string
    matches = re.findall(pattern, text, re.DOTALL)

    # Parse each match as JSON, stripping any leading or trailing whitespace
    try:
        return [json.loads(match.strip()) for match in matches]
    except Exception as e:
        raise ValueError(f"Failed to parse: {message}") from e

API Reference: AIMessage | ChatPromptTemplate

query = "Anna is 23 years old and she is 6 feet tall"
print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Output your answer as JSON that  matches the given schema: ```json
{'$defs': {'Person': {'description': 'Information about a person.', 'properties': {'name': {'description': 'The name of the person', 'title': 'Name', 'type': 'string'}, 'height_in_meters': {'description': 'The height of the person expressed in meters.', 'title': 'Height In Meters', 'type': 'number'}}, 'required': ['name', 'height_in_meters'], 'title': 'Person', 'type': 'object'}}, 'description': 'Identifying information about all people in a text.', 'properties': {'people': {'items': {'$ref': '#/$defs/Person'}, 'title': 'People', 'type': 'array'}}, 'required': ['people'], 'title': 'People', 'type': 'object'}
```. Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall

chain = prompt | model | extract_json
chain.invoke({"query": query})


[{'people': [{'name': 'Anna', 'height_in_meters': 1.83}]}]
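
Note that `extract_json` silently returns an empty list if the model omits the json code fences. A more lenient variant is sketched below; `extract_json_lenient` is a hypothetical helper that takes a plain string rather than an `AIMessage`, also accepts bare fences without the `json` tag, and falls back to parsing the whole reply:

```python
import json
import re


def extract_json_lenient(text: str) -> list:
    """Hypothetical variant of extract_json that takes a plain string and tolerates missing fences."""
    # Match fenced blocks with or without the "json" language tag.
    blocks = re.findall(r"```(?:json)?(.*?)```", text, re.DOTALL)
    if not blocks:
        # No fenced block: assume the whole reply is JSON.
        blocks = [text]
    try:
        return [json.loads(block.strip()) for block in blocks]
    except json.JSONDecodeError as e:
        raise ValueError(f"Failed to parse: {text!r}") from e


print(extract_json_lenient('{"people": [{"name": "Anna", "height_in_meters": 1.83}]}'))
```

To use it in the chain, you would wrap it so it receives `message.content` instead of the message itself.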

Other Libraries

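If you're looking to do extraction with a parsing approach, check out the Kor library. It's written by one of the LangChain maintainers and helps craft prompts that take examples into account, allows controlling the output format (e.g., JSON or CSV), and expresses the schema in TypeScript. It seems to work pretty well!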
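
As a rough illustration of choosing a different output format (this is not Kor's API), here is a hypothetical parser for a model that was asked to answer as CSV with a `name,height_in_meters` header row. CSV is more compact than JSON but only suits flat records like these:

```python
import csv
import io


def parse_people_csv(text: str) -> list[dict]:
    """Hypothetical parser for a reply formatted as CSV with a name,height_in_meters header."""
    reader = csv.DictReader(io.StringIO(text.strip()))
    return [
        {"name": row["name"].strip(), "height_in_meters": float(row["height_in_meters"])}
        for row in reader
    ]


reply = "name,height_in_meters\nAnna,1.83\nBob,1.75"
print(parse_people_csv(reply))
```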
