How to select examples from a LangSmith dataset

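📦 Compatibility

The code in this guide requires langsmith>=0.1.101 and langchain-core>=0.2.34. Please make sure you have the correct packages installed.

LangSmith datasets have built-in support for similarity search, which makes them a good tool for building and querying few-shot examples.

In this guide we will see how to use an indexed LangSmith dataset as a few-shot example selector.

Setup

Before getting started, make sure you have created a LangSmith account and set your credentials: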

import getpass
import os

if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Set LangSmith API key:\n\n")

os.environ["LANGSMITH_TRACING"] = "true"

Set LangSmith API key:

········

We need to install the langsmith SDK. In this example we will also use langchain, langchain-openai, and langchain-benchmarks:

%pip install -qU "langsmith>=0.1.101" "langchain-core>=0.2.34" langchain langchain-openai langchain-benchmarks

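Now we will clone a public dataset and turn on indexing for it. Indexing can also be turned on through the LangSmith UI.

We will clone the Multiverse Math few-shot example dataset.

Indexing makes the dataset searchable and ensures that examples are also indexed whenever we update or add them.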

from langsmith import Client as LangSmith

ls_client = LangSmith()

dataset_name = "multiverse-math-few-shot-examples-v2"
dataset_public_url = (
    "https://smith.langchain.com/public/620596ee-570b-4d2b-8c8f-f828adbe5242/d"
)

ls_client.clone_public_dataset(dataset_public_url)

dataset_id = ls_client.read_dataset(dataset_name=dataset_name).id

ls_client.index_dataset(dataset_id=dataset_id)

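Querying the dataset

Indexing can take a few seconds. Once the dataset is indexed, we can search for similar examples. Note that the input to the similar_examples method must have the same schema as the example inputs. In this case our example inputs are a dictionary with a "question" key: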

examples = ls_client.similar_examples(
    {"question": "whats the negation of the negation of the negation of 3"},
    limit=3,
    dataset_id=dataset_id,
)
len(examples)
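Because indexing can take a few seconds, the first query issued right after index_dataset may not find anything yet. The sketch below is one way to wait for results. The helper name retry_until_nonempty and its parameters are hypothetical and not part of the langsmith SDK, and it assumes (without confirmation from the source) that a not-yet-ready index either raises an error or returns an empty list.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def retry_until_nonempty(
    search: Callable[[], List[T]], attempts: int = 5, delay_s: float = 2.0
) -> List[T]:
    """Call `search` until it returns a non-empty list or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            results = search()
            if results:
                return results
            last_error = None  # the call succeeded but found nothing yet
        except Exception as err:  # assumption: an unfinished index may raise
            last_error = err
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before trying again
    if last_error is not None:
        raise last_error
    return []


# Usage with the client from this guide (requires network access):
# examples = retry_until_nonempty(
#     lambda: ls_client.similar_examples(
#         {"question": "whats the negation of the negation of the negation of 3"},
#         limit=3,
#         dataset_id=dataset_id,
#     )
# )
```

Passing the search as a zero-argument callable keeps the helper independent of the client, so it can be exercised with a stand-in function before pointing it at LangSmith.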

examples[0].inputs["question"]

'evaluate the negation of -100'
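Since the query must share the schema of the example inputs, it can help to check the keys locally before calling similar_examples. The following is a minimal sketch: check_query_schema is a hypothetical helper, not a langsmith function, and it compares top-level keys only, which is an assumption about what "same schema" needs to mean for this dataset.

```python
def check_query_schema(query: dict, example_inputs: dict) -> None:
    """Raise ValueError if the query's top-level keys differ from the example's."""
    missing = sorted(set(example_inputs) - set(query))
    extra = sorted(set(query) - set(example_inputs))
    if missing or extra:
        raise ValueError(
            f"query keys do not match example inputs: missing={missing}, extra={extra}"
        )


# Inputs of the first search result shown above.
example_inputs = {"question": "evaluate the negation of -100"}

# Passes silently: the query has exactly the "question" key.
check_query_schema({"question": "whats the negation of 3"}, example_inputs)

# A query with a different key is rejected before any request is made.
try:
    check_query_schema({"prompt": "whats the negation of 3"}, example_inputs)
except ValueError as err:
    print(err)
```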

For this dataset, the outputs are the conversation that followed the question, in OpenAI message format:

examples[0].outputs["conversation"]

[{'role': 'assistant',
  'content': None,
  'tool_calls': [{'id': 'toolu_01HTpq4cYNUac6F7omUc2Wz3',
    'type': 'function',
    'function': {'name': 'negate', 'arguments': '{"a": -100}'}}]},
 {'role': 'tool',
  'content': '-100.0',
  'tool_call_id': 'toolu_01HTpq4cYNUac6F7omUc2Wz3'},
 {'role': 'assistant', 'content': 'So the answer is 100.'},
 {'role': 'user',
  'content': '100 is incorrect. Please refer to the output of your tool call.'},
 {'role': 'assistant',
  'content': [{'text': "You're right, my previous answer was incorrect. Let me re-evaluate using the tool output:",
    'type': 'text'}],
  'tool_calls': [{'id': 'toolu_01XsJQboYghGDygQpPjJkeRq',
    'type': 'function',
    'function': {'name': 'negate', 'arguments': '{"a": -100}'}}]},
 {'role': 'tool',
  'content': '-100.0',
  'tool_call_id': 'toolu_01XsJQboYghGDygQpPjJkeRq'},
 {'role': 'assistant', 'content': 'The answer is -100.0'},
 {'role': 'user',
  'content': 'You have the correct numerical answer but are returning additional text. Please only respond with the numerical answer.'},
 {'role': 'assistant', 'content': '-100.0'}]
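To see at a glance which tools an example demonstrates, the conversation can be walked message by message. This is a small sketch, assuming every tool call follows the structure shown above (a "function" entry with a "name" and a JSON-encoded "arguments" string). The helper summarize_tool_calls is hypothetical, and the conversation literal is a trimmed copy of the output above.

```python
import json


def summarize_tool_calls(conversation: list) -> list:
    """Return (tool name, parsed arguments) for every tool call in the conversation."""
    calls = []
    for msg in conversation:
        # Messages without tool calls have no "tool_calls" key (or it is None).
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# First three messages of the example conversation shown above.
conversation = [
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "toolu_01HTpq4cYNUac6F7omUc2Wz3",
                "type": "function",
                "function": {"name": "negate", "arguments": '{"a": -100}'},
            }
        ],
    },
    {
        "role": "tool",
        "content": "-100.0",
        "tool_call_id": "toolu_01HTpq4cYNUac6F7omUc2Wz3",
    },
    {"role": "assistant", "content": "So the answer is 100."},
]

print(summarize_tool_calls(conversation))
```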

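Creating dynamic few-shot prompts

The search returns the examples whose inputs are most similar to the query input. We can use this to few-shot prompt a model as follows: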

from langchain.chat_models import init_chat_model
from langchain_benchmarks.tool_usage.tasks.multiverse_math import (
    add,
    cos,
    divide,
    log,
    multiply,
    negate,
    pi,
    power,
    sin,
    subtract,
)
from langchain_core.runnables import RunnableLambda
from langsmith import AsyncClient as AsyncLangSmith

async_ls_client = AsyncLangSmith()


def similar_examples(input_: dict) -> dict:
    examples = ls_client.similar_examples(input_, limit=5, dataset_id=dataset_id)
    return {**input_, "examples": examples}


async def asimilar_examples(input_: dict) -> dict:
    examples = await async_ls_client.similar_examples(
        input_, limit=5, dataset_id=dataset_id
    )
    return {**input_, "examples": examples}


def construct_prompt(input_: dict) -> list:
    instructions = """You are great at using mathematical tools."""
    examples = []
    for ex in input_["examples"]:
        examples.append({"role": "user", "content": ex.inputs["question"]})
        for msg in ex.outputs["conversation"]:
            if msg["role"] == "assistant":
                msg["name"] = "example_assistant"
            if msg["role"] == "user":
                msg["name"] = "example_user"
            examples.append(msg)
    return [
        {"role": "system", "content": instructions},
        *examples,
        {"role": "user", "content": input_["question"]},
    ]


tools = [add, cos, divide, log, multiply, negate, pi, power, sin, subtract]
llm = init_chat_model("gpt-4o-2024-08-06")
llm_with_tools = llm.bind_tools(tools)

example_selector = RunnableLambda(func=similar_examples, afunc=asimilar_examples)

chain = example_selector | construct_prompt | llm_with_tools

API Reference: init_chat_model | RunnableLambda
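To inspect the prompt that the chain assembles without calling LangSmith or a model, the prompt-building step can be run on stand-in examples. The sketch below mirrors construct_prompt with one deliberate difference: it copies each message before adding the "name" field, so the retrieved examples are not modified in place. The function build_few_shot_messages and the SimpleNamespace stand-in are hypothetical; real results from similar_examples only need to expose .inputs and .outputs in the same way.

```python
from types import SimpleNamespace


def build_few_shot_messages(question: str, examples: list) -> list:
    """Assemble system prompt, example turns, and the final user question."""
    messages = [
        {"role": "system", "content": "You are great at using mathematical tools."}
    ]
    for ex in examples:
        messages.append({"role": "user", "content": ex.inputs["question"]})
        for msg in ex.outputs["conversation"]:
            msg = dict(msg)  # shallow copy so the stored example is left untouched
            if msg["role"] == "assistant":
                msg["name"] = "example_assistant"
            elif msg["role"] == "user":
                msg["name"] = "example_user"
            messages.append(msg)
    messages.append({"role": "user", "content": question})
    return messages


# Stand-in for one search result (assumption: only .inputs and .outputs are used).
fake_example = SimpleNamespace(
    inputs={"question": "evaluate the negation of -100"},
    outputs={"conversation": [{"role": "assistant", "content": "-100.0"}]},
)

messages = build_few_shot_messages(
    "whats the negation of the negation of 3", [fake_example]
)
for m in messages:
    print(m["role"], m.get("name", ""), repr(m["content"]))
```

Marking the example turns with "example_user" and "example_assistant" names, as the chain above does, lets the model tell demonstration turns apart from the real question at the end of the list.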

ai_msg = await chain.ainvoke({"question": "whats the negation of the negation of 3"})
ai_msg.tool_calls

[{'name': 'negate',
  'args': {'a': 3},
  'id': 'call_uMSdoTl6ehfHh5a6JQUb2NoZ',
  'type': 'tool_call'}]

Looking at the LangSmith trace, we can see that relevant examples were retrieved in the similar_examples step and passed as messages to ChatOpenAI: https://smith.langchain.com/public/9585e30f-765a-4ed9-b964-2211420cd2f8/r/fdea98d6-e90f-49d4-ac22-dfd012e9e0d9.
