
Argilla

Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring.


In this guide we will demonstrate how to track the inputs and responses of your LLM to generate a dataset in Argilla, using the ArgillaCallbackHandler.

It's useful to keep track of the inputs and outputs of your LLMs to generate datasets for future fine-tuning. This is especially useful when you use an LLM to generate data for a specific task, such as question answering, summarization, or translation.

Installation and Setup

%pip install --upgrade --quiet  langchain langchain-openai argilla

Getting API Credentials

To get the Argilla API credentials, follow the next steps:

  1. Go to your Argilla UI.
  2. Click on your profile picture and go to "My settings".
  3. Then copy the API Key.

In Argilla, the API URL will be the same as the URL of your Argilla UI.

To get the OpenAI API credentials, please visit https://platform.openai.com/account/api-keys

import os

os.environ["ARGILLA_API_URL"] = "..."
os.environ["ARGILLA_API_KEY"] = "..."

os.environ["OPENAI_API_KEY"] = "..."

Setup Argilla

To use the ArgillaCallbackHandler we will need to create a new FeedbackDataset in Argilla to keep track of your LLM experiments. To do so, please use the following code:

import argilla as rg
from packaging.version import parse as parse_version

if parse_version(rg.__version__) < parse_version("1.8.0"):
    raise RuntimeError(
        "`FeedbackDataset` is only available in Argilla v1.8.0 or higher, please "
        "upgrade `argilla` as `pip install argilla --upgrade`."
    )

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response"),
    ],
    questions=[
        rg.RatingQuestion(
            name="response-rating",
            description="How would you rate the quality of the response?",
            values=[1, 2, 3, 4, 5],
            required=True,
        ),
        rg.TextQuestion(
            name="response-feedback",
            description="What feedback do you have for the response?",
            required=False,
        ),
    ],
    guidelines="You're asked to rate the quality of the response and provide feedback.",
)

rg.init(
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)

dataset.push_to_argilla("langchain-dataset")

📌 NOTE: at the moment, only prompt-response pairs are supported as FeedbackDataset.fields, so the ArgillaCallbackHandler will only track the prompt, i.e. the LLM input, and the response, i.e. the LLM output.
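
In other words, every prompt-response pair the handler captures becomes a single record whose fields match the prompt and response fields defined above. Purely as an illustration (this is not the handler's internal implementation), adding such a record by hand with the argilla v1.8+ FeedbackRecord/add_records API would look roughly like this:

# Illustrative sketch only: roughly the record shape the ArgillaCallbackHandler
# creates for every prompt-response pair it captures.
record = rg.FeedbackRecord(
    fields={
        "prompt": "Tell me a joke",  # the LLM input
        "response": "Why did the chicken cross the road? ...",  # the LLM output
    }
)
dataset.add_records([record])  # the callback handler takes care of this for you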

Tracking

To use the ArgillaCallbackHandler you can either use the following code, or just reproduce one of the examples presented in the following sections.

from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)
API Reference: ArgillaCallbackHandler

Scenario 1: Tracking an LLM

First, let's just run a single LLM a few times and capture the resulting prompt-response pairs in Argilla.

from langchain_core.callbacks.stdout import StdOutCallbackHandler
from langchain_openai import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)
callbacks = [StdOutCallbackHandler(), argilla_callback]

llm = OpenAI(temperature=0.9, callbacks=callbacks)
llm.generate(["Tell me a joke", "Tell me a poem"] * 3)
API Reference: StdOutCallbackHandler | OpenAI
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when he hit the wall? \nA: Dam.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nThe Moon \n\nThe moon is high in the midnight sky,\nSparkling like a star above.\nThe night so peaceful, so serene,\nFilling up the air with love.\n\nEver changing and renewing,\nA never-ending light of grace.\nThe moon remains a constant view,\nA reminder of life’s gentle pace.\n\nThrough time and space it guides us on,\nA never-fading beacon of hope.\nThe moon shines down on us all,\nAs it continues to rise and elope.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nQ. What did one magnet say to the other magnet?\nA. "I find you very attractive!"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text="\n\nThe world is charged with the grandeur of God.\nIt will flame out, like shining from shook foil;\nIt gathers to a greatness, like the ooze of oil\nCrushed. Why do men then now not reck his rod?\n\nGenerations have trod, have trod, have trod;\nAnd all is seared with trade; bleared, smeared with toil;\nAnd wears man's smudge and shares man's smell: the soil\nIs bare now, nor can foot feel, being shod.\n\nAnd for all this, nature is never spent;\nThere lives the dearest freshness deep down things;\nAnd though the last lights off the black West went\nOh, morning, at the brown brink eastward, springs —\n\nBecause the Holy Ghost over the bent\nWorld broods with warm breast and with ah! bright wings.\n\n~Gerard Manley Hopkins", generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\n\nQ: What did one ocean say to the other ocean?\nA: Nothing, they just waved.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text="\n\nA poem for you\n\nOn a field of green\n\nThe sky so blue\n\nA gentle breeze, the sun above\n\nA beautiful world, for us to love\n\nLife is a journey, full of surprise\n\nFull of joy and full of surprise\n\nBe brave and take small steps\n\nThe future will be revealed with depth\n\nIn the morning, when dawn arrives\n\nA fresh start, no reason to hide\n\nSomewhere down the road, there's a heart that beats\n\nBelieve in yourself, you'll always succeed.", generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'completion_tokens': 504, 'total_tokens': 528, 'prompt_tokens': 24}, 'model_name': 'text-davinci-003'})

Argilla UI with LangChain LLM input-response

Scenario 2: Tracking an LLM in a chain

Then we can create a chain using a prompt template, and track the initial prompt and the final response in Argilla.

from langchain.chains import LLMChain
from langchain_core.callbacks.stdout import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)
callbacks = [StdOutCallbackHandler(), argilla_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)

template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)

test_prompts = [{"title": "Documentary about Bigfoot in Paris"}]
synopsis_chain.apply(test_prompts)


> Entering new LLMChain chain...
Prompt after formatting:
You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: Documentary about Bigfoot in Paris
Playwright: This is a synopsis for the above play:

> Finished chain.
[{'text': "\n\nDocumentary about Bigfoot in Paris focuses on the story of a documentary filmmaker and their search for evidence of the legendary Bigfoot creature in the city of Paris. The play follows the filmmaker as they explore the city, meeting people from all walks of life who have had encounters with the mysterious creature. Through their conversations, the filmmaker unravels the story of Bigfoot and finds out the truth about the creature's presence in Paris. As the story progresses, the filmmaker learns more and more about the mysterious creature, as well as the different perspectives of the people living in the city, and what they think of the creature. In the end, the filmmaker's findings lead them to some surprising and heartwarming conclusions about the creature's existence and the importance it holds in the lives of the people in Paris."}]

Argilla UI with LangChain Chain input-response

Scenario 3: Using an Agent with Tools

Finally, as a more advanced workflow, you can create an agent that uses some tools. Here the ArgillaCallbackHandler will keep track of the input and the output, but not of the intermediate steps/thoughts, so for a given prompt we log just the original prompt and the final response to that prompt.

Note that for this scenario we'll be using the Google Search API (Serp API), so you will need to install google-search-results with pip install google-search-results, and to set the Serp API key as os.environ["SERPAPI_API_KEY"] = "..." (you can find it at https://serpapi.com/dashboard); otherwise the example below won't work.
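
For reference, that extra setup amounts to the following (replace the placeholder with your own key):

%pip install --upgrade --quiet google-search-results

import os

os.environ["SERPAPI_API_KEY"] = "..."  # available at https://serpapi.com/dashboard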

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_core.callbacks.stdout import StdOutCallbackHandler
from langchain_openai import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",
    api_url=os.environ["ARGILLA_API_URL"],
    api_key=os.environ["ARGILLA_API_KEY"],
)
callbacks = [StdOutCallbackHandler(), argilla_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)

tools = load_tools(["serpapi"], llm=llm, callbacks=callbacks)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    callbacks=callbacks,
)
agent.run("Who was the first president of the United States of America?")


> Entering new AgentExecutor chain...
 I need to answer a historical question
Action: Search
Action Input: "who was the first president of the United States of America" 
Observation: George Washington
Thought: George Washington was the first president
Final Answer: George Washington was the first president of the United States of America.

> Finished chain.
'George Washington was the first president of the United States of America.'

Argilla UI with LangChain Agent input-response

