Hugging Face Endpoints

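The Hugging Face Hub is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available: an online platform where people can easily collaborate and build ML together.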

The Hugging Face Hub also offers various endpoints for building ML applications. This example shows how to connect to the different endpoint types.

In particular, text generation inference is powered by Text Generation Inference: a custom-built Rust, Python, and gRPC server for blazing-fast text generation inference.

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

Installation and Setup

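To use it, you should have the huggingface_hub Python package installed. The HuggingFaceEndpoint class imported in this guide comes from the langchain-huggingface integration package, so install that as well.

%pip install --upgrade --quiet huggingface_hub langchain-huggingface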

# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

Prepare Examples

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

from langchain_core.prompts import PromptTemplate

API Reference: PromptTemplate

question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
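By default, `PromptTemplate.from_template` treats `{question}` as an f-string-style placeholder. The text the model receives should therefore match what Python's built-in `str.format` produces. The stdlib-only sketch below previews the rendered prompt without installing LangChain. With LangChain installed, `prompt.format(question=question)` should return the same string.

```python
question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

# The same substitution PromptTemplate performs for its default "f-string" format.
rendered = template.format(question=question)
print(rendered)
```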

Examples

The following example shows how to use the HuggingFaceEndpoint integration with the free Serverless Endpoints API.

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,  # maximum number of tokens to generate
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
)
llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))

Dedicated Endpoint

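The free Serverless API lets you implement solutions and iterate on them right away. For heavy use cases, however, it may be rate limited, because the load is shared with other requests.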

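For enterprise workloads, the best option is Inference Endpoints - Dedicated. It gives you access to fully managed infrastructure that offers more flexibility and speed. These resources come with continuous support and uptime guarantees, as well as options such as autoscaling.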

# Set the url to your Inference Endpoint below
your_endpoint_url = "https://fayjubiy2xqn36z0.us-east-1.aws.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=your_endpoint_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
# Calling llm(...) directly is deprecated; use invoke() instead.
print(llm.invoke("What did foo say about bar?"))

Streaming

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_huggingface import HuggingFaceEndpoint

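llm = HuggingFaceEndpoint(
    endpoint_url=your_endpoint_url,  # defined in the Dedicated Endpoint example above
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
# Callbacks are passed through the config; tokens are written to stdout as they arrive.
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)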

API Reference: StreamingStdOutCallbackHandler | HuggingFaceEndpoint
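With `streaming=True`, the model produces text token by token. LangChain calls each callback handler's `on_llm_new_token` hook as tokens arrive, and `StreamingStdOutCallbackHandler` writes each token to stdout. The stdlib-only sketch below mimics that pattern using a hypothetical fake token source and a hypothetical `PrintTokens` handler, so you can follow the control flow without a running endpoint.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO


class PrintTokens:
    """Toy handler: echo each token as soon as it arrives, like StreamingStdOutCallbackHandler."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        self.stream = stream if stream is not None else sys.stdout

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.stream.write(token)
        self.stream.flush()


def fake_token_stream() -> Iterator[str]:
    # Stand-in for tokens arriving from the endpoint one at a time.
    yield from ["Hello", ",", " world", "!"]


def generate(tokens: Iterable[str], callbacks: list) -> str:
    """Notify every handler for each token, then return the full text."""
    pieces = []
    for token in tokens:
        for handler in callbacks:
            handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


text = generate(fake_token_stream(), callbacks=[PrintTokens()])
print()  # newline after the streamed output
```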

The same HuggingFaceEndpoint class can also be used with a local Hugging Face TGI instance that serves the LLM. See the TGI repository for details on support for various hardware (GPU, TPU, Gaudi...).
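A TGI server accepts JSON requests on its `/generate` route. The prompt goes under `inputs`, and generation settings such as `max_new_tokens` and `temperature` go under `parameters`. The stdlib-only sketch below builds such a request without sending it. The `http://localhost:8080` address is an assumption (use whatever host and port your TGI container exposes), and the helper name `build_generate_request` is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(base_url: str, prompt: str, token: Optional[str] = None, **parameters) -> urllib.request.Request:
    """Build (but do not send) a POST request for TGI's /generate route (hypothetical helper)."""
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed for protected endpoints, e.g. a dedicated Inference Endpoint.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/generate", data=body, headers=headers, method="POST"
    )


req = build_generate_request(
    "http://localhost:8080",  # assumed local TGI address
    "What did foo say about bar?",
    max_new_tokens=512,
    temperature=0.01,
    repetition_penalty=1.03,
)
# To actually send it to a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
```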

Related

LLM conceptual guide
LLM how-to guides
