Llama.cpp

The llama-cpp-python library provides simple Python bindings for @ggerganov's llama.cpp.

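This package provides:

Low-level access to the C API via a ctypes interface

A high-level Python API for text completion

An OpenAI-like API

LangChain compatibility

LlamaIndex compatibility

An OpenAI-compatible web server

A local Copilot replacement

Function calling support

Vision API support

Multiple models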

Overview

Integration details

Class	Package	Local	Serializable	JS support
ChatLlamaCpp	langchain-community	✅	❌	❌

Model features

Tool calling	Structured output	JSON mode	Image input	Audio input	Video input	Token-level streaming	Native async	Token usage	Logprobs
✅	✅	❌	❌	❌	❌	✅	❌	❌	✅

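Setup

To get started and use all the features shown below, we recommend a model that has been fine-tuned for tool calling.

We will use Hermes-2-Pro-Llama-3-8B-GGUF from NousResearch.

Hermes 2 Pro is an upgraded version of Nous Hermes 2. It is trained on an updated and cleaned version of the OpenHermes 2.5 dataset, plus a new function calling and JSON mode dataset developed in-house. This new version of Hermes keeps its strong general task and conversation capabilities and also excels at function calling.

See our local models guide to learn more.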

Installation

The LangChain LlamaCpp integration lives in the langchain-community and llama-cpp-python packages:

%pip install -qU langchain-community llama-cpp-python

Instantiation

Now we can instantiate our model object and generate chat completions:

# Path to your model weights
local_model = "local/path/to/Hermes-2-Pro-Llama-3-8B-Q8_0.gguf"

import multiprocessing

from langchain_community.chat_models import ChatLlamaCpp

llm = ChatLlamaCpp(
    temperature=0.5,
    model_path=local_model,
    n_ctx=10000,
    n_gpu_layers=8,
    n_batch=300,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    max_tokens=512,
    n_threads=multiprocessing.cpu_count() - 1,
    repeat_penalty=1.5,
    top_p=0.5,
    verbose=True,
)

API reference: ChatLlamaCpp
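Note that the `n_threads` expression above evaluates to 0 on a single-core machine. The following is a minimal sketch of a guard, assuming you always want at least one worker thread; `safe_thread_count` is a hypothetical helper for illustration and is not part of LangChain or llama-cpp-python.

```python
import multiprocessing


def safe_thread_count(reserve: int = 1) -> int:
    """Return the CPU count minus `reserve`, but never less than 1."""
    return max(1, multiprocessing.cpu_count() - reserve)


# Leave one core free for the rest of the system, as in the example above.
n_threads = safe_thread_count()
print(n_threads)
```

The resulting value could then be passed as `n_threads=n_threads` when constructing the model.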

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)
ai_msg

print(ai_msg.content)

J'aime programmer. (In France, "programming" is often used in its original sense of scheduling or organizing events.) 

If you meant computer-programming: 
Je suis amoureux de la programmation informatique.

(You might also say simply 'programmation', which would be understood as both meanings - depending on context).

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)

API reference: ChatPromptTemplate
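To make the substitution step concrete, here is a conceptual sketch in plain Python of what filling the template amounts to before the messages reach the model. It is an illustration only and does not reflect how ChatPromptTemplate is implemented; `fill_messages` is a hypothetical helper.

```python
def fill_messages(template, variables):
    """Substitute {placeholders} in each (role, text) pair using str.format."""
    return [(role, text.format(**variables)) for role, text in template]


template = [
    (
        "system",
        "You are a helpful assistant that translates {input_language} to {output_language}.",
    ),
    ("human", "{input}"),
]

messages = fill_messages(
    template,
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    },
)
print(messages)
```

The filled list has the same (role, content) shape as the `messages` list used in the invocation example.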

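Tool calling

First, it works in much the same way as OpenAI function calling.

OpenAI has a tool calling API (we use "tool calling" and "function calling" interchangeably here) that lets you describe tools and their arguments, and have the model return a JSON object containing the tool to invoke and the inputs to that tool. Tool calling is very useful for building tool-using chains and agents, and more generally for getting structured output from models.

With ChatLlamaCpp.bind_tools, we can easily pass Pydantic classes, dict schemas, LangChain tools, or even functions to the model as tools. Under the hood, these are converted to an OpenAI tool schema, which looks like this: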

{
    "name": "...",
    "description": "...",
    "parameters": {...}  # JSONSchema
}

and is passed in on every model invocation.
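As a rough illustration of the name/description/parameters shape, the sketch below derives such a dict from a plain function signature using only the standard library. It is a simplified stand-in, not LangChain's actual conversion logic; `sketch_tool_schema` and `JSON_TYPES` are hypothetical names, and only a few primitive types are handled.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func):
    """Build a name/description/parameters dict from a function signature."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_current_weather(location: str, unit: str = "celsius"):
    """Get the current weather in a given location"""
    return f"Now the weather in {location} is 22 {unit}"


schema = sketch_tool_schema(get_current_weather)
print(schema)
```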

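However, the model cannot trigger a function or tool automatically; we need to force it by specifying the 'tool choice' parameter. This parameter is typically formatted as follows:

{"type": "function", "function": {"name": <<tool_name>>}}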

from langchain_core.tools import tool
from pydantic import BaseModel, Field


class WeatherInput(BaseModel):
    location: str = Field(description="The city and state, e.g. San Francisco, CA")
    unit: str = Field(enum=["celsius", "fahrenheit"])


@tool("get_current_weather", args_schema=WeatherInput)
def get_weather(location: str, unit: str):
    """Get the current weather in a given location"""
    return f"Now the weather in {location} is 22 {unit}"


llm_with_tools = llm.bind_tools(
    tools=[get_weather],
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)

API reference: tool

ai_msg = llm_with_tools.invoke(
    "what is the weather like in HCMC in celsius",
)

ai_msg.tool_calls

[{'name': 'get_current_weather',
  'args': {'location': 'Ho Chi Minh City', 'unit': 'celsius'},
  'id': 'call__0_get_current_weather_cmpl-394d9943-0a1f-425b-8139-d2826c1431f2'}]

class MagicFunctionInput(BaseModel):
    magic_function_input: int = Field(description="The input value for magic function")


@tool("get_magic_function", args_schema=MagicFunctionInput)
def magic_function(magic_function_input: int):
    """Get the value of magic function for an input."""
    return magic_function_input + 2


llm_with_tools = llm.bind_tools(
    tools=[magic_function],
    tool_choice={"type": "function", "function": {"name": "get_magic_function"}},
)

ai_msg = llm_with_tools.invoke(
    "What is magic function of 3?",
)

ai_msg

ai_msg.tool_calls

[{'name': 'get_magic_function',
  'args': {'magic_function_input': 3},
  'id': 'call__0_get_magic_function_cmpl-cd83a994-b820-4428-957c-48076c68335a'}]
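The model returns only the description of the call; running the tool is left to your code. Below is a minimal sketch of dispatching a `tool_calls` list to plain Python functions. The `registry` dict and `run_tool_calls` helper are hypothetical, and the sample list mirrors the shape of the output above with a placeholder id.

```python
def magic_function(magic_function_input: int) -> int:
    """Same logic as the tool defined above."""
    return magic_function_input + 2


# Hypothetical registry mapping tool names to callables.
registry = {"get_magic_function": magic_function}


def run_tool_calls(tool_calls, registry):
    """Call each requested tool with its arguments; collect results by call id."""
    results = {}
    for call in tool_calls:
        func = registry[call["name"]]
        results[call["id"]] = func(**call["args"])
    return results


# Sample shaped like ai_msg.tool_calls; the id is a placeholder.
sample_calls = [
    {"name": "get_magic_function", "args": {"magic_function_input": 3}, "id": "call_0"}
]
results = run_tool_calls(sample_calls, registry)
print(results)  # {'call_0': 5}
```

Keying results by call id keeps each result matched to the request that produced it when several calls come back at once.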

Structured output

from langchain_core.utils.function_calling import convert_to_openai_tool
from pydantic import BaseModel


class Joke(BaseModel):
    """A setup to a joke and the punchline."""

    setup: str
    punchline: str


dict_schema = convert_to_openai_tool(Joke)
structured_llm = llm.with_structured_output(dict_schema)
result = structured_llm.invoke("Tell me a joke about birds")
result

API reference: convert_to_openai_tool

result

{'setup': '- Why did the chicken cross the playground?',
 'punchline': '\n\n- To get to its gilded cage on the other side!'}
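The sample output above carries a leading dash and stray newlines in both fields. If that matters downstream, a small post-processing step can tidy the values. This is a sketch under the assumption that every field is a string; `clean_field` is a hypothetical helper, not part of LangChain.

```python
def clean_field(text: str) -> str:
    """Strip surrounding whitespace and a single leading dash, if present."""
    text = text.strip()
    if text.startswith("-"):
        text = text[1:].lstrip()
    return text


# Sample copied from the output shown above.
result = {
    "setup": "- Why did the chicken cross the playground?",
    "punchline": "\n\n- To get to its gilded cage on the other side!",
}

cleaned = {key: clean_field(value) for key, value in result.items()}
print(cleaned)
```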

Streaming

for chunk in llm.stream("what is 25x5"):
    print(chunk.content, end="\n", flush=True)
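The loop above prints each chunk on its own line. To rebuild the full reply instead, the chunk contents can be concatenated. The sketch below uses a stand-in generator so it runs without a model; `FakeChunk` and `fake_stream` are hypothetical and model only the `content` attribute used above.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in for a streamed message chunk; only `content` is modelled."""

    content: str


def fake_stream(text: str):
    """Yield the text one word at a time, imitating token-level streaming."""
    for piece in text.split(" "):
        yield FakeChunk(content=piece + " ")


def collect(stream) -> str:
    """Join the content of every chunk into a single string."""
    return "".join(chunk.content for chunk in stream)


full_text = collect(fake_stream("25 x 5 = 125")).strip()
print(full_text)
```

With a real model, the same `collect` function could be given `llm.stream(...)` in place of `fake_stream(...)`, assuming each chunk's `content` is a string.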

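API reference

For detailed documentation of all ChatLlamaCpp features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/community/chat_models/langchain_community.chat_models.llamacpp.ChatLlamaCpp.html

Chat model conceptual guide
Chat model how-to guides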
