ChatDatabricks

The Databricks Lakehouse Platform unifies data, analytics, and AI on one platform.

This notebook provides a quick overview for getting started with Databricks chat models. For detailed documentation of all ChatDatabricks features and configurations, head to the API reference.

Overview

The ChatDatabricks class wraps a chat model endpoint hosted on Databricks Model Serving. This example notebook shows how to wrap your serving endpoint and use it as a chat model in your LangChain application.

Integration details

Class: ChatDatabricks
Package: databricks-langchain
Serializable: beta
Package downloads / latest version: see databricks-langchain on PyPI

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Supported methods

ChatDatabricks supports all methods of ChatModel, including the async APIs.
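For example, invoke, stream, batch, and their async counterparts (ainvoke, astream, abatch) are all available. A minimal sketch of batch, assuming the chat_model instance created in the Instantiation section below:

# batch() runs the model over a list of inputs and returns a list of AIMessages.
chat_model.batch(["What is MLflow?", "What is Unity Catalog?"])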

Endpoint requirement

The serving endpoint ChatDatabricks wraps must have an OpenAI-compatible chat input/output format (reference). As long as the input format is compatible, ChatDatabricks can be used for any endpoint type hosted on Databricks Model Serving:

  1. Foundation Models - A curated list of state-of-the-art foundation models such as DBRX, Llama3, and Mixtral-8x7B. These endpoints are ready to use in your Databricks workspace without any setup.
  2. Custom Models - You can also deploy custom models to a serving endpoint via MLflow, with your choice of framework such as LangChain, PyTorch, or Transformers.
  3. External Models - Databricks endpoints can serve as a proxy for models hosted outside of Databricks, such as proprietary model services like OpenAI GPT-4.

Setup

To access Databricks models, you'll need to create a Databricks account, set up credentials (only if you are outside a Databricks workspace), and install the required packages.

Credentials (only if you are outside Databricks)

You can skip this step if you are running your LangChain application inside Databricks.

Otherwise, you need to manually set the Databricks workspace hostname and personal access token as the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, respectively. See Authentication Documentation for how to get an access token.

import getpass
import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
"Enter your Databricks access token: "
)
Enter your Databricks access token:  ········

Installation

The LangChain Databricks integration lives in the databricks-langchain package.

%pip install -qU databricks-langchain

We first demonstrate how to query the DBRX-instruct model hosted as a Foundation Models endpoint with ChatDatabricks.

For other types of endpoints there are some differences in how to set up the endpoint itself; however, once the endpoint is ready, there is no difference in how you query it with ChatDatabricks. Please refer to the bottom of this notebook for examples of other endpoint types.

Instantiation

from databricks_langchain import ChatDatabricks

chat_model = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
    max_tokens=256,
    # See https://langchain-python.dev.org.tw/api_reference/community/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html for other supported parameters
)

Invocation

chat_model.invoke("What is MLflow?")
AIMessage(content='MLflow is an open-source platform for managing end-to-end machine learning workflows. It was introduced by Databricks in 2018. MLflow provides tools for tracking experiments, packaging and sharing code, and deploying models. It is designed to work with any machine learning library and can be used in a variety of environments, including local machines, virtual machines, and cloud-based clusters. MLflow aims to streamline the machine learning development lifecycle, making it easier for data scientists and engineers to collaborate and deploy models into production.', response_metadata={'prompt_tokens': 229, 'completion_tokens': 104, 'total_tokens': 333}, id='run-d3fb4d06-3e10-4471-83c9-c282cc62b74d-0')
# You can also pass a list of messages
messages = [
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?"),
]
chat_model.invoke(messages)
AIMessage(content='Databricks Model Serving is a feature of the Databricks platform that allows data scientists and engineers to easily deploy machine learning models into production. With Model Serving, you can host, manage, and serve machine learning models as APIs, making it easy to integrate them into applications and business processes. It supports a variety of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and provides tools for monitoring and managing the performance of deployed models. Model Serving is designed to be scalable, secure, and easy to use, making it a great choice for organizations that want to quickly and efficiently deploy machine learning models into production.', response_metadata={'prompt_tokens': 35, 'completion_tokens': 130, 'total_tokens': 165}, id='run-b3feea21-223e-4105-8627-41d647d5ccab-0')

Chaining

Similar to other chat models, ChatDatabricks can be used as part of a complex chain.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a chatbot that can answer questions about {topic}.",
        ),
        ("user", "{question}"),
    ]
)

chain = prompt | chat_model
chain.invoke(
    {
        "topic": "Databricks",
        "question": "What is Unity Catalog?",
    }
)
API Reference: ChatPromptTemplate
AIMessage(content="Unity Catalog is a new data catalog feature in Databricks that allows you to discover, manage, and govern all your data assets across your data landscape, including data lakes, data warehouses, and data marts. It provides a centralized repository for storing and managing metadata, data lineage, and access controls for all your data assets. Unity Catalog enables data teams to easily discover and access the data they need, while ensuring compliance with data privacy and security regulations. It is designed to work seamlessly with Databricks' Lakehouse platform, providing a unified experience for managing and analyzing all your data.", response_metadata={'prompt_tokens': 32, 'completion_tokens': 118, 'total_tokens': 150}, id='run-82d72624-f8df-4c0d-a976-919feec09a55-0')

Invocation (streaming)

for chunk in chat_model.stream("How are you?"):
    print(chunk.content, end="|")
I|'m| an| AI| and| don|'t| have| feelings|,| but| I|'m| here| and| ready| to| assist| you|.| How| can| I| help| you| today|?||

Invocation (async)

import asyncio

country = ["Japan", "Italy", "Australia"]
futures = [chat_model.ainvoke(f"Where is the capital of {c}?") for c in country]
await asyncio.gather(*futures)
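
Streaming also has an async counterpart. A minimal sketch using the standard astream() method, mirroring the sync streaming loop above:

# astream() yields chunks asynchronously; run inside an async context (e.g. a notebook cell).
async for chunk in chat_model.astream("How are you?"):
    print(chunk.content, end="|")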

Tool calling

ChatDatabricks supports the OpenAI-compatible tool calling API that lets you describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.

With ChatDatabricks.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to an OpenAI-compatible tool schema, which looks like:

{
    "name": "...",
    "description": "...",
    "parameters": {...}  # JSONSchema
}

and passed in every model invocation.

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = chat_model.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
    "Which city is hotter today and which is bigger: LA or NY?"
)
print(ai_msg.tool_calls)
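
Each entry in tool_calls is a dict carrying the tool name and the model's parsed arguments. A minimal sketch of iterating over them:

# Each tool call exposes the requested tool's name and its parsed arguments.
for tool_call in ai_msg.tool_calls:
    print(tool_call["name"], tool_call["args"])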

Wrapping Custom Model Endpoint

Prerequisites

An LLM must have been registered and deployed to a Databricks serving endpoint via MLflow, with an OpenAI-compatible chat input/output format (see the endpoint requirement above).

Once the endpoint is ready, the usage pattern is identical to that of Foundation Models.

chat_model_custom = ChatDatabricks(
    endpoint="YOUR_ENDPOINT_NAME",
    temperature=0.1,
    max_tokens=256,
)

chat_model_custom.invoke("How are you?")

Wrapping External Models

Prerequisite: Create Proxy Endpoint

First, create a new Databricks serving endpoint that proxies requests to the target external model. Endpoint creation should be fairly quick when proxying external models.

This requires registering your OpenAI API key within the Databricks secret manager as follows:

# Replace `<scope>` with your scope
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

For how to set up the Databricks CLI and manage secrets, please refer to https://docs.databricks.com/en/security/secrets/secrets.html

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

secret = "secrets/<scope>/openai-api-key" # replace `<scope>` with your scope
endpoint_name = "my-chat" # rename this if my-chat already exists
client.create_endpoint(
name=endpoint_name,
config={
"served_entities": [
{
"name": "my-chat",
"external_model": {
"name": "gpt-3.5-turbo",
"provider": "openai",
"task": "llm/v1/chat",
"openai_config": {
"openai_api_key": "{{" + secret + "}}",
},
},
}
],
},
)

Once the endpoint status has become "Ready", you can query the endpoint in the same way as any other endpoint type.

chat_model_external = ChatDatabricks(
    endpoint=endpoint_name,
    temperature=0.1,
    max_tokens=256,
)
chat_model_external.invoke("How to use Databricks?")

Function calling on Databricks

Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs.

See Databricks function calling introduction for supported models.

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-70b-instruct")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
            },
        },
    }
]

# supported tool_choice values: "auto", "required", "none", function name in string format,
# or a dictionary as {"type": "function", "function": {"name": <<tool_name>>}}
model = llm.bind_tools(tools, tool_choice="auto")

messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]
print(model.invoke(messages))

See Databricks Unity Catalog for how to use UC functions in chains.
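
A minimal sketch of what that can look like, assuming langchain-community's UCFunctionToolkit (the warehouse ID and UC function name below are placeholders):

from langchain_community.tools.databricks import UCFunctionToolkit

# Hypothetical warehouse ID and UC function (catalog.schema.function); replace with your own.
uc_tools = (
    UCFunctionToolkit(warehouse_id="YOUR_WAREHOUSE_ID")
    .include("main.tools.python_exec")
    .get_tools()
)
llm_with_uc_tools = llm.bind_tools(uc_tools)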

API reference

For detailed documentation of all ChatDatabricks features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/databricks/chat_models/langchain_databricks.chat_models.ChatDatabricks.html

