ChatDatabricks

The Databricks Lakehouse Platform unifies data, analytics, and AI on one platform.

This notebook provides a quick overview for getting started with Databricks chat models. For detailed documentation of all ChatDatabricks features and configurations, head to the API reference.

Overview

The ChatDatabricks class wraps a chat model endpoint hosted on Databricks Model Serving. This example notebook shows how to wrap your serving endpoint and use it as a chat model in your LangChain application.

Integration details

Class: ChatDatabricks
Package: databricks-langchain
Serializable: beta
Package downloads / latest version: see databricks-langchain on PyPI

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Supported methods

ChatDatabricks supports all methods of ChatModel, including the async APIs.
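For example, invoke, stream, batch, and their async counterparts (ainvoke, astream, abatch) are all available. A minimal sketch of batch, assuming the chat_model instance created in the Instantiation section below:

# batch() runs the model over a list of inputs and returns a list of AIMessages.
chat_model.batch(["What is MLflow?", "What is Unity Catalog?"])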

Endpoint requirement

The serving endpoint ChatDatabricks wraps must have an OpenAI-compatible chat input/output format (reference). As long as the input format is compatible, ChatDatabricks can be used for any endpoint type hosted on Databricks Model Serving:

  1. Foundation Models - A curated list of state-of-the-art foundation models such as DBRX, Llama3, and Mixtral-8x7B. These endpoints are ready to use in your Databricks workspace without any setup.
  2. Custom Models - You can also deploy custom models to a serving endpoint via MLflow, with your choice of framework such as LangChain, PyTorch, or Transformers.
  3. External Models - Databricks endpoints can serve as a proxy for models hosted outside of Databricks, such as proprietary model services like OpenAI GPT-4.

Setup

To access Databricks models, you'll need to create a Databricks account, set up credentials (only if you are outside a Databricks workspace), and install the required packages.

Credentials (only if you are outside Databricks)

You can skip this step if you are running your LangChain application inside Databricks.

Otherwise, you need to manually set the Databricks workspace hostname and personal access token as the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, respectively. See Authentication Documentation for how to get an access token.

import getpass
import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
"Enter your Databricks access token: "
)
Enter your Databricks access token:  ········

Installation

The LangChain Databricks integration lives in the databricks-langchain package.

%pip install -qU databricks-langchain

We first demonstrate how to query the DBRX-instruct model hosted as a Foundation Models endpoint with ChatDatabricks.

For other types of endpoints there are some differences in how to set up the endpoint itself; however, once the endpoint is ready, there is no difference in how you query it with ChatDatabricks. Please refer to the bottom of this notebook for examples of other endpoint types.

Instantiation

from databricks_langchain import ChatDatabricks

chat_model = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
    max_tokens=256,
    # See https://langchain-python.dev.org.tw/api_reference/community/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html for other supported parameters
)

Invocation

chat_model.invoke("What is MLflow?")
AIMessage(content='MLflow is an open-source platform for managing end-to-end machine learning workflows. It was introduced by Databricks in 2018. MLflow provides tools for tracking experiments, packaging and sharing code, and deploying models. It is designed to work with any machine learning library and can be used in a variety of environments, including local machines, virtual machines, and cloud-based clusters. MLflow aims to streamline the machine learning development lifecycle, making it easier for data scientists and engineers to collaborate and deploy models into production.', response_metadata={'prompt_tokens': 229, 'completion_tokens': 104, 'total_tokens': 333}, id='run-d3fb4d06-3e10-4471-83c9-c282cc62b74d-0')
# You can also pass a list of messages
messages = [
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?"),
]
chat_model.invoke(messages)
AIMessage(content='Databricks Model Serving is a feature of the Databricks platform that allows data scientists and engineers to easily deploy machine learning models into production. With Model Serving, you can host, manage, and serve machine learning models as APIs, making it easy to integrate them into applications and business processes. It supports a variety of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and provides tools for monitoring and managing the performance of deployed models. Model Serving is designed to be scalable, secure, and easy to use, making it a great choice for organizations that want to quickly and efficiently deploy machine learning models into production.', response_metadata={'prompt_tokens': 35, 'completion_tokens': 130, 'total_tokens': 165}, id='run-b3feea21-223e-4105-8627-41d647d5ccab-0')

Chaining

Similar to other chat models, ChatDatabricks can be used as part of a complex chain.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a chatbot that can answer questions about {topic}.",
        ),
        ("user", "{question}"),
    ]
)

chain = prompt | chat_model
chain.invoke(
    {
        "topic": "Databricks",
        "question": "What is Unity Catalog?",
    }
)
API Reference: ChatPromptTemplate
AIMessage(content="Unity Catalog is a new data catalog feature in Databricks that allows you to discover, manage, and govern all your data assets across your data landscape, including data lakes, data warehouses, and data marts. It provides a centralized repository for storing and managing metadata, data lineage, and access controls for all your data assets. Unity Catalog enables data teams to easily discover and access the data they need, while ensuring compliance with data privacy and security regulations. It is designed to work seamlessly with Databricks' Lakehouse platform, providing a unified experience for managing and analyzing all your data.", response_metadata={'prompt_tokens': 32, 'completion_tokens': 118, 'total_tokens': 150}, id='run-82d72624-f8df-4c0d-a976-919feec09a55-0')

Invocation (streaming)

for chunk in chat_model.stream("How are you?"):
    print(chunk.content, end="|")
I|'m| an| AI| and| don|'t| have| feelings|,| but| I|'m| here| and| ready| to| assist| you|.| How| can| I| help| you| today|?||

Invocation (async)

import asyncio

country = ["Japan", "Italy", "Australia"]
futures = [chat_model.ainvoke(f"Where is the capital of {c}?") for c in country]
await asyncio.gather(*futures)
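
Streaming also has an async counterpart. A minimal sketch using the standard astream() method, mirroring the sync streaming loop above:

# astream() yields chunks asynchronously; run inside an async context (e.g. a notebook cell).
async for chunk in chat_model.astream("How are you?"):
    print(chunk.content, end="|")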

Tool calling

ChatDatabricks supports the OpenAI-compatible tool calling API that lets you describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.

With ChatDatabricks.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to an OpenAI-compatible tool schema, which looks like:

{
    "name": "...",
    "description": "...",
    "parameters": {...}  # JSONSchema
}

and passed in every model invocation.

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = chat_model.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
    "Which city is hotter today and which is bigger: LA or NY?"
)
print(ai_msg.tool_calls)
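
Each entry in tool_calls is a dict carrying the tool name and the model's parsed arguments. A minimal sketch of iterating over them:

# Each tool call exposes the requested tool's name and its parsed arguments.
for tool_call in ai_msg.tool_calls:
    print(tool_call["name"], tool_call["args"])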

Wrapping Custom Model Endpoint

Prerequisites

An LLM must have been registered and deployed to a Databricks serving endpoint via MLflow, with an OpenAI-compatible chat input/output format (see the endpoint requirement above).

Once the endpoint is ready, the usage pattern is identical to that of Foundation Models.

chat_model_custom = ChatDatabricks(
    endpoint="YOUR_ENDPOINT_NAME",
    temperature=0.1,
    max_tokens=256,
)

chat_model_custom.invoke("How are you?")

Wrapping External Models

Prerequisite: Create Proxy Endpoint

First, create a new Databricks serving endpoint that proxies requests to the target external model. Endpoint creation should be fairly quick when proxying external models.

This requires registering your OpenAI API key within the Databricks secret manager as follows:

# Replace `<scope>` with your scope
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

For how to set up the Databricks CLI and manage secrets, please refer to https://docs.databricks.com/en/security/secrets/secrets.html

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

secret = "secrets/<scope>/openai-api-key" # replace `<scope>` with your scope
endpoint_name = "my-chat" # rename this if my-chat already exists
client.create_endpoint(
name=endpoint_name,
config={
"served_entities": [
{
"name": "my-chat",
"external_model": {
"name": "gpt-3.5-turbo",
"provider": "openai",
"task": "llm/v1/chat",
"openai_config": {
"openai_api_key": "{{" + secret + "}}",
},
},
}
],
},
)

Once the endpoint status has become "Ready", you can query the endpoint in the same way as any other endpoint type.

chat_model_external = ChatDatabricks(
    endpoint=endpoint_name,
    temperature=0.1,
    max_tokens=256,
)
chat_model_external.invoke("How to use Databricks?")

Function calling on Databricks

Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs.

See Databricks function calling introduction for supported models.

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-70b-instruct")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
            },
        },
    }
]

# supported tool_choice values: "auto", "required", "none", function name in string format,
# or a dictionary as {"type": "function", "function": {"name": <<tool_name>>}}
model = llm.bind_tools(tools, tool_choice="auto")

messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]
print(model.invoke(messages))

See Databricks Unity Catalog for how to use UC functions in chains.
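
A minimal sketch of what that can look like, assuming langchain-community's UCFunctionToolkit (the warehouse ID and UC function name below are placeholders):

from langchain_community.tools.databricks import UCFunctionToolkit

# Hypothetical warehouse ID and UC function (catalog.schema.function); replace with your own.
uc_tools = (
    UCFunctionToolkit(warehouse_id="YOUR_WAREHOUSE_ID")
    .include("main.tools.python_exec")
    .get_tools()
)
llm_with_uc_tools = llm.bind_tools(uc_tools)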

API reference

For detailed documentation of all ChatDatabricks features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/databricks/chat_models/langchain_databricks.chat_models.ChatDatabricks.html

