SQLDatabase 工具包
這將幫助您開始使用 SQL 資料庫 工具包。 有關所有 SQLDatabaseToolkit
功能和配置的詳細文檔,請前往API 參考。
SQLDatabaseToolkit
中的工具旨在與 SQL
資料庫互動。
一個常見的應用是使代理能夠使用關係資料庫中的資料回答問題,可能以迭代方式進行(例如,從錯誤中恢復)。
⚠️ 安全注意事項 ⚠️
建立 SQL 資料庫的問答系統需要執行模型產生的 SQL 查詢。 這樣做存在固有風險。 請確保您的資料庫連接權限始終盡可能縮小範圍,以滿足您的鏈/代理的需求。 這將減輕(但不能消除)建構模型驅動系統的風險。 有關通用安全最佳實務的更多訊息,請參閱此處。
設定 (Setup)
如果您想從個別工具的運行中獲得自動追蹤,您也可以透過取消註釋下方內容來設定您的 LangSmith API 密鑰
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
安裝 (Installation)
此工具包位於 langchain-community
包中
%pip install --upgrade --quiet langchain-community
出於示範目的,我們將存取 LangChain Hub 中的提示。 我們還需要 langgraph
來示範如何將工具包與代理一起使用。 使用此工具包不是必需的。
%pip install --upgrade --quiet langchainhub langgraph
實例化 (Instantiation)
SQLDatabaseToolkit
工具包需要
- 一個 SQLDatabase 物件;
- 一個 LLM 或聊天模型(用於實例化 QuerySQLCheckerTool 工具)。
下面,我們使用這些物件實例化工具包。 讓我們首先建立一個資料庫物件。
本指南使用基於 這些說明 的 Chinook
範例資料庫。
下面我們將使用 requests
庫來提取 .sql
檔案並建立記憶體中的 SQLite 資料庫。 請注意,這種方法很輕量,但短暫且非執行緒安全。 如果您願意,您可以按照說明將檔案儲存在本機為 Chinook.db
,並透過 db = SQLDatabase.from_uri("sqlite:///Chinook.db")
實例化資料庫。
import sqlite3
import requests
from langchain_community.utilities.sql_database import SQLDatabase
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool
def get_engine_for_chinook_db():
"""Pull sql file, populate in-memory database, and create engine."""
url = "https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql"
response = requests.get(url)
sql_script = response.text
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.executescript(sql_script)
return create_engine(
"sqlite://",
creator=lambda: connection,
poolclass=StaticPool,
connect_args={"check_same_thread": False},
)
engine = get_engine_for_chinook_db()
db = SQLDatabase(engine)
我們還需要 LLM 或聊天模型
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
我們現在可以實例化工具包
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
工具 (Tools)
檢視可用工具
toolkit.get_tools()
[QuerySQLDatabaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x103d5fa60>),
InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x103d5fa60>),
ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x103d5fa60>),
QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x103d5fa60>, llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x10742d720>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x10742f7f0>, root_client=<openai.OpenAI object at 0x103d5fac0>, root_async_client=<openai.AsyncOpenAI object at 0x10742d780>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********')), llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['dialect', 'query'], input_types={}, partial_variables={}, template='\n{query}\nDouble check the {dialect} query above for common mistakes, including:\n- Using NOT IN with NULL values\n- Using UNION when UNION ALL should have been used\n- Using BETWEEN for exclusive ranges\n- Data type mismatch in predicates\n- Properly quoting identifiers\n- Using the correct number of arguments for functions\n- Casting to the correct data type\n- Using the proper columns for joins\n\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\n\nOutput the final SQL query only.\n\nSQL Query: '), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x10742d720>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x10742f7f0>, root_client=<openai.OpenAI object at 0x103d5fac0>, root_async_client=<openai.AsyncOpenAI object at 0x10742d780>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********')), output_parser=StrOutputParser(), llm_kwargs={}))]
您可以直接使用個別工具
from langchain_community.tools.sql_database.tool import (
InfoSQLDatabaseTool,
ListSQLDatabaseTool,
QuerySQLCheckerTool,
QuerySQLDatabaseTool,
)
在代理中使用 (Use within an agent)
依照SQL 問答教學,下面我們為一個簡單的問答代理配備了我們工具包中的工具。 首先,我們提取相關提示並使用其所需的參數填充它
from langchain import hub
prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")
assert len(prompt_template.messages) == 1
print(prompt_template.input_variables)
['dialect', 'top_k']
system_message = prompt_template.format(dialect="SQLite", top_k=5)
然後我們實例化代理
from langgraph.prebuilt import create_react_agent
agent_executor = create_react_agent(
llm, toolkit.get_tools(), state_modifier=system_message
)
並對其發出查詢
example_query = "Which country's customers spent the most?"
events = agent_executor.stream(
{"messages": [("user", example_query)]},
stream_mode="values",
)
for event in events:
event["messages"][-1].pretty_print()
================================[1m Human Message [0m=================================
Which country's customers spent the most?
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_list_tables (call_EBPjyfzqXzFutDn8BklYACLj)
Call ID: call_EBPjyfzqXzFutDn8BklYACLj
Args:
=================================[1m Tool Message [0m=================================
Name: sql_db_list_tables
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_schema (call_kGcnKpxRVFIY8dPjYIJbRoVU)
Call ID: call_kGcnKpxRVFIY8dPjYIJbRoVU
Args:
table_names: Customer, Invoice, InvoiceLine
=================================[1m Tool Message [0m=================================
Name: sql_db_schema
CREATE TABLE "Customer" (
"CustomerId" INTEGER NOT NULL,
"FirstName" NVARCHAR(40) NOT NULL,
"LastName" NVARCHAR(20) NOT NULL,
"Company" NVARCHAR(80),
"Address" NVARCHAR(70),
"City" NVARCHAR(40),
"State" NVARCHAR(40),
"Country" NVARCHAR(40),
"PostalCode" NVARCHAR(10),
"Phone" NVARCHAR(24),
"Fax" NVARCHAR(24),
"Email" NVARCHAR(60) NOT NULL,
"SupportRepId" INTEGER,
PRIMARY KEY ("CustomerId"),
FOREIGN KEY("SupportRepId") REFERENCES "Employee" ("EmployeeId")
)
/*
3 rows from Customer table:
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 luisg@embraer.com.br 3
2 Leonie Köhler None Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 +49 0711 2842222 None leonekohler@surfeu.de 5
3 François Tremblay None 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 None ftremblay@gmail.com 3
*/
CREATE TABLE "Invoice" (
"InvoiceId" INTEGER NOT NULL,
"CustomerId" INTEGER NOT NULL,
"InvoiceDate" DATETIME NOT NULL,
"BillingAddress" NVARCHAR(70),
"BillingCity" NVARCHAR(40),
"BillingState" NVARCHAR(40),
"BillingCountry" NVARCHAR(40),
"BillingPostalCode" NVARCHAR(10),
"Total" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("InvoiceId"),
FOREIGN KEY("CustomerId") REFERENCES "Customer" ("CustomerId")
)
/*
3 rows from Invoice table:
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2021-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 1.98
2 4 2021-01-02 00:00:00 Ullevålsveien 14 Oslo None Norway 0171 3.96
3 8 2021-01-03 00:00:00 Grétrystraat 63 Brussels None Belgium 1000 5.94
*/
CREATE TABLE "InvoiceLine" (
"InvoiceLineId" INTEGER NOT NULL,
"InvoiceId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
"Quantity" INTEGER NOT NULL,
PRIMARY KEY ("InvoiceLineId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("InvoiceId") REFERENCES "Invoice" ("InvoiceId")
)
/*
3 rows from InvoiceLine table:
InvoiceLineId InvoiceId TrackId UnitPrice Quantity
1 1 2 0.99 1
2 1 4 0.99 1
3 2 6 0.99 1
*/
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_query (call_cTfI7OrY64FzJaDd49ILFWw7)
Call ID: call_cTfI7OrY64FzJaDd49ILFWw7
Args:
query: SELECT c.Country, SUM(i.Total) AS TotalSpent FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId GROUP BY c.Country ORDER BY TotalSpent DESC LIMIT 1
=================================[1m Tool Message [0m=================================
Name: sql_db_query
[('USA', 523.06)]
==================================[1m Ai Message [0m==================================
Customers from the USA spent the most, with a total amount spent of $523.06.
我們也可以觀察到代理程式從錯誤中恢復
example_query = "Who are the top 3 best selling artists?"
events = agent_executor.stream(
{"messages": [("user", example_query)]},
stream_mode="values",
)
for event in events:
event["messages"][-1].pretty_print()
================================[1m Human Message [0m=================================
Who are the top 3 best selling artists?
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_query (call_xAkvYiRFM7nCMKXsDNvk1OMx)
Call ID: call_xAkvYiRFM7nCMKXsDNvk1OMx
Args:
query: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3
=================================[1m Tool Message [0m=================================
Name: sql_db_query
Error: (sqlite3.OperationalError) no such table: sales
[SQL: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_list_tables (call_K4Zvbowsq7XPgGFepbvc5G7i)
Call ID: call_K4Zvbowsq7XPgGFepbvc5G7i
Args:
=================================[1m Tool Message [0m=================================
Name: sql_db_list_tables
Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_schema (call_tUztueSK7VO2klZ99xT4ZVhM)
Call ID: call_tUztueSK7VO2klZ99xT4ZVhM
Args:
table_names: Artist, Album, InvoiceLine
=================================[1m Tool Message [0m=================================
Name: sql_db_schema
CREATE TABLE "Album" (
"AlbumId" INTEGER NOT NULL,
"Title" NVARCHAR(160) NOT NULL,
"ArtistId" INTEGER NOT NULL,
PRIMARY KEY ("AlbumId"),
FOREIGN KEY("ArtistId") REFERENCES "Artist" ("ArtistId")
)
/*
3 rows from Album table:
AlbumId Title ArtistId
1 For Those About To Rock We Salute You 1
2 Balls to the Wall 2
3 Restless and Wild 2
*/
CREATE TABLE "Artist" (
"ArtistId" INTEGER NOT NULL,
"Name" NVARCHAR(120),
PRIMARY KEY ("ArtistId")
)
/*
3 rows from Artist table:
ArtistId Name
1 AC/DC
2 Accept
3 Aerosmith
*/
CREATE TABLE "InvoiceLine" (
"InvoiceLineId" INTEGER NOT NULL,
"InvoiceId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
"Quantity" INTEGER NOT NULL,
PRIMARY KEY ("InvoiceLineId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("InvoiceId") REFERENCES "Invoice" ("InvoiceId")
)
/*
3 rows from InvoiceLine table:
InvoiceLineId InvoiceId TrackId UnitPrice Quantity
1 1 2 0.99 1
2 1 4 0.99 1
3 2 6 0.99 1
*/
==================================[1m Ai Message [0m==================================
Tool Calls:
sql_db_query (call_tVtLQIRPmCM6pukgpHFfq86A)
Call ID: call_tVtLQIRPmCM6pukgpHFfq86A
Args:
query: SELECT Artist.Name AS artist_name, SUM(InvoiceLine.Quantity) AS total_sold FROM Artist JOIN Album ON Artist.ArtistId = Album.ArtistId JOIN Track ON Album.AlbumId = Track.AlbumId JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY total_sold DESC LIMIT 3
=================================[1m Tool Message [0m=================================
Name: sql_db_query
[('Iron Maiden', 140), ('U2', 107), ('Metallica', 91)]
==================================[1m Ai Message [0m==================================
The top 3 best selling artists are:
1. Iron Maiden - 140 units sold
2. U2 - 107 units sold
3. Metallica - 91 units sold
特定功能
SQLDatabaseToolkit
實作了一個 .get_context 方法,方便在提示或其他情境中使用。
⚠️ 免責聲明 ⚠️ : 代理程式可能會產生 insert/update/delete 查詢。如果這不是預期的行為,請使用自訂提示或建立沒有寫入權限的 SQL 使用者。
最終使用者可能會因為詢問一個簡單的問題,例如 "執行最大的查詢",而導致您的 SQL 資料庫過載。產生的查詢可能如下所示
SELECT * FROM "public"."users"
JOIN "public"."user_permissions" ON "public"."users".id = "public"."user_permissions".user_id
JOIN "public"."projects" ON "public"."users".id = "public"."projects".user_id
JOIN "public"."events" ON "public"."projects".id = "public"."events".project_id;
對於事務型 SQL 資料庫,如果上面的其中一個表格包含數百萬行,則該查詢可能會對使用相同資料庫的其他應用程式造成問題。
大多數以資料倉儲為導向的資料庫都支援使用者層級的配額,以限制資源使用量。
API 參考
如需所有 SQLDatabaseToolkit 功能和配置的詳細文件,請前往 API 參考。