BoxRetriever
This will help you get started with the Box retriever. For detailed documentation of all BoxRetriever features and configurations, head to the API reference.
Overview
The BoxRetriever class helps you get your unstructured content from Box in LangChain's Document format. You can do this in two ways: by searching for files with a full-text search, or by using Box AI to retrieve a Document containing the result of an AI query against files. The Box AI route requires a List[str] containing Box file IDs, e.g. ["12345","67890"].
Box AI requires an Enterprise Plus license.
Files without a text representation will be skipped.
Integration details
1: Bring-your-own data (i.e., index and search a custom corpus of documents)
Retriever | Self-host | Cloud offering | Package |
---|---|---|---|
BoxRetriever | ❌ | ✅ | langchain-box |
Setup
In order to use the Box package, you will need a few things:
- A Box account: if you are not a current Box customer or want to test outside of your production Box instance, you can use a free developer account.
- A Box app: this is configured in the developer console and, for Box AI, must have the Manage AI scope enabled. Here you will also select your authentication method.
- The app must be enabled by the administrator. For free developer accounts, this is whoever signed up for the account.
Credentials
For these examples, we will use token authentication. A token can be obtained through any authentication method, so use whichever one suits your app. If you want to learn more about how to use other authentication types with langchain-box, visit the Box provider document.
import getpass
import os
box_developer_token = getpass.getpass("Enter your Box Developer Token: ")
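If you run these examples outside a notebook, prompting every time can get tedious. The following is a minimal sketch, not part of langchain-box, that reads the token from an environment variable and only falls back to the prompt when it is unset. The helper name load_box_token and the variable name BOX_DEVELOPER_TOKEN are assumptions chosen for illustration.

```python
import getpass
import os


def load_box_token(env_var: str = "BOX_DEVELOPER_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is unset.

    Both this helper and the default variable name are hypothetical.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fall back to an interactive prompt so the token never appears in source.
        token = getpass.getpass("Enter your Box Developer Token: ")
    return token
```

With this in place, box_developer_token = load_box_token() can stand in for the direct getpass call.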
If you want to get automated tracing from individual queries, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
This retriever lives in the langchain-box package:
%pip install -qU langchain-box
Note: you may need to restart the kernel to use updated packages.
Instantiation
Now we can instantiate our retriever:
Search
from langchain_box import BoxRetriever
retriever = BoxRetriever(box_developer_token=box_developer_token)
For more granular search, we offer a series of options to help you filter down the results. This uses langchain_box.utilities.BoxSearchOptions in conjunction with the langchain_box.utilities.SearchTypeFilter and langchain_box.utilities.DocumentFiles enums to filter on things like created date, which part of the file to search, and even to limit the search scope to a specific folder.
For more information, check out the API reference.
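The date filters in the example that follows are passed as pairs of ISO 8601 strings with a UTC offset. Writing them by hand is error-prone, so here is a small standard-library sketch that builds such a pair from timezone-aware datetimes. The helper iso_date_range is hypothetical and not part of langchain-box.

```python
from datetime import datetime, timedelta, timezone


def iso_date_range(start: datetime, end: datetime) -> list[str]:
    """Return [start, end] as ISO 8601 strings that include the UTC offset."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("both datetimes must be timezone-aware")
    if start > end:
        raise ValueError("start must not be later than end")
    return [start.isoformat(), end.isoformat()]


# A fixed UTC-07:00 offset, matching the offset used in this page's examples.
utc_minus_7 = timezone(timedelta(hours=-7))
created_date_range = iso_date_range(
    datetime(2023, 1, 1, tzinfo=utc_minus_7),
    datetime(2024, 8, 1, tzinfo=utc_minus_7),
)
print(created_date_range)
# ['2023-01-01T00:00:00-07:00', '2024-08-01T00:00:00-07:00']
```

The resulting list has the same shape as the created_date_range value used in the search options on this page.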
from langchain_box.utilities import BoxSearchOptions, DocumentFiles, SearchTypeFilter
box_folder_id = "260931903795"
box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],  # limit the search to this folder
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],  # match on file contents
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)
retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)
retriever.invoke("AstroTech Solutions")
[Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')]
Box AI
from langchain_box import BoxRetriever
box_file_ids = ["1514555423624", "1514553902288"]
retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_file_ids=box_file_ids
)
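Because Box AI works from an explicit list of file IDs, a malformed list is an easy mistake to make, for example passing a single string or integer IDs. The sketch below checks the list before it is handed to the retriever. It assumes that file IDs are strings of ASCII digits, as in the examples on this page; validate_box_file_ids is a hypothetical helper, not part of langchain-box.

```python
def validate_box_file_ids(file_ids):
    """Return file_ids as a list, raising if any entry is not a digit string.

    Assumption: Box file IDs are strings of ASCII digits, e.g. "12345".
    """
    if isinstance(file_ids, str):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("file_ids must be a list of strings, not a single string")
    checked = []
    for file_id in file_ids:
        if not (isinstance(file_id, str) and file_id.isascii() and file_id.isdigit()):
            raise ValueError(f"invalid Box file ID: {file_id!r}")
        checked.append(file_id)
    return checked


print(validate_box_file_ids(["1514555423624", "1514553902288"]))
# ['1514555423624', '1514553902288']
```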
Usage
query = "What was the most expensive item purchased"
retriever.invoke(query)
[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.')]
Citations
With Box AI and the BoxRetriever, you can return the answer to your prompt, return the citations used by Box to get that answer, or both. No matter how you choose to use Box AI, the retriever returns a List[Document] object. We offer this flexibility with two bool arguments, answer and citations. answer defaults to True and citations defaults to False, so you can omit both if you just want the answer. If you want both, include citations=True; if you only want citations, include answer=False and citations=True.
Get both
retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_file_ids=box_file_ids, citations=True
)
retriever.invoke(query)
[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.'),
Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')]
Citations only
retriever = BoxRetriever(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
    answer=False,
    citations=True,
)
retriever.invoke(query)
[Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')]
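When you request both the answer and the citations, they arrive in one flat list, so downstream code may want to tell them apart. The sketch below does this by checking for a file_id metadata key. That rule is an assumption inferred from the sample outputs in this section, not a documented contract. Doc is a stand-in for LangChain's Document so the example runs without any dependencies, and split_answer_and_citations is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def split_answer_and_citations(docs):
    """Separate Box AI answers from citation documents.

    Assumption: citations carry a 'file_id' metadata key and answers do not,
    as in the sample outputs in this section.
    """
    answers = [d for d in docs if "file_id" not in d.metadata]
    citations = [d for d in docs if "file_id" in d.metadata]
    return answers, citations


docs = [
    Doc("The most expensive item is the Gravitational Wave Detector Kit.",
        {"source": "Box AI", "title": "Box AI What was the most expensive item purchased"}),
    Doc("Vendor: AstroTech Solutions",
        {"file_name": "Invoice-A5555.txt", "file_id": "1514555423624"}),
]
answers, citations = split_answer_and_citations(docs)
print(len(answers), len(citations))
# 1 1
```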
Use within a chain
Like other retrievers, BoxRetriever can be incorporated into LLM applications via chains.
We will need an LLM or chat model:
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
openai_key = getpass.getpass("Enter your OpenAI key: ")
Enter your OpenAI key: ········
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)
retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)
context = "You are a finance professional that handles invoices and purchase orders."
question = "Show me all the items purchased from AstroTech Solutions"
prompt = ChatPromptTemplate.from_template(
"""Answer the question based only on the context provided.
Context: {context}
Question: {question}"""
)
def format_docs(docs):
    # Join the text of all retrieved documents into one context string.
    return "\n\n".join(doc.page_content for doc in docs)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(question)
'- Gravitational Wave Detector Kit: $800\n- Exoplanet Terrarium: $120'
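The format_docs step in this chain keeps only the page content, so the model cannot tell which file each passage came from. One possible variation, sketched below, prefixes every passage with the title metadata that appears in the search results earlier on this page. Doc is a stand-in for LangChain's Document so the example runs on its own, and format_docs_with_titles is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_titles(docs):
    """Join documents into one string, labelling each with its title if present."""
    parts = []
    for doc in docs:
        # Fall back to a placeholder when a document has no 'title' metadata.
        title = doc.metadata.get("title", "untitled")
        parts.append(f"[{title}]\n{doc.page_content}")
    return "\n\n".join(parts)


sample = [
    Doc("Vendor: AstroTech Solutions", {"title": "Invoice-A5555_txt"}),
    Doc("No title here"),
]
print(format_docs_with_titles(sample))
```

A function like this could take the place of format_docs in the chain's context step, at the cost of a few extra tokens per document.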
Use as an agent tool
Like other retrievers, BoxRetriever can also be added to a LangGraph agent as a tool.
pip install -U langsmith
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools.retriever import create_retriever_tool
box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)
retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)
# Wrap the retriever as a tool the agent can call by name.
box_search_tool = create_retriever_tool(
    retriever,
    "box_search_tool",
    "This tool is used to search Box and retrieve documents that match the search criteria",
)
tools = [box_search_tool]
prompt = hub.pull("hwchase17/openai-tools-agent")
prompt.messages
llm = ChatOpenAI(temperature=0, openai_api_key=openai_key)
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
/Users/shurrey/local/langchain/.venv/lib/python3.11/site-packages/langsmith/client.py:312: LangSmithMissingAPIKeyWarning: API key must be provided when using hosted LangSmith API
warnings.warn(
result = agent_executor.invoke(
    {
        "input": "list the items I purchased from AstroTech Solutions from most expensive to least expensive"
    }
)
print(f"result {result['output']}")
result The items you purchased from AstroTech Solutions from most expensive to least expensive are:
1. Gravitational Wave Detector Kit: $800
2. Exoplanet Terrarium: $120
Total: $920
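API reference
For detailed documentation of all BoxRetriever features and configurations, head to the API reference.
Help
If you have questions, you can check out our developer documentation or reach out to us in our developer community.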