JSONLoader
This notebook provides a quick overview for getting started with the JSON document loader. For detailed documentation of all JSONLoader features and configurations, head to the API reference.
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
JSONLoader | langchain_community | ✅ | ❌ | ✅ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
JSONLoader | ✅ | ❌ |
Setup
To access the JSON document loader, you'll need to install the langchain-community integration package as well as the jq Python package.
Credentials
No credentials are required to use the JSONLoader class.
If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
Install langchain_community and jq:
%pip install -qU langchain_community jq
Initialization
Now we can instantiate our loader object and load documents:
from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[].content",
    text_content=False,
)
Load
docs = loader.load()
docs[0]
Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
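To make the jq_schema above concrete, here is a minimal plain-Python sketch of what `.messages[].content` selects. It does not call JSONLoader or jq; the sample data and the `select_contents` helper are hypothetical, and the 1-based `seq_num` simply mirrors the metadata shown in the output above.

```python
import json

# Hypothetical sample shaped like the chat export used above (not the real file).
raw = json.dumps(
    {
        "messages": [
            {"sender_name": "User 2", "content": "Bye!"},
            {"sender_name": "User 1", "content": "See you later"},
        ]
    }
)


def select_contents(data: dict) -> list:
    """Plain-Python stand-in for the jq expression `.messages[].content`."""
    return [message["content"] for message in data["messages"]]


contents = select_contents(json.loads(raw))

# Pair each selected value with a 1-based sequence number, as in the metadata above.
records = [
    {"page_content": text, "seq_num": i}
    for i, text in enumerate(contents, start=1)
]
print(records[0])
```

Each value selected by the expression becomes one document, which is why the first document's content is a single message string rather than the whole file.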
Lazy Load
pages = []
for doc in loader.lazy_load():
    pages.append(doc)
    if len(pages) >= 10:
        # do some paged operation, e.g.
        # index.upsert(pages)

        pages = []
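The loop above only acts on full pages of 10, so a final partial page is left unprocessed when the stream ends. Below is a minimal sketch of the same batching pattern that also flushes the remainder. It uses a plain generator in place of `loader.lazy_load()`; `fake_lazy_load`, `paged`, and the page size are hypothetical stand-ins, not part of the loader.

```python
from typing import Iterable, Iterator, List


def fake_lazy_load(n: int) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields items one at a time."""
    for i in range(n):
        yield f"doc-{i}"


def paged(items: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Group a lazy stream into pages, including the final partial page."""
    page: List[str] = []
    for item in items:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    if page:  # flush whatever is left after the stream ends
        yield page


page_sizes = [len(page) for page in paged(fake_lazy_load(25), page_size=10)]
print(page_sizes)  # [10, 10, 5]
```

Keeping the batching in a generator means only one page is held in memory at a time, which is the main reason to prefer lazy loading for large files.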
Read from JSON Lines file
If you want to load documents from a JSON Lines file, pass json_lines=True and specify a jq_schema that extracts page_content from a single JSON object.
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".content",
    text_content=False,
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
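With json_lines=True, each line of the file is treated as its own JSON object and the jq_schema is applied to that object. A rough plain-Python sketch of the idea follows; it does not use JSONLoader or jq, and the sample lines (including the second timestamp) are hypothetical.

```python
import json

# Hypothetical JSON Lines content: one JSON object per line.
jsonl_text = """\
{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}
{"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "See you later"}
"""

line_contents = []
for line in jsonl_text.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines
    record = json.loads(line)  # each line is parsed independently
    line_contents.append(record["content"])  # plain-Python stand-in for `.content`

print(line_contents)
```

Note that the schema here is `.content` rather than `.messages[].content`, because there is no enclosing `messages` array in a JSON Lines file.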
Read specific content keys
Another option is to set jq_schema='.' and provide a content_key so that only specific content is loaded:
loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".",
    content_key="sender_name",
    json_lines=True,
)
docs = loader.load()
print(docs[0])
page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
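Here jq_schema='.' selects the whole object and content_key names the field used as page_content. A small plain-Python sketch of that two-step selection is shown below; the record and the `pick_content` helper are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical record, shaped like one line of the JSON Lines file above.
record = json.loads('{"sender_name": "User 2", "content": "Bye!"}')


def pick_content(selected: dict, content_key: str) -> str:
    """Return the value under content_key; raise a clear error if it is absent."""
    if content_key not in selected:
        raise KeyError(f"content_key {content_key!r} not found in record")
    return str(selected[content_key])


page_content = pick_content(record, "sender_name")
print(page_content)  # User 2
```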
JSON file with jq schema content_key
To load documents from a JSON file using a content_key written as a jq expression, set is_content_key_jq_parsable=True. In this mode content_key (for example, .content) is applied to each record selected by jq_schema, so ensure that it is compatible with those records and can be parsed as a jq schema.
loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key=".content",
    is_content_key_jq_parsable=True,
)
docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
Extracting metadata
Generally, we want to include metadata available in the JSON file in the documents that we create from the content.
The following demonstrates how metadata can be extracted using the JSONLoader.
There are some key changes to note. In the earlier example where we didn't collect metadata (jq_schema=".messages[].content"), we specified directly in the schema where the value for page_content could be extracted from.
In this example, we have to tell the loader to iterate over the records in the messages field, so the jq_schema has to be .messages[].
This allows us to pass each record (a dict) into the metadata_func that we have to implement. The metadata_func is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final Document object.
Additionally, we now have to explicitly specify in the loader, via the content_key argument, the key in the record from which the value for page_content should be extracted.
# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")

    return metadata


loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key="content",
    metadata_func=metadata_func,
)
docs = loader.load()
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}
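Because metadata_func is an ordinary function, it can be checked on its own before being passed to the loader. The sketch below calls it with a hypothetical record and a seed metadata dict shaped like the output above (`source` and `seq_num`); the file name is a placeholder, and the loader itself is not involved.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    # Copy selected fields from the record into the document metadata.
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata


# Hypothetical inputs: one message record plus the default metadata keys.
sample_record = {
    "sender_name": "User 2",
    "timestamp_ms": 1675597571851,
    "content": "Bye!",
}
seed_metadata = {"source": "facebook_chat.json", "seq_num": 1}

# Pass a copy so the seed dict is not mutated by the function.
result = metadata_func(sample_record, dict(seed_metadata))
print(result)
```

Since the function uses `record.get(...)`, a record that lacks one of these keys yields `None` for that metadata entry instead of raising an error.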
API reference
For detailed documentation of all JSONLoader features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/community/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html
Related
- Document loader conceptual guide
- Document loader how-to guides