JSONLoader

This notebook provides a quick overview for getting started with the JSON document loader. For detailed documentation of all JSONLoader features and configurations, head to the API reference.

Overview

Integration details

| Class | Package | Local | Serializable | JS support |
| --- | --- | --- | --- | --- |
| JSONLoader | langchain_community | ✅ | ❌ | ✅ |

Loader features

| Source | Document Lazy Loading | Native Async Support |
| --- | --- | --- |
| JSONLoader | ✅ | ❌ |

Setup

To access the JSON document loader, you'll need to install the langchain-community integration package as well as the jq Python package.

Credentials

No credentials are required to use the JSONLoader class.

If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_community and jq:

%pip install -qU langchain_community jq 
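
After installing, it can be useful to confirm that both packages are importable before constructing the loader. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of LangChain.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# The import names are assumed to be `langchain_community` and `jq`,
# matching the packages installed above.
for name in ("langchain_community", "jq"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, rerun the install command above in the same environment as the notebook kernel.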

Initialization

Now we can instantiate our loader object and load documents:

from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[].content",
    text_content=False,
)
API Reference: JSONLoader

Load

docs = loader.load()
docs[0]
Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
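
To make the jq_schema used above more concrete, the sketch below mimics what `.messages[].content` selects, using only the standard library. The payload is a hypothetical stand-in shaped like the example file (the first message mirrors the output above, the second is invented), and `select_contents` is an illustrative helper. JSONLoader itself delegates the selection to jq rather than running code like this.

```python
import json

# Hypothetical stand-in for facebook_chat.json: a top-level "messages"
# list of objects. The second message is invented for illustration.
sample = json.loads("""
{
  "messages": [
    {"sender_name": "User 2", "content": "Bye!"},
    {"sender_name": "User 1", "content": "See you later"}
  ]
}
""")


def select_contents(data: dict) -> list:
    """Rough plain-Python equivalent of the jq expression `.messages[].content`."""
    # .get() mirrors jq returning null (None here) when a key is absent.
    return [message.get("content") for message in data["messages"]]


contents = select_contents(sample)

# The loader's seq_num metadata starts at 1 in the output above,
# so the enumeration here starts at 1 as well.
for seq_num, text in enumerate(contents, start=1):
    print(seq_num, text)
```

Each selected value would become the page_content of one Document.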

Lazy Load

pages = []
for doc in loader.lazy_load():
    pages.append(doc)
    if len(pages) >= 10:
        # do some paged operation, e.g.
        # index.upsert(pages)

        pages = []
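
The loop above resets `pages` only after a full batch of 10, so any trailing documents are still sitting in `pages` when the iterator ends. One way to handle this is a small batching generator that also yields the final partial batch. This is a sketch: `batched` and `fake_lazy_load` are hypothetical names, and `fake_lazy_load` stands in for `loader.lazy_load()` so the example runs without any data files.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items, including a final partial batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left once the source iterator is exhausted.
        yield batch


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields 25 placeholder items."""
    for i in range(25):
        yield f"doc-{i}"


batch_sizes = [len(batch) for batch in batched(fake_lazy_load(), 10)]
print(batch_sizes)  # [10, 10, 5]
```

With a real loader, each yielded batch could be passed to the paged operation (for example an index upsert) in place of the `len(batch)` call.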

Read from JSON Lines file

If you want to load documents from a JSON Lines file, pass json_lines=True and specify a jq_schema that extracts page_content from a single JSON object.

loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".content",
    text_content=False,
    json_lines=True,
)

docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
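
For readers unfamiliar with the format, a JSON Lines file holds one complete JSON object per line, which is why the schema targets a single object (`.content`) rather than a list. The sketch below illustrates that idea with the standard library only. The records and the temporary file are hypothetical, `read_json_lines` is an illustrative helper, and none of this is the loader's actual implementation.

```python
import json
import os
import tempfile

# Hypothetical records; the first mirrors the output above, the second is invented.
records = [
    {"sender_name": "User 2", "content": "Bye!"},
    {"sender_name": "User 1", "content": "See you later"},
]


def read_json_lines(path: str, key: str) -> list:
    """Parse each non-empty line as its own JSON object and pick out `key`."""
    values = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            values.append(json.loads(line).get(key))
    return values


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "messages.jsonl")
    # Write one JSON object per line, which is what makes the file JSON Lines.
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    contents = read_json_lines(path, "content")

print(contents)  # ['Bye!', 'See you later']
```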

Read specific content keys

Another option is to set jq_schema='.' and provide a content_key in order to load only specific content:

loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".",
    content_key="sender_name",
    json_lines=True,
)

docs = loader.load()
print(docs[0])
page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}

JSON file with jq schema content_key

To load documents from a JSON file using a content_key that is itself written as a jq expression, set is_content_key_jq_parsable=True. Ensure that content_key is compatible with the jq schema and can be parsed by jq.

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key=".content",
    is_content_key_jq_parsable=True,
)

docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}

Extracting metadata

Generally, we want to carry the metadata available in the JSON file over into the documents that we create from the content.

The following demonstrates how metadata can be extracted using the JSONLoader.

There are some key changes to note. In the previous examples, where we didn't collect metadata, we could specify directly in the schema where the value for page_content should be extracted from.

In this example, we have to tell the loader to iterate over the records in the messages field, so the jq_schema has to be .messages[].

This allows us to pass each record (a dict) into the metadata_func, which has to be implemented. The metadata_func is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final Document object.

Additionally, we now have to explicitly specify in the loader, via the content_key argument, the key in the record from which the value for page_content is extracted.

# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")

    return metadata


loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key="content",
    metadata_func=metadata_func,
)

docs = loader.load()
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}
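
Because metadata_func is an ordinary function, it can be exercised on its own before being handed to the loader. The sketch below calls it with a hand-written record and a base metadata dict. The shape of the base dict (source and seq_num) is an assumption inferred from the output above, and the shortened source path is a placeholder.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    """Copy selected fields from the record into the document metadata."""
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata


# Hand-written record using the values shown in the output above.
record = {
    "sender_name": "User 2",
    "timestamp_ms": 1675597571851,
    "content": "Bye!",
}

# Assumed shape of the metadata the loader passes in; the path is a placeholder.
base_metadata = {"source": "example_data/facebook_chat.json", "seq_num": 1}

# Pass a copy so the base dict is not mutated by the function.
result = metadata_func(record, dict(base_metadata))
print(result)
```

Note that the content field is not copied into the metadata; it reaches the Document only through content_key.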

API reference

For detailed documentation of all JSONLoader features and configurations, head to the API reference: https://langchain-python.dev.org.tw/api_reference/community/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html
