
How to load CSV files

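A comma-separated values (CSV) file is a delimited text file that uses commas to separate values. Each line of the file is a data record, and each record consists of one or more fields separated by commas.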

LangChain implements a CSV loader that loads a CSV file into a sequence of Document objects. Each row of the CSV file is converted into one document.

from langchain_community.document_loaders.csv_loader import CSVLoader

file_path = "../integrations/document_loaders/example_data/mlb_teams_2012.csv"

loader = CSVLoader(file_path=file_path)
data = loader.load()

for record in data[:2]:
    print(record)
API Reference: CSVLoader
page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98' metadata={'source': '../../../docs/integrations/document_loaders/example_data/mlb_teams_2012.csv', 'row': 0}
page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97' metadata={'source': '../../../docs/integrations/document_loaders/example_data/mlb_teams_2012.csv', 'row': 1}
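The row-to-document mapping described above can be approximated with the standard library alone. The following is a minimal sketch, not CSVLoader's actual implementation: the helper `rows_to_records`, the sample data, and the `sample.csv` source name are all hypothetical, and plain tuples stand in for Document objects.

```python
import csv
import io

# Hypothetical sample data; the real example file is not reproduced here.
sample = "Team,Payroll (millions),Wins\nNationals,81.34,98\nReds,82.20,97\n"


def rows_to_records(text, source="sample.csv"):
    """Turn each CSV row into a (page_content, metadata) pair."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        # One "column: value" line per field, joined with newlines.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        records.append((content, {"source": source, "row": i}))
    return records


records = rows_to_records(sample)
for content, metadata in records:
    print(repr(content), metadata)
```

Each record carries the row's fields as text plus a zero-based row index, which mirrors the shape of the printed output above.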

Customizing CSV parsing and loading

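CSVLoader accepts a csv_args keyword argument for customizing the arguments passed to Python's csv.DictReader. See the csv module documentation for more information on which csv arguments are supported.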

loader = CSVLoader(
    file_path=file_path,
    csv_args={
        "delimiter": ",",
        "quotechar": '"',
        "fieldnames": ["MLB Team", "Payroll in millions", "Wins"],
    },
)

data = loader.load()
for record in data[:2]:
    print(record)
page_content='MLB Team: Team\nPayroll in millions: "Payroll (millions)"\nWins: "Wins"' metadata={'source': '../../../docs/integrations/document_loaders/example_data/mlb_teams_2012.csv', 'row': 0}
page_content='MLB Team: Nationals\nPayroll in millions: 81.34\nWins: 98' metadata={'source': '../../../docs/integrations/document_loaders/example_data/mlb_teams_2012.csv', 'row': 1}
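Row 0 in the output above holds the file's original header. That follows from how csv.DictReader treats an explicit fieldnames argument: the first line is no longer consumed as a header and is read as ordinary data. A small standard-library sketch, using hypothetical sample data, shows the difference:

```python
import csv
import io

# Hypothetical two-line sample: a header line and one data line.
sample = "Team,Payroll,Wins\nNationals,81.34,98\n"

# Default behavior: the first line supplies the field names.
default_rows = list(csv.DictReader(io.StringIO(sample)))

# With explicit fieldnames, the first line is read as a regular data row.
named_rows = list(
    csv.DictReader(
        io.StringIO(sample),
        fieldnames=["MLB Team", "Payroll in millions", "Wins"],
    )
)

print(len(default_rows), default_rows[0])
print(len(named_rows), named_rows[0])
```

If the header row is unwanted in the loaded documents, one option is to drop the first document after loading, for example with `data[1:]`.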

Specifying a column to identify the document source

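The "source" key in Document metadata can be set from a column of the CSV. Use the source_column argument to specify which column supplies the source for the document created from each row. Otherwise, file_path is used as the source for all documents created from the CSV file.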

This is useful when documents loaded from CSV files are used in chains that answer questions using sources.

loader = CSVLoader(file_path=file_path, source_column="Team")

data = loader.load()
for record in data[:2]:
    print(record)
page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98' metadata={'source': 'Nationals', 'row': 0}
page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97' metadata={'source': 'Reds', 'row': 1}
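The fallback rule for the "source" key can be sketched in a few lines. This is an illustration of the behavior described above under stated assumptions, not the library's code: `build_metadata`, the sample data, and the `teams.csv` path are hypothetical.

```python
import csv
import io

# Hypothetical sample data.
sample = "Team,Wins\nNationals,98\nReds,97\n"


def build_metadata(text, file_path, source_column=None):
    """Pick each row's source from a column if given, else from the file path."""
    metadata = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        source = row[source_column] if source_column is not None else file_path
        metadata.append({"source": source, "row": i})
    return metadata


by_path = build_metadata(sample, "teams.csv")
by_column = build_metadata(sample, "teams.csv", source_column="Team")
print(by_path)
print(by_column)
```

With a per-row source, a downstream step that cites sources can point at the specific record rather than at the whole file.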

Loading from a string

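When working with a CSV string directly, you can use Python's tempfile module to write it to a temporary file and pass that file's path to CSVLoader.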

import tempfile

string_data = """
"Team", "Payroll (millions)", "Wins"
"Nationals", 81.34, 98
"Reds", 82.20, 97
"Yankees", 197.96, 95
"Giants", 117.62, 94
""".strip()


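# delete=False keeps the file on disk after the with block closes it,
# so the loader can open it by path afterwards.
with tempfile.NamedTemporaryFile(delete=False, mode="w+") as temp_file:
    temp_file.write(string_data)
    temp_file_path = temp_file.name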

loader = CSVLoader(file_path=temp_file_path)
data = loader.load()
for record in data[:2]:
    print(record)
page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98' metadata={'source': 'Nationals', 'row': 0}
page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97' metadata={'source': 'Reds', 'row': 1}
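The quotation marks that survive in keys such as "Payroll (millions)" come from the space after each comma in the string: with the csv module's default settings, a quote that does not start a field is kept as a literal character. A standard-library sketch of the effect, and of the skipinitialspace option that avoids it, follows. Since csv_args is forwarded to csv.DictReader, passing `csv_args={"skipinitialspace": True}` to CSVLoader should have the same effect, though that combination is an assumption here rather than something the page demonstrates.

```python
import csv
import io

# Shortened copy of the string used above, with a space after each comma.
string_data = '"Team", "Payroll (millions)", "Wins"\n"Nationals", 81.34, 98\n'

# Default parsing: the leading space and the quotes stay in the field.
raw = next(csv.DictReader(io.StringIO(string_data)))

# skipinitialspace=True ignores whitespace right after the delimiter,
# so the quotes are recognized as field delimiters and removed.
clean = next(csv.DictReader(io.StringIO(string_data), skipinitialspace=True))

print(list(raw.keys()))
print(list(clean.keys()))
```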
