Browserless

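Browserless is a service that lets you run headless Chrome instances in the cloud. It is a good way to run browser-based automation at scale without having to manage your own infrastructure.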

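To use Browserless as a document loader, initialize a BrowserlessLoader instance as shown in this notebook. Note that by default, BrowserlessLoader returns the innerText of the page's body element. To disable this and get the raw HTML instead, set text_content to False.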

from langchain_community.document_loaders import BrowserlessLoader

API Reference: BrowserlessLoader

BROWSERLESS_API_TOKEN = "YOUR_BROWSERLESS_API_TOKEN"

loader = BrowserlessLoader(
    api_token=BROWSERLESS_API_TOKEN,
    urls=[
        "https://en.wikipedia.org/wiki/Document_classification",
    ],
    text_content=True,
)

documents = loader.load()

print(documents[0].page_content[:1000])

Jump to content
Main menu
Search
Create account
Log in
Personal tools
Toggle the table of contents
Document classification
17 languages
Article
Talk
Read
Edit
View history
Tools
From Wikipedia, the free encyclopedia

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.

The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied.

Do
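
The output above is the body's innerText, so it begins with the page's navigation labels. For the raw-HTML mode mentioned earlier, a minimal sketch might look like the following. It assumes the token is stored in an environment variable named BROWSERLESS_API_TOKEN, and `browserless_kwargs` is a hypothetical helper introduced here only for illustration; it is not part of LangChain.

```python
import os


def browserless_kwargs(urls, raw_html=False):
    """Build keyword arguments for BrowserlessLoader (hypothetical helper)."""
    # text_content=True (the default) returns the body's innerText;
    # text_content=False asks for the raw HTML instead.
    return {"urls": list(urls), "text_content": not raw_html}


# Assumption: the API token lives in this environment variable.
api_token = os.environ.get("BROWSERLESS_API_TOKEN")

if api_token:
    # Imported lazily so the helper above works without the package installed.
    from langchain_community.document_loaders import BrowserlessLoader

    loader = BrowserlessLoader(
        api_token=api_token,
        **browserless_kwargs(
            ["https://en.wikipedia.org/wiki/Document_classification"],
            raw_html=True,
        ),
    )
    documents = loader.load()
    # With text_content=False the page content should be HTML markup.
    print(documents[0].page_content[:200])
else:
    print("Set BROWSERLESS_API_TOKEN to run this example.")
```

Reading the token from the environment keeps it out of the notebook itself. Raw HTML is usually the better choice when a downstream parser or splitter needs the page structure rather than just the visible text.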

Related

Document loader conceptual guide
Document loader how-to guides
