mhtml

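MHTML is used both for email and for archiving web pages. MHTML, sometimes called MHT, stands for MIME HTML; it is a single file in which an entire web page is archived. When a web page is saved in MHTML format, the resulting file contains the HTML code along with images, audio files, Flash animations, and other resources.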
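Because an MHTML file is a MIME multipart message, its parts can be inspected with Python's standard `email` module. The following is a minimal sketch of that idea, not how `MHTMLLoader` is necessarily implemented. The sample document, its URLs, and the `list_parts` helper are hypothetical and exist only for illustration.

```python
import email

# A tiny hand-written MHTML document (hypothetical content, for illustration only).
SAMPLE_MHTML = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="BOUNDARY"; type="text/html"\r\n'
    "\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Location: http://example.com/index.html\r\n"
    "\r\n"
    "<html><head><title>Example</title></head><body>Hello</body></html>\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: image/png\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-Location: http://example.com/logo.png\r\n"
    "\r\n"
    "iVBORw0KGgo=\r\n"
    "--BOUNDARY--\r\n"
)


def list_parts(mhtml_text):
    """Return (content type, Content-Location) for every leaf part of the archive."""
    message = email.message_from_string(mhtml_text)
    parts = []
    for part in message.walk():
        # Skip the multipart container itself; only leaf parts carry resources.
        if part.is_multipart():
            continue
        parts.append((part.get_content_type(), part.get("Content-Location")))
    return parts


parts = list_parts(SAMPLE_MHTML)
for content_type, location in parts:
    print(content_type, location)
```

Each leaf part corresponds to one archived resource: the HTML page itself plus any images or other media it references.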

from langchain_community.document_loaders import MHTMLLoader
API Reference: MHTMLLoader
# Create a new loader object for the MHTML file
loader = MHTMLLoader(
    file_path="../../../../../../tests/integration_tests/examples/example.mht"
)

# Load the document from the file
documents = loader.load()

# Print the documents to see the results
for doc in documents:
    print(doc)
page_content='LangChain\nLANG CHAIN 🦜️🔗Official Home Page\xa0\n\n\n\n\n\n\n\nIntegrations\n\n\n\nFeatures\n\n\n\n\nBlog\n\n\n\nConceptual Guide\n\n\n\n\nPython Repo\n\n\nJavaScript Repo\n\n\n\nPython Documentation \n\n\nJavaScript Documentation\n\n\n\n\nPython ChatLangChain \n\n\nJavaScript ChatLangChain\n\n\n\n\nDiscord \n\n\nTwitter\n\n\n\n\nIf you have any comments about our WEB page, you can \nwrite us at the address shown above.  However, due to \nthe limited number of personnel in our corporate office, we are unable to \nprovide a direct response.\n\nCopyright © 2023-2023 LangChain Inc.\n\n\n' metadata={'source': '../../../../../../tests/integration_tests/examples/example.mht', 'title': 'LangChain'}
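The extracted `page_content` keeps the page's original line breaks and non-breaking spaces (`\xa0`), so it contains long runs of blank lines. If that is inconvenient downstream, the text can be tidied after loading. This is a rough sketch that works on a plain string; `tidy_page_content` is a hypothetical helper, not part of LangChain, and the sample string is a shortened stand-in for the output above.

```python
def tidy_page_content(text):
    """Replace non-breaking spaces, trim each line, and drop empty lines."""
    text = text.replace("\xa0", " ")
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


# Shortened stand-in for a loaded document's page_content.
sample = "LangChain\nOfficial Home Page\xa0\n\n\n\nIntegrations\n\n\n\nFeatures\n"
tidy = tidy_page_content(sample)
print(tidy)
```

With real documents, the same function could be applied to `doc.page_content` for each item returned by `loader.load()`.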