Apify Actor

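Apify Actors are cloud programs designed for a wide range of web scraping, crawling, and data extraction tasks. They automate data gathering from the web so that users can extract, process, and store information efficiently. Typical uses include scraping e-commerce sites for product details, monitoring price changes, and collecting search engine results. Actors integrate with Apify Datasets, so the structured data they collect can be stored, managed, and exported in formats such as JSON, CSV, or Excel for further analysis or use.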
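To make the export formats concrete, here is a minimal sketch that serializes a few structured records to JSON and CSV using only the Python standard library. The records and the helper names `to_json` and `to_csv` are hypothetical and only illustrate the shape of such data; on the Apify platform the export is handled by Apify Datasets itself, not by code like this.

```python
import csv
import io
import json

# Hypothetical records, shaped like items an Actor might store in a dataset.
items = [
    {"title": "Widget", "price": 9.99, "url": "https://example.com/widget"},
    {"title": "Gadget", "price": 24.5, "url": "https://example.com/gadget"},
]


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV, using the first record's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(items))
print(to_csv(items))
```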

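Overview

This notebook walks you through using Apify Actors with LangChain to automate web scraping and data extraction. The langchain-apify package integrates Apify's cloud-based tools with LangChain agents, enabling efficient data collection and processing for AI applications.

Setup

This integration lives in the langchain-apify package, which can be installed with pip.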

%pip install langchain-apify
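After installing, it can be useful to confirm that the package is importable before going further. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical. Note that the distribution is named langchain-apify while the import name, as used later in this notebook, is langchain_apify.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# The package installs as langchain-apify but is imported as langchain_apify.
print("langchain_apify installed:", is_installed("langchain_apify"))
```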

Prerequisites

  • Apify account: register for a free Apify account.
  • Apify API token: the Apify documentation explains how to get your API token.

import os

os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
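Before running the rest of the notebook, a quick check that both variables are actually set can save a confusing authentication error later. This is a minimal sketch; the helper `missing_env_vars` is a hypothetical name, not part of langchain-apify.

```python
import os

REQUIRED_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

In a shared notebook, reading the values with `getpass.getpass()` instead of hard-coding them keeps the tokens out of the saved file.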

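Instantiation

Here we instantiate ApifyActorsTool so that we can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used in this way.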

from langchain_apify import ApifyActorsTool

tool = ApifyActorsTool("apify/rag-web-browser")

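Invocation

ApifyActorsTool takes a single argument, run_input: a dictionary that is passed to the Actor as its run input. The run input schema is documented in the input section of each Actor's detail page; for this example, see the RAG Web Browser input schema.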

tool.invoke({"run_input": {"query": "what is apify?", "maxResults": 2}})
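The nesting in the call above is easy to get wrong: the Actor's own input fields go inside a dictionary stored under the run_input key. A small helper can make that explicit. This is a sketch under assumptions: `build_tool_input` is a hypothetical name, and only the `query` and `maxResults` fields shown in the call above are used.

```python
def build_tool_input(query, max_results=3):
    """Wrap Actor input fields under the "run_input" key that the tool expects."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {"run_input": {"query": query, "maxResults": max_results}}


payload = build_tool_input("what is apify?", max_results=2)
print(payload)
# With the tool instantiated earlier, this could be passed as: tool.invoke(payload)
```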

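Chaining

We can pass the tool we created to an agent. When asked to search for information, the agent calls the Apify Actor, which searches the web and returns the search results.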

%pip install langgraph langchain-openai

from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o")
tools = [tool]
graph = create_react_agent(model, tools=tools)

inputs = {"messages": [("user", "search for what is Apify")]}
for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    # skip tool messages
    if isinstance(message, ToolMessage):
        continue
    message.pretty_print()
================================ Human Message =================================

search for what is Apify
================================== Ai Message ==================================
Tool Calls:
  apify_actor_apify_rag-web-browser (call_27mjHLzDzwa5ZaHWCMH510lm)
 Call ID: call_27mjHLzDzwa5ZaHWCMH510lm
  Args:
    run_input: {"run_input":{"query":"Apify","maxResults":3,"outputFormats":["markdown"]}}
================================== Ai Message ==================================

Apify is a comprehensive platform for web scraping, browser automation, and data extraction. It offers a wide array of tools and services that cater to developers and businesses looking to extract data from websites efficiently and effectively. Here's an overview of Apify:

1. **Ecosystem and Tools**:
- Apify provides an ecosystem where developers can build, deploy, and publish data extraction and web automation tools called Actors.
- The platform supports various use cases such as extracting data from social media platforms, conducting automated browser-based tasks, and more.

2. **Offerings**:
- Apify offers over 3,000 ready-made scraping tools and code templates.
- Users can also build custom solutions or hire Apify's professional services for more tailored data extraction needs.

3. **Technology and Integration**:
- The platform supports integration with popular tools and services like Zapier, GitHub, Google Sheets, Pinecone, and more.
- Apify supports open-source tools and technologies such as JavaScript, Python, Puppeteer, Playwright, Selenium, and its own Crawlee library for web crawling and browser automation.

4. **Community and Learning**:
- Apify hosts a community on Discord where developers can get help and share expertise.
- It offers educational resources through the Web Scraping Academy to help users become proficient in data scraping and automation.

5. **Enterprise Solutions**:
- Apify provides enterprise-grade web data extraction solutions with high reliability, 99.95% uptime, and compliance with SOC2, GDPR, and CCPA standards.

For more information, you can visit [Apify's official website](https://apify.com/) or their [GitHub page](https://github.com/apify) which contains their code repositories and further details about their projects.
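The output above shows only the human message and the two AI messages because the streaming loop skips tool messages. The sketch below isolates that filtering step so it can be run offline. The `Message`, `AIMessage`, and `ToolMessage` classes here are hypothetical stand-ins defined locally, not the LangChain classes, and the simulated stream is an assumption about how each streamed state carries the message history so far.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for LangChain message types, so the sketch runs offline.
@dataclass
class Message:
    content: str


class AIMessage(Message):
    pass


class ToolMessage(Message):
    pass


def visible_messages(states):
    """Yield the latest message of each streamed state, skipping tool messages."""
    for state in states:
        message = state["messages"][-1]
        if isinstance(message, ToolMessage):
            continue
        yield message


# Simulated stream: each state holds the message history up to that point.
history = [
    Message("search for what is Apify"),
    AIMessage("calling tool"),
    ToolMessage("raw results"),
    AIMessage("Apify is a platform..."),
]
states = [{"messages": history[: i + 1]} for i in range(len(history))]
shown = [m.content for m in visible_messages(states)]
print(shown)
```

Dropping the `isinstance` check would also print the raw tool output, which can help when debugging what the Actor actually returned.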

API reference

For more information on how to use this integration, see the git repository or the Apify integration documentation.

