Apify Actor
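Apify Actors are cloud programs designed for a wide range of web scraping, crawling, and data extraction tasks. They automate data collection from the web so that users can extract, process, and store information efficiently. Typical uses include scraping e-commerce sites for product details, monitoring price changes, and collecting search engine results. Actors integrate with Apify Datasets, so the structured data they collect can be stored, managed, and exported in formats such as JSON, CSV, or Excel for further analysis or use.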
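Overview
This notebook walks you through using Apify Actors with LangChain to automate web scraping and data extraction. The langchain-apify package integrates Apify's cloud-based tools with LangChain agents, enabling efficient data collection and processing for AI applications.
Setup
This integration lives in the langchain-apify package, which can be installed with pip.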
%pip install langchain-apify
Prerequisites
import os
os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
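A quick check before running the later cells can catch a missing credential early. The sketch below is an illustration and not part of langchain-apify: the helper name missing_env_vars is hypothetical, and it only verifies that the two variables set above are non-empty. It cannot tell whether a token is valid, and the placeholder strings shown above would pass it.

```python
import os

# Names of the variables this notebook sets in the Prerequisites cell.
REQUIRED_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```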
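Instantiation
Here we instantiate ApifyActorsTool so that we can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used this way.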
from langchain_apify import ApifyActorsTool
tool = ApifyActorsTool("apify/rag-web-browser")
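Invocation
ApifyActorsTool takes a single argument, run_input: a dictionary passed to the Actor as its run input. The run input schema is documented in the Input section of the Actor's details page. See the RAG Web Browser input schema.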
tool.invoke({"run_input": {"query": "what is apify?", "maxResults": 2}})
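The nested shape of the argument (an outer run_input key wrapping the Actor's own input fields) is easy to get wrong, so it can help to build it in one place. The following is a minimal sketch, assuming only the two fields used in the call above (query and maxResults); build_tool_input is a hypothetical helper, not a function provided by langchain-apify, and the validation rules are illustrative rather than taken from the Actor's schema.

```python
from typing import Any


def build_tool_input(query: str, max_results: int = 3) -> dict[str, Any]:
    """Build the payload shape passed to tool.invoke() in this notebook.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    # The tool expects the Actor input nested under the "run_input" key.
    return {"run_input": {"query": query, "maxResults": max_results}}


payload = build_tool_input("what is apify?", max_results=2)
print(payload)
```

The resulting payload can then be passed as tool.invoke(payload), which is equivalent to the literal dictionary used in the call above.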
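Chaining
We can provide the created tool to an agent. When asked to search for information, the agent calls the Apify Actor, which searches the web and retrieves the search results.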
%pip install langgraph langchain-openai
from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
model = ChatOpenAI(model="gpt-4o")
tools = [tool]
graph = create_react_agent(model, tools=tools)
inputs = {"messages": [("user", "search for what is Apify")]}
for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    # skip tool messages
    if isinstance(message, ToolMessage):
        continue
    message.pretty_print()
================================ Human Message =================================
search for what is Apify
================================== Ai Message ==================================
Tool Calls:
  apify_actor_apify_rag-web-browser (call_27mjHLzDzwa5ZaHWCMH510lm)
  Call ID: call_27mjHLzDzwa5ZaHWCMH510lm
  Args:
    run_input: {"run_input":{"query":"Apify","maxResults":3,"outputFormats":["markdown"]}}
================================== Ai Message ==================================
Apify is a comprehensive platform for web scraping, browser automation, and data extraction. It offers a wide array of tools and services that cater to developers and businesses looking to extract data from websites efficiently and effectively. Here's an overview of Apify:
1. **Ecosystem and Tools**:
- Apify provides an ecosystem where developers can build, deploy, and publish data extraction and web automation tools called Actors.
- The platform supports various use cases such as extracting data from social media platforms, conducting automated browser-based tasks, and more.
2. **Offerings**:
- Apify offers over 3,000 ready-made scraping tools and code templates.
- Users can also build custom solutions or hire Apify's professional services for more tailored data extraction needs.
3. **Technology and Integration**:
- The platform supports integration with popular tools and services like Zapier, GitHub, Google Sheets, Pinecone, and more.
- Apify supports open-source tools and technologies such as JavaScript, Python, Puppeteer, Playwright, Selenium, and its own Crawlee library for web crawling and browser automation.
4. **Community and Learning**:
- Apify hosts a community on Discord where developers can get help and share expertise.
- It offers educational resources through the Web Scraping Academy to help users become proficient in data scraping and automation.
5. **Enterprise Solutions**:
- Apify provides enterprise-grade web data extraction solutions with high reliability, 99.95% uptime, and compliance with SOC2, GDPR, and CCPA standards.
For more information, you can visit [Apify's official website](https://apify.com/) or their [GitHub page](https://github.com/apify) which contains their code repositories and further details about their projects.
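When only the final answer is needed rather than every intermediate step, the message list can be scanned from the end for the last AI message that carries text. The sketch below is a minimal illustration under stated assumptions: last_ai_text is a hypothetical helper, it relies on each message exposing a type attribute ("human", "ai", "tool") and a string content, and it uses stand-in objects so that it runs without network access or API keys.

```python
from types import SimpleNamespace


def last_ai_text(messages):
    """Return the content of the last non-empty message of type "ai", or None.

    Hypothetical helper: assumes each message has `type` and `content`.
    """
    for message in reversed(messages):
        if getattr(message, "type", None) == "ai" and message.content:
            return message.content
    return None


# Stand-in messages that mimic the shape of the run printed above.
demo = [
    SimpleNamespace(type="human", content="search for what is Apify"),
    SimpleNamespace(type="ai", content=""),  # tool-call turn with no text
    SimpleNamespace(type="tool", content="...raw search results..."),
    SimpleNamespace(type="ai", content="Apify is a platform for web scraping."),
]
print(last_ai_text(demo))
```

With the agent built earlier, the same helper could be applied to the result of a single call, for example last_ai_text(graph.invoke(inputs)["messages"]), assuming the model returns plain string content.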
API reference
For more information on how to use this integration, see the git repository or the Apify integration documentation.