ApertureDB
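ApertureDB is a database that stores, indexes, and manages multimodal data such as text, images, videos, bounding boxes, and embeddings, together with their associated metadata.
This notebook explains how to use the embeddings functionality of ApertureDB.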
Install ApertureDB Python SDK
This installs the Python SDK used to write client code for ApertureDB.
%pip install --upgrade --quiet aperturedb
Note: you may need to restart the kernel to use updated packages.
Run an ApertureDB instance
To continue, you need a running ApertureDB instance and an environment configured to use it.
There are several ways to do that, for example:
docker run --publish 55555:55555 aperturedata/aperturedb-standalone
adb config create local --active --no-interactive
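Before moving on, it can help to confirm that something is listening on the published port. The following is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical and not part of the ApertureDB SDK, and a successful TCP connection only shows that the port is reachable, not that the database itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 55555 is the port published by the docker command above.
    print("ApertureDB port reachable:", is_port_open("localhost", 55555))
```

If this prints `False`, check that the container is running and that the port mapping matches the one used in the `docker run` command.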
Download some web documents
Here we do a mini-crawl of a single web page.
# For loading documents from web
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.aperturedata.io")
docs = loader.load()
USER_AGENT environment variable not set, consider setting it to identify your requests.
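The warning above is harmless, but it can be avoided by defining `USER_AGENT` before the loader is imported. This is a minimal sketch, assuming the variable is read when the loader module is imported; the value shown is a hypothetical placeholder, not a required format.

```python
import os

# Set a descriptive user agent only if one is not already defined.
# "aperturedb-notebook/0.1" is a placeholder; use a string that identifies
# your own application.
os.environ.setdefault("USER_AGENT", "aperturedb-notebook/0.1")

print(os.environ["USER_AGENT"])
```

In a notebook, this cell would need to run before the cell that imports `WebBaseLoader`.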
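Select an embeddings model
We want to use OllamaEmbeddings, so we have to import the necessary module.
Ollama can be set up as a Docker container, as described in the documentation, for example: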
# Run server
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Tell server to load a specific model
docker exec ollama ollama run llama2
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings()
Split documents into segments
We want to turn our single document into multiple segments.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
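To give a feel for what recursive character splitting does, here is a simplified sketch in plain Python. It is not the library's implementation: the function `split_text` and its defaults are hypothetical, and it omits chunk overlap. The idea is to try the coarsest separator first (blank lines), and to fall back to finer separators only for pieces that are still too long.

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarsest to finest; the empty string means
    "cut at fixed character offsets" and acts as the last resort.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    sep, finer = separators[0], separators[1:]
    if sep == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nA second paragraph that is rather longer than the limit.\n\nEnd."
for chunk in split_text(sample, chunk_size=30):
    print(repr(chunk))
```

In the notebook itself, `RecursiveCharacterTextSplitter` accepts `chunk_size` and `chunk_overlap` arguments for tuning segment length; the defaults are used above.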
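Create vectorstore from documents and embeddings
This code creates a vectorstore in the ApertureDB instance. Within the instance, this vectorstore is represented as a "descriptor set". By default, the descriptor set is named langchain. The following code generates an embedding for each document and stores it in ApertureDB as a descriptor. This takes a few seconds while the embeddings are generated.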
from langchain_community.vectorstores import ApertureDB
vector_db = ApertureDB.from_documents(documents, embeddings)
Select a large language model
Again, we use the Ollama server that we set up for local processing.
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")
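Build a RAG chain
Now we have all the components we need to create a RAG (Retrieval-Augmented Generation) chain. This chain does the following:
- Generates an embedding descriptor for the user query
- Uses the vectorstore to find text segments that are similar to the user query
- Passes the user query and the context documents to the LLM using a prompt template
- Returns the LLM's answer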
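The steps above can be sketched end to end without any external services. This is a toy illustration only: `embed` is a bag-of-words stand-in for a real embeddings model, `retrieve` is a brute-force stand-in for the vectorstore, and `fake_llm` is a placeholder for the LLM call. All of these names are hypothetical and none of them belong to LangChain or ApertureDB.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real model returns a dense float vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_prompt(query, context_passages):
    """Fill a prompt template with the retrieved context and the question."""
    context = "\n\n".join(context_passages)
    return (
        "Answer the following question based only on the provided context:\n"
        f"<context>\n{context}\n</context>\n"
        f"Question: {query}"
    )


def fake_llm(prompt):
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"(an LLM would answer here, given a {len(prompt)}-character prompt)"


passages = [
    "ApertureDB stores images, videos and embeddings.",
    "Ollama runs large language models locally.",
    "Descriptor sets hold embeddings for similarity search.",
]
query = "Where are images stored?"
top = retrieve(query, passages, k=1)
print(top)
print(fake_llm(build_prompt(query, top)))
```

The real chain built below follows the same shape, with the vectorstore retriever and the Ollama model taking the place of these stand-ins.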
# Create prompt
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")
# Create a chain that passes documents to an LLM
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
# Treat the vectorstore as a document retriever
retriever = vector_db.as_retriever()
# Create a RAG chain that connects the retriever to the LLM
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
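Run the RAG chain
Finally, we pass a question to the chain and get our answer. This takes a few seconds to run while the LLM generates an answer from the query and the context documents.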
user_query = "How can ApertureDB store images?"
response = retrieval_chain.invoke({"input": user_query})
print(response["answer"])
Based on the provided context, ApertureDB can store images in several ways:
1. Multimodal data management: ApertureDB offers a unified interface to manage multimodal data such as images, videos, documents, embeddings, and associated metadata including annotations. This means that images can be stored along with other types of data in a single database instance.
2. Image storage: ApertureDB provides image storage capabilities through its integration with the public cloud providers or on-premise installations. This allows customers to host their own ApertureDB instances and store images on their preferred cloud provider or on-premise infrastructure.
3. Vector database: ApertureDB also offers a vector database that enables efficient similarity search and classification of images based on their semantic meaning. This can be useful for applications where image search and classification are important, such as in computer vision or machine learning workflows.
Overall, ApertureDB provides flexible and scalable storage options for images, allowing customers to choose the deployment model that best suits their needs.
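Besides "answer", the dictionary returned by a chain built with `create_retrieval_chain` also carries the retrieved documents under the "context" key, which is useful for checking what the answer was based on. The sketch below formats such a response; the helper `summarize_response` is hypothetical, and the example uses simple stand-in objects with `page_content` and `metadata` attributes rather than a live chain. The "source" metadata key is an assumption about what the loader records.

```python
from types import SimpleNamespace


def summarize_response(response, max_chars=80):
    """Return the answer followed by one line per retrieved context document."""
    lines = [f"Answer: {response['answer']}"]
    for i, doc in enumerate(response.get("context", []), start=1):
        # "source" is assumed to be set by the loader; fall back if it is missing.
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so the snippet fits on one line.
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {source}: {snippet}")
    return "\n".join(lines)


# Stand-in for the dict returned by retrieval_chain.invoke({"input": ...}).
fake_response = {
    "input": "How can ApertureDB store images?",
    "context": [
        SimpleNamespace(
            page_content="ApertureDB manages   multimodal data.",
            metadata={"source": "https://docs.aperturedata.io"},
        )
    ],
    "answer": "It stores them alongside metadata.",
}
print(summarize_response(fake_response))
```

With the real chain, `summarize_response(response)` could be called directly on the `response` variable from the cell above.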