
Marqo

This notebook shows how to use functionality related to the Marqo vector store.

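Marqo is an open-source vector search engine. It lets you store and query multimodal data such as text and images. Marqo creates the vectors for you using a large selection of open-source models; you can also supply your own fine-tuned models, and Marqo will handle loading and inference for you.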

To use this integration, install langchain-community with pip install -qU langchain-community.

To run this notebook with our Docker image, first run the following commands to get Marqo:

docker pull marqoai/marqo:latest
docker rm -f marqo
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
%pip install --upgrade --quiet  marqo
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Marqo
from langchain_text_splitters import CharacterTextSplitter
API Reference: TextLoader | Marqo | CharacterTextSplitter
from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
API Reference: TextLoader
import marqo

# initialize marqo
marqo_url = "http://127.0.0.1:8882"  # the local Docker container serves plain HTTP; if using marqo cloud replace with your endpoint (console.marqo.ai)
marqo_api_key = "" # if using marqo cloud replace with your api key (console.marqo.ai)

client = marqo.Client(url=marqo_url, api_key=marqo_api_key)

index_name = "langchain-demo"

docsearch = Marqo.from_documents(docs, index_name=index_name)

query = "What did the president say about Ketanji Brown Jackson"
result_docs = docsearch.similarity_search(query)
Index langchain-demo exists.
print(result_docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
result_docs = docsearch.similarity_search_with_score(query)
print(result_docs[0][0].page_content, result_docs[0][1], sep="\n")
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
0.68647254
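
The (document, score) pairs returned by similarity_search_with_score can be post-filtered before they are passed downstream. The sketch below is only an illustration: it uses plain strings in place of Document objects, the helper name filter_by_score and the second score (0.41) are invented for the example, and it assumes that a higher score means a closer match.

```python
from typing import List, Tuple


def filter_by_score(
    results: List[Tuple[str, float]], min_score: float
) -> List[Tuple[str, float]]:
    """Keep only the (content, score) pairs whose score is at least min_score."""
    return [(content, score) for content, score in results if score >= min_score]


# Stand-in data shaped like the output of similarity_search_with_score,
# with strings instead of Document objects. The 0.41 score is made up.
sample_results = [
    ("chunk mentioning Ketanji Brown Jackson", 0.68647254),
    ("unrelated chunk", 0.41),
]

kept = filter_by_score(sample_results, min_score=0.5)
for content, score in kept:
    print(content, score, sep="\n")
```

A suitable threshold depends on the model and the data, so it is best chosen by inspecting scores on a few representative queries.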

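Additional features

One of the powerful features of Marqo as a vector store is that you can use indexes created externally. For example:

  • If you have a database of image and text pairs from another application, you can use it directly in langchain with the Marqo vector store. Note that bringing your own multimodal index disables the add_texts method.

  • If you have a database of text documents, you can bring it into the langchain framework and add more texts through add_texts.

The returned documents are customised by passing your own function as the page_content_builder callback in the search methods.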
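
As a rough illustration of what such a callback can look like, here is a generic builder that works on any dictionary of document fields. The function name build_page_content is hypothetical, and the rule of skipping underscore-prefixed keys (such as _id or _score) is an assumption about which fields count as metadata; adapt it to the fields your index actually holds.

```python
from typing import Any, Dict


def build_page_content(res: Dict[str, Any]) -> str:
    """Turn one search hit (a dict of document fields) into page_content text."""
    # Prefer a "text" field when the hit has one.
    if "text" in res:
        return str(res["text"])
    # Otherwise join the remaining fields, skipping underscore-prefixed
    # keys, which are assumed here to be metadata rather than content.
    parts = [f"{key}: {value}" for key, value in res.items() if not key.startswith("_")]
    return "\n".join(parts)


# Two hand-written hits standing in for search results.
print(build_page_content({"text": "plain text document", "_id": "1"}))
print(build_page_content({"caption": "Bus", "image": "bus.jpg", "_score": 0.5}))
```

A function of this shape can be passed as page_content_builder when constructing the vector store, in the same way as the get_content helpers in the examples that follow.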

Multimodal example

# use a new index
index_name = "langchain-multimodal-demo"

# in case the demo is re-run
try:
    client.delete_index(index_name)
except Exception:
    print(f"Creating {index_name}")

# This index could have been created by another system
settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}
client.create_index(index_name, **settings)
client.index(index_name).add_documents(
    [
        # image of a bus
        {
            "caption": "Bus",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        # image of a plane
        {
            "caption": "Plane",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
    ],
)
{'errors': False,
'processingTimeMs': 2090.2822139996715,
'index_name': 'langchain-multimodal-demo',
'items': [{'_id': 'aa92fc1c-1fb2-4d86-b027-feb507c419f7',
'result': 'created',
'status': 201},
{'_id': '5142c258-ef9f-4bf2-a1a6-2307280173a0',
'result': 'created',
'status': 201}]}
def get_content(res):
    """Helper to format Marqo's documents into text to be used as page_content"""
    return f"{res['caption']}: {res['image']}"


docsearch = Marqo(client, index_name, page_content_builder=get_content)


query = "vehicles that fly"
doc_results = docsearch.similarity_search(query)
for doc in doc_results:
    print(doc.page_content)
Plane: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg
Bus: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg

Text-only example

# use a new index
index_name = "langchain-byo-index-demo"

# in case the demo is re-run
try:
    client.delete_index(index_name)
except Exception:
    print(f"Creating {index_name}")

# This index could have been created by another system
client.create_index(index_name)
client.index(index_name).add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to"
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
    ],
)
{'errors': False,
'processingTimeMs': 139.2144540004665,
'index_name': 'langchain-byo-index-demo',
'items': [{'_id': '27c05a1c-b8a9-49a5-ae73-fbf1eb51dc3f',
'result': 'created',
'status': 201},
{'_id': '6889afe0-e600-43c1-aa3b-1d91bf6db274',
'result': 'created',
'status': 201}]}
# Note text indexes retain the ability to use add_texts despite different field names in documents
# this is because the page_content_builder callback lets you handle these document fields as required


def get_content(res):
    """Helper to format Marqo's documents into text to be used as page_content"""
    if "text" in res:
        return res["text"]
    return res["Description"]


docsearch = Marqo(client, index_name, page_content_builder=get_content)

docsearch.add_texts(["This is a document that is about elephants"])
['9986cc72-adcd-4080-9d74-265c173a9ec3']
query = "modern communications devices"
doc_results = docsearch.similarity_search(query)

print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = "elephants"
doc_results = docsearch.similarity_search(query, page_content_builder=get_content)

print(doc_results[0].page_content)
This is a document that is about elephants

Weighted queries

We also expose Marqo's weighted queries, which are a powerful way to compose complex semantic searches.

query = {"communications devices": 1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = {"communications devices": 1.0, "technology post 2000": -1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A telephone is a telecommunications device that permits two or more users toconduct a conversation when they are too far apart to be easily heard directly.
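
To build intuition for why a negative weight changes the top result, the toy sketch below combines per-term vectors into one query vector and ranks documents by cosine similarity. Everything in it is an assumption made for illustration: the two-dimensional vectors are invented, the helper names combine, cosine, and rank are hypothetical, and combining terms as a plain weighted sum is a simplification that may differ from what Marqo does internally.

```python
import math
from typing import Dict, List


def combine(weights: Dict[str, float], vectors: Dict[str, List[float]]) -> List[float]:
    """Weighted sum of the per-term vectors (assumed combination rule)."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for term, weight in weights.items():
        for i, x in enumerate(vectors[term]):
            out[i] += weight * x
    return out


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented 2-D "embeddings": axis 0 ~ communication, axis 1 ~ modern technology.
term_vectors = {
    "communications devices": [1.0, 0.5],
    "technology post 2000": [0.1, 1.0],
}
doc_vectors = {"Smartphone": [0.8, 0.6], "Telephone": [0.9, -0.1]}


def rank(weights: Dict[str, float]) -> List[str]:
    """Return document names ordered from most to least similar to the query."""
    query_vector = combine(weights, term_vectors)
    return sorted(
        doc_vectors,
        key=lambda name: cosine(query_vector, doc_vectors[name]),
        reverse=True,
    )


print(rank({"communications devices": 1.0}))
print(rank({"communications devices": 1.0, "technology post 2000": -1.0}))
```

With these invented vectors, the negative weight pushes the query away from the "modern technology" direction, so the telephone overtakes the smartphone, mirroring the behaviour seen in the real queries.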

Question answering with sources

This section shows how to use Marqo as part of a RetrievalQAWithSourcesChain. Marqo performs the searches for information in the sources.

import getpass
import os

from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import OpenAI

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
API Reference: RetrievalQAWithSourcesChain | OpenAI
OpenAI API Key:········
with open("../../how_to/state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
index_name = "langchain-qa-with-retrieval"
docsearch = Marqo.from_documents(docs, index_name=index_name)
Index langchain-qa-with-retrieval exists.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=docsearch.as_retriever()
)
chain(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)
{'answer': ' The president honored Justice Breyer, thanking him for his service and noting that he is a retiring Justice of the United States Supreme Court.\n',
'sources': '../../../state_of_the_union.txt'}
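
Conceptually, the "stuff" chain type places all retrieved chunks, together with their sources, into a single prompt for the language model. The sketch below illustrates that idea in plain Python. It is not LangChain's actual prompt template: the helper name stuff_prompt and the prompt wording are invented for illustration, and the chunks are hand-written stand-ins for retrieved documents.

```python
from typing import List, Tuple


def stuff_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """Concatenate (content, source) chunks and the question into one prompt."""
    context = "\n\n".join(
        f"Content: {content}\nSource: {source}" for content, source in chunks
    )
    return f"{context}\n\nQuestion: {question}\nAnswer with sources:"


# Hand-written stand-ins for chunks a retriever might return.
retrieved = [
    ("Justice Breyer, thank you for your service.", "state_of_the_union.txt"),
    ("I nominated Judge Ketanji Brown Jackson.", "state_of_the_union.txt"),
]

prompt = stuff_prompt("What did the president say about Justice Breyer", retrieved)
print(prompt)
```

Because every chunk goes into one prompt, this approach only works while the retrieved text fits within the model's context window.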
