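Pinecone Hybrid Search

Pinecone is a vector database with broad functionality.

This notebook goes over how to use a retriever that uses Pinecone and hybrid search under the hood.

The logic of this retriever is taken from this documentation.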
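To make "hybrid search" concrete: a hybrid query combines a dense vector (from an embedding model) with a sparse vector (from a keyword-style encoder), and a weight decides how much each side counts. The sketch below shows one common way to do this, a convex combination controlled by a parameter we call `alpha`. The function name `convex_scale` and the sample vectors are illustrative assumptions, not the retriever's own code.

```python
from typing import Dict, List, Tuple

# A sparse vector in the {"indices": [...], "values": [...]} layout.
SparseVector = Dict[str, list]


def convex_scale(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Weight the dense part by alpha and the sparse part by (1 - alpha).

    alpha=1.0 means purely dense (semantic) search,
    alpha=0.0 means purely sparse (keyword) search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Made-up query vectors, for illustration only.
dense_query = [0.2, 0.4]
sparse_query = {"indices": [7, 42], "values": [1.0, 3.0]}

scaled_dense, scaled_sparse = convex_scale(dense_query, sparse_query, alpha=0.5)
print(scaled_dense)   # [0.1, 0.2]
print(scaled_sparse)  # {'indices': [7, 42], 'values': [0.5, 1.5]}
```

Because the dot product is linear, scaling the query this way is equivalent to weighting the dense and sparse similarity scores themselves.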

To use Pinecone, you must have an API key and an environment. Here are the installation instructions.

%pip install --upgrade --quiet  pinecone pinecone-text pinecone-notebooks

# Connect to Pinecone and get an API key.
from pinecone_notebooks.colab import Authenticate

Authenticate()

import os

api_key = os.environ["PINECONE_API_KEY"]
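Reading `os.environ["PINECONE_API_KEY"]` directly raises a bare `KeyError` when the variable is missing. A small helper can produce a clearer message instead. This is only a sketch: `require_env` and the `EXAMPLE_API_KEY` variable are hypothetical names used so the example runs without real credentials.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a readable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this notebook.")
    return value


# Placeholder variable (hypothetical) so the sketch runs anywhere.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))  # placeholder-value
```

In the notebook, the same helper could be called as `require_env("PINECONE_API_KEY")`.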

from langchain_community.retrievers import (
    PineconeHybridSearchRetriever,
)

API Reference: PineconeHybridSearchRetriever

We want to use OpenAIEmbeddings, so we have to get the OpenAI API key.

import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Setup Pinecone

You should only have to do this part once.

import os

from pinecone import Pinecone, ServerlessSpec

index_name = "langchain-pinecone-hybrid-search"

# initialize Pinecone client
pc = Pinecone(api_key=api_key)

# create the index
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # dimensionality of dense model
        metric="dotproduct",  # sparse values supported only for dotproduct
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )


Now that the index is created, we can use it.

index = pc.Index(index_name)

Get embeddings and sparse encoders

Embeddings are used for the dense vectors; the tokenizer is used for the sparse vectors.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

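API Reference: OpenAIEmbeddings

To encode text as sparse values, you can choose either SPLADE or BM25. For out-of-domain tasks, we recommend BM25.

For more information about the sparse encoders, see the pinecone-text library documentation.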

from pinecone_text.sparse import BM25Encoder

# or from pinecone_text.sparse import SpladeEncoder if you wish to work with SPLADE

# use default tf-idf values
bm25_encoder = BM25Encoder().default()
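To illustrate what "sparse values" look like, here is a deliberately simplified, standard-library-only sketch. It hashes each token to an integer index and uses the raw term count as the value, producing the `{"indices": [...], "values": [...]}` layout used for sparse vectors. This is a toy: the function `toy_sparse_encode`, the hashing scheme, and the `vocab_size` parameter are assumptions for illustration, and a real BM25 encoder additionally applies tokenization rules and tf-idf style weighting.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def toy_sparse_encode(text: str, vocab_size: int = 2**16) -> Dict[str, List]:
    """Map each token to a hashed index; the value is the token's count."""
    counts = Counter(text.lower().split())
    merged: Dict[int, float] = {}
    for token, count in counts.items():
        # A stable hash, so the same token always lands on the same index.
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        index = int(digest, 16) % vocab_size
        # Tokens that collide on an index simply have their counts summed.
        merged[index] = merged.get(index, 0.0) + float(count)
    indices = sorted(merged)
    return {"indices": indices, "values": [merged[i] for i in indices]}


encoded = toy_sparse_encode("hello hello world")
print(encoded)  # only a few non-zero entries, regardless of vocab_size
```

Only the tokens that actually occur get an entry, which is what makes the representation sparse.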

The code above uses default tf-idf values. It is highly recommended to fit the tf-idf values to your own corpus. You can do it as follows:

corpus = ["foo", "bar", "world", "hello"]

# fit tf-idf values on your corpus
bm25_encoder.fit(corpus)

# store the values to a json file
bm25_encoder.dump("bm25_values.json")

# load to your BM25Encoder object
bm25_encoder = BM25Encoder().load("bm25_values.json")
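Conceptually, "fitting" a BM25-style encoder means collecting corpus statistics such as how many documents contain each term and the average document length, which are then stored and reloaded. The sketch below mimics that fit, dump, and load cycle with the standard library only. The helper names (`fit_corpus_stats`, `idf`), the whitespace tokenizer, and the JSON layout are assumptions for illustration and do not reflect the file format that pinecone-text writes.

```python
import json
import math
import os
import tempfile
from collections import Counter
from typing import Dict, List


def fit_corpus_stats(corpus: List[str]) -> Dict:
    """Collect document frequencies and the average document length."""
    if not corpus:
        raise ValueError("corpus must contain at least one document")
    doc_freq: Counter = Counter()
    total_tokens = 0
    for doc in corpus:
        tokens = doc.lower().split()
        total_tokens += len(tokens)
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        "n_docs": len(corpus),
        "avgdl": total_tokens / len(corpus),
        "doc_freq": dict(doc_freq),
    }


def idf(term: str, stats: Dict) -> float:
    """A commonly used BM25-style inverse document frequency."""
    df = stats["doc_freq"].get(term, 0)
    return math.log((stats["n_docs"] - df + 0.5) / (df + 0.5) + 1.0)


stats = fit_corpus_stats(["foo bar", "foo world", "hello"])

# Round-trip the statistics through a JSON file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_stats.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(stats, f)
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == stats)                             # True
print(idf("foo", restored) < idf("hello", restored))  # True
```

The last line shows why fitting matters: a term that appears in many documents ("foo") receives a lower weight than a rarer one ("hello"), and those weights depend entirely on the corpus used.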

Load Retriever

We can now construct the retriever!

retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings, sparse_encoder=bm25_encoder, index=index
)

Add texts (if necessary)

We can optionally add texts to the retriever (if they aren't already in there).

retriever.add_texts(["foo", "bar", "world", "hello"])

100%|██████████| 1/1 [00:02<00:00,  2.27s/it]
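Two ideas are useful when adding texts to a vector index: sending them in batches, and deriving each record's id from its content so that adding the same text twice overwrites the record rather than duplicating it. The sketch below illustrates both with an in-memory dictionary standing in for the index. The names `content_id` and `batched`, the batch size, and the use of SHA-256 are assumptions for illustration; check the retriever's source before relying on any particular id scheme.

```python
import hashlib
from typing import Dict, Iterator, List


def content_id(text: str) -> str:
    """Derive a deterministic id from the text itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


texts = ["foo", "bar", "world", "hello", "foo"]  # "foo" appears twice

store: Dict[str, str] = {}  # stand-in for the remote index
for batch in batched(texts, batch_size=2):
    for text in batch:
        # Same text -> same id -> the existing record is overwritten.
        store[content_id(text)] = text

print(len(store))  # 4
```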

Use Retriever

We can now use the retriever!

result = retriever.invoke("foo")

result[0]

Document(page_content='foo', metadata={})
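To show what happens conceptually when a hybrid query is answered, the following sketch ranks a few in-memory records by a weighted sum of dense and sparse dot products and returns the best matches. Everything here is a toy under stated assumptions: the vectors are made up, and `hybrid_top_k`, `alpha`, and `top_k` are illustrative names for this sketch, not a description of Pinecone's internals.

```python
from typing import Dict, List, Tuple


def dense_dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: Dict[str, list], b: Dict[str, list]) -> float:
    """Dot product of two {"indices", "values"} vectors."""
    lookup = dict(zip(b["indices"], b["values"]))
    return sum(v * lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


def hybrid_top_k(
    query_dense: List[float],
    query_sparse: Dict[str, list],
    records: List[Dict],
    top_k: int = 4,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score each record as alpha * dense + (1 - alpha) * sparse; keep the best."""
    scored = []
    for record in records:
        score = alpha * dense_dot(query_dense, record["dense"]) + (
            1.0 - alpha
        ) * sparse_dot(query_sparse, record["sparse"])
        scored.append((record["text"], score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up two-dimensional dense vectors and tiny sparse vectors.
records = [
    {"text": "foo", "dense": [1.0, 0.0], "sparse": {"indices": [1], "values": [1.0]}},
    {"text": "bar", "dense": [0.0, 1.0], "sparse": {"indices": [2], "values": [1.0]}},
    {"text": "foo bar", "dense": [0.5, 0.5], "sparse": {"indices": [1, 2], "values": [1.0, 1.0]}},
]

results = hybrid_top_k([1.0, 0.0], {"indices": [1], "values": [1.0]}, records, top_k=2)
print(results)  # [('foo', 1.0), ('foo bar', 0.75)]
```

The exact match "foo" wins on both the dense and the sparse side, while "foo bar" still ranks second because it shares the keyword and is partially similar in the dense space.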

Retriever conceptual guide
Retriever how-to guides
