NanoPQ (Product Quantization)

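Product Quantization (PQ) is a quantization algorithm for k-NN search that compresses database vectors, which helps semantic search scale to large datasets. In a nutshell, each embedding is split into M subspaces, and the sub-vectors in each subspace are then clustered. After clustering, each sub-vector is represented by the centroid of the cluster it belongs to.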
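The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the nanopq implementation: the helper names (`split`, `kmeans`, `train_pq`, `encode`) are hypothetical, the toy vectors are made up, and the k-means here uses a deterministic farthest-point initialization purely so the result is reproducible.

```python
def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two sub-vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))

def kmeans(points, ks, iters=20):
    """Toy k-means; farthest-point init is an assumption made for determinism."""
    centroids = [list(points[0])]
    while len(centroids) < ks:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        buckets = [[] for _ in range(ks)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for k, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, m, ks):
    """Learn one codebook (ks centroids) per subspace."""
    subs = [split(v, m) for v in vectors]
    return [kmeans([s[j] for s in subs], ks) for j in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    parts = split(vec, len(codebooks))
    return [nearest(part, cb) for part, cb in zip(parts, codebooks)]

# Four made-up 4-dimensional vectors, split into M=2 subspaces of 2 dims each.
vectors = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 0.0, 0.1],
    [5.0, 5.1, 5.1, 5.0],
    [5.1, 5.0, 0.1, 0.0],
]
codebooks = train_pq(vectors, m=2, ks=2)
codes = [encode(v, codebooks) for v in vectors]
print(codes)
```

Each vector is now stored as M small integers (centroid indices) instead of its original floats, which is where the compression comes from.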

This notebook shows how to use a retriever that uses Product Quantization under the hood, as implemented by the nanopq package.

%pip install -qU langchain-community langchain-openai nanopq

from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.retrievers import NanoPQRetriever

API Reference: SpacyEmbeddings | NanoPQRetriever

Create New Retriever with Texts

retriever = NanoPQRetriever.from_texts(
    ["Great world", "great words", "world", "planets of the world"],
    SpacyEmbeddings(model_name="en_core_web_sm"),
    clusters=2,
    subspace=2,
)
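To get a feel for what the `subspace` and `clusters` values trade off, the arithmetic below estimates per-vector storage before and after quantization. It is a rough sketch under several assumptions that are not stated in this page: a 96-dimensional float32 embedding, a dimension that must divide evenly by the number of subspaces, and codes stored in the smallest unsigned integer type that fits the cluster count. The function name `pq_storage` is hypothetical, and codebook storage is ignored.

```python
def pq_storage(dim, m, ks, float_bytes=4):
    """Return (raw_bytes, code_bytes) per vector for a PQ setup.

    Assumes float32 inputs and one integer code per subspace.
    """
    if dim % m != 0:
        raise ValueError("embedding dimension must be divisible by the number of subspaces")
    # Assumed rule: smallest unsigned integer type that can hold a cluster index.
    if ks <= 2 ** 8:
        bytes_per_code = 1
    elif ks <= 2 ** 16:
        bytes_per_code = 2
    else:
        bytes_per_code = 4
    return dim * float_bytes, m * bytes_per_code

# dim=96 is an assumption about the embedding model, not a value from this page.
raw, compressed = pq_storage(dim=96, m=2, ks=2)
print(raw, compressed)
```

More subspaces or more clusters generally preserve more detail at the cost of larger codes and codebooks; the tiny values used here only make sense for a four-document toy corpus.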

Use Retriever

We can now use the retriever!

retriever.invoke("earth")

M: 2, Ks: 2, metric : <class 'numpy.uint8'>, code_dtype: l2
iter: 20, seed: 123
Training the subspace: 0 / 2
Training the subspace: 1 / 2
Encoding the subspace: 0 / 2
Encoding the subspace: 1 / 2

[Document(page_content='world'),
 Document(page_content='Great world'),
 Document(page_content='great words'),
 Document(page_content='planets of the world')]
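One common way to rank PQ-encoded vectors against a query is asymmetric distance computation: the query stays uncompressed, a small table of distances from each query sub-vector to every centroid is built once, and each stored code is scored by table lookups. The sketch below illustrates that general technique with made-up codebooks and codes; it is not taken from the retriever's source, and the names `distance_table` and `rank` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two sub-vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical codebooks: 2 subspaces, 2 centroids each (made-up values).
codebooks = [
    [[0.0, 0.0], [5.0, 5.0]],
    [[5.0, 5.0], [0.0, 0.0]],
]
# Hypothetical PQ codes for four stored vectors: one centroid index per subspace.
codes = [[0, 0], [0, 1], [1, 0], [1, 1]]

def distance_table(query, codebooks):
    """table[j][k] = distance from the j-th query sub-vector to centroid k."""
    m = len(codebooks)
    step = len(query) // m
    return [
        [sq_dist(query[j * step:(j + 1) * step], c) for c in codebooks[j]]
        for j in range(m)
    ]

def rank(query, codebooks, codes):
    """Return stored-vector indices ordered from nearest to farthest."""
    table = distance_table(query, codebooks)
    dists = [sum(table[j][code[j]] for j in range(len(code))) for code in codes]
    return sorted(range(len(codes)), key=lambda i: dists[i])

query = [4.0, 4.0, 1.5, 1.5]
order = rank(query, codebooks, codes)
print(order)
```

Because the distances are approximations built from centroids, the ordering can differ from an exact nearest-neighbor search, especially with very few clusters.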

Related

Retriever conceptual guide
Retriever how-to guides
