
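Alibaba Cloud OpenSearch

Alibaba Cloud OpenSearch is a one-stop platform for developing intelligent search services. OpenSearch is built on the large-scale distributed search engine developed by Alibaba. It serves more than 500 business cases within Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services for a range of scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.

OpenSearch helps you develop high-quality, maintenance-free, high-performance intelligent search services that give your users efficient and accurate search.

OpenSearch provides a vector search feature. In certain scenarios, especially test question search and image search, you can combine vector search with multimodal search to improve the accuracy of search results.

This notebook shows how to use functionality related to Alibaba Cloud OpenSearch Vector Search Edition.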

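Setup

Purchase and configure an instance

Purchase OpenSearch Vector Search Edition from Alibaba Cloud and configure the instance as described in the documentation.

To run this notebook, you need an OpenSearch Vector Search Edition instance that is up and running.

Alibaba Cloud OpenSearch vector store class

The AlibabaCloudOpenSearch class supports the following functions: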

  • add_texts
  • add_documents
  • from_texts
  • from_documents
  • similarity_search
  • asimilarity_search
  • similarity_search_by_vector
  • asimilarity_search_by_vector
  • similarity_search_with_relevance_scores
  • delete_doc_by_texts

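Read the help documentation to quickly get familiar with and configure an OpenSearch Vector Search Edition instance.

If you run into any problems during use, feel free to email xingshaomin.xsm@alibaba-inc.com, and we will do our best to help.

Once the instance is up and running, follow these steps to split documents, get embeddings, connect to the Alibaba Cloud OpenSearch instance, index the documents, and perform vector retrieval.

First, install the following Python packages.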

%pip install --upgrade --quiet langchain-community langchain-openai alibabacloud_ha3engine_vector

We want to use OpenAIEmbeddings, so we need an OpenAI API key.

import getpass
import os

# Prompt for the key only if it is not already set in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Example

from langchain_community.vectorstores import (
    AlibabaCloudOpenSearch,
    AlibabaCloudOpenSearchSettings,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

Split the documents and get embeddings.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
API Reference: TextLoader
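
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of fixed-window chunking in plain Python. It is a simplification and not the algorithm `CharacterTextSplitter` uses (that class splits on a separator and then merges the pieces); the `split_fixed` helper is hypothetical and exists only for illustration.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration only. Consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts (chunk_size - chunk_overlap) characters after the last.
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


chunks = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With `chunk_overlap=0`, as in the cell above, the windows simply follow one another with no shared characters.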

Create the OpenSearch settings.

# Each string below describes the value to supply; replace it with your own.
settings = AlibabaCloudOpenSearchSettings(
    endpoint="The endpoint of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    instance_id="The identifier of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    protocol="Communication protocol between the SDK and the server; the default is http.",
    username="The username specified when purchasing the instance.",
    password="The password specified when purchasing the instance.",
    namespace="The instance data is partitioned by the namespace field. If namespaces are enabled, you must specify the namespace field name during initialization; otherwise queries cannot be executed correctly.",
    table_name="The table name specified during instance configuration.",
    embedding_field_separator="Delimiter used when writing vector field data; the default is a comma.",
    output_fields="The list of fields returned when invoking OpenSearch; by default, the values of field_name_mapping.",
    field_name_mapping={
        "id": "id",  # The id field name mapping of the index document.
        "document": "document",  # The text field name mapping of the index document.
        "embedding": "embedding",  # The embedding field name mapping of the index document.
        "name_of_the_metadata_specified_during_search": "opensearch_metadata_field_name,=",
        # Metadata field name mappings of the index document; you can specify several.
        # Each value contains the mapped field name and an operator; the operator is
        # used when executing a metadata filter query.
        # Currently supported comparison operators are: > (greater than), < (less than),
        # = (equal to), <= (less than or equal to), >= (greater than or equal to),
        # != (not equal to).
        # Refer to this link: https://help.aliyun.com/zh/open-search/vector-search-edition/filter-expression
    },
)

# For example:
#
# settings = AlibabaCloudOpenSearchSettings(
#     endpoint='ha-cn-5yd3fhdm102.public.ha.aliyuncs.com',
#     instance_id='ha-cn-5yd3fhdm102',
#     username='instance user name',
#     password='instance password',
#     table_name='test_table',
#     field_name_mapping={
#         "id": "id",
#         "document": "document",
#         "embedding": "embedding",
#         "string_field": "string_field,=",
#         "int_field": "int_field,=",
#         "float_field": "float_field,=",
#         "double_field": "double_field,=",
#     },
# )
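
Each metadata entry in `field_name_mapping` pairs an OpenSearch field name with a comparison operator. As a rough sketch of how such a value could be turned into a filter expression, the hypothetical helpers below parse a mapping value and build a filter string. They are not part of `langchain_community`, and the output format (double-quoted string literals, clauses joined with `AND`) is an assumption made for illustration; the filter-expression documentation linked in the settings comments is the authoritative reference.

```python
from typing import Any, Dict, Tuple

# Operators listed in the settings comments above.
SUPPORTED_OPERATORS = {">", "<", "=", "<=", ">=", "!="}


def parse_mapping_value(value: str) -> Tuple[str, str]:
    """Split a "field_name,operator" mapping value into its two parts."""
    field, _, operator = value.partition(",")
    operator = operator.strip()
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return field.strip(), operator


def build_filter(metadata: Dict[str, Any], field_name_mapping: Dict[str, str]) -> str:
    """Combine metadata conditions into one expression (illustrative format)."""
    clauses = []
    for key, value in metadata.items():
        field, operator = parse_mapping_value(field_name_mapping[key])
        # Assumption: string literals are wrapped in double quotes.
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        clauses.append(f"{field}{operator}{literal}")
    return " AND ".join(clauses)


mapping = {"string_field": "string_field,=", "int_field": "int_field,>="}
expression = build_filter({"string_field": "value1", "int_field": 1}, mapping)
print(expression)
```

The point of the sketch is that the operator is fixed per field in the settings, so a search-time filter only needs to supply the value for each metadata key.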

Create an OpenSearch access instance from the settings.

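# Create an OpenSearch vector store and index the docs in one step.
# from_texts expects plain strings, so pass each chunk's page_content.
opensearch = AlibabaCloudOpenSearch.from_texts(
    texts=[doc.page_content for doc in docs],
    embedding=embeddings,
    config=settings,
)

# Alternatively, create an OpenSearch vector store without indexing anything yet.
opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)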

Add texts and build the index.

metadatas = [
    {"string_field": "value1", "int_field": 1, "float_field": 1.0, "double_field": 2.0},
    {"string_field": "value2", "int_field": 2, "float_field": 3.0, "double_field": 4.0},
    {"string_field": "value3", "int_field": 3, "float_field": 5.0, "double_field": 6.0},
]
# The keys of each metadata dict must match field_name_mapping in settings.
# add_texts expects plain strings, so pass each chunk's page_content.
opensearch.add_texts(
    texts=[doc.page_content for doc in docs], ids=[], metadatas=metadatas
)
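
Because the metadata keys must match `field_name_mapping`, a small pre-flight check can catch mismatches before anything is sent to the instance. The `find_unmapped_keys` helper below is hypothetical and not part of the library; treating `id`, `document`, and `embedding` as reserved keys that metadata should not use is also an assumption of this sketch.

```python
from typing import Any, Dict, List, Set

# Assumption: these mapping keys describe the document itself, not metadata.
RESERVED_KEYS = {"id", "document", "embedding"}


def find_unmapped_keys(
    metadatas: List[Dict[str, Any]], field_name_mapping: Dict[str, str]
) -> Set[str]:
    """Return metadata keys that have no metadata entry in field_name_mapping."""
    allowed = set(field_name_mapping) - RESERVED_KEYS
    used = {key for metadata in metadatas for key in metadata}
    return used - allowed


sample_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,=",
}
sample_metadatas = [
    {"string_field": "value1", "int_field": 1},
    {"string_field": "value2", "float_field": 3.0},
]
missing = find_unmapped_keys(sample_metadatas, sample_mapping)
print(missing)
```

An empty result means every metadata key has a mapping; here `float_field` is reported because the sample mapping does not declare it.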

Query and retrieve data.

query = "What did the president say about Ketanji Brown Jackson"
docs = opensearch.similarity_search(query)
print(docs[0].page_content)
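
Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The sketch below shows that idea in plain Python, using cosine similarity and made-up three-dimensional vectors. It is an illustration only: the helper names are hypothetical, and it says nothing about how OpenSearch indexes or scores vectors internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones come from an embedding model.
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print([doc_id for doc_id, _ in results])
```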

Query and retrieve data with a metadata filter.

query = "What did the president say about Ketanji Brown Jackson"
metadata = {
    "string_field": "value1",
    "int_field": 1,
    "float_field": 1.0,
    "double_field": 2.0,
}
docs = opensearch.similarity_search(query, filter=metadata)
print(docs[0].page_content)
