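Alibaba Cloud OpenSearch

Alibaba Cloud OpenSearch is a one-stop platform for developing intelligent search services. OpenSearch is built on the large-scale distributed search engine developed by Alibaba. It serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services for different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.

OpenSearch helps you develop high-quality, maintenance-free, and high-performance intelligent search services that give your users high search efficiency and accuracy.

OpenSearch provides a vector search feature. In specific scenarios, especially test question search and image search, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results.

This notebook shows how to use functionality related to the Alibaba Cloud OpenSearch Vector Search Edition.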

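Setup

Purchase an instance and configure it

Purchase an OpenSearch Vector Search Edition instance from Alibaba Cloud and configure it according to the help documentation.

To run the examples, you need an OpenSearch Vector Search Edition instance that is up and running.

Alibaba Cloud OpenSearch Vector Store class

The AlibabaCloudOpenSearch class supports the following functions: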

  • add_texts
  • add_documents
  • from_texts
  • from_documents
  • similarity_search
  • asimilarity_search
  • similarity_search_by_vector
  • asimilarity_search_by_vector
  • similarity_search_with_relevance_scores
  • delete_doc_by_texts

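Read the help documentation to quickly familiarize yourself with and configure an OpenSearch Vector Search Edition instance.

If you encounter any problems during use, please feel free to contact xingshaomin.xsm@alibaba-inc.com, and we will do our best to provide you with assistance and support.

After the instance is up and running, follow these steps to split documents, get embeddings, connect to the Alibaba Cloud OpenSearch instance, index documents, and perform vector retrieval.

We need to install the following Python packages first.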

%pip install --upgrade --quiet langchain-community langchain-openai langchain-text-splitters alibabacloud_ha3engine_vector

We want to use OpenAIEmbeddings, so we have to get an OpenAI API key.

import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Example

from langchain_community.vectorstores import (
    AlibabaCloudOpenSearch,
    AlibabaCloudOpenSearchSettings,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

Split documents and get embeddings.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
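To make the effect of chunk_size more concrete, here is a minimal pure-Python sketch of separator-based chunking with no overlap. The helper name split_on_separator is hypothetical, and this is not the CharacterTextSplitter implementation; it only illustrates the general idea of packing separator-delimited pieces into chunks of at most chunk_size characters.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Pack separator-delimited pieces into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. A single piece longer than
    chunk_size is kept whole rather than being cut in the middle.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # the piece still fits (or starts a new chunk)
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha\n\nbeta\n\ngamma delta\n\nepsilon"
chunks = split_on_separator(sample, chunk_size=12)
print(chunks)
```

With chunk_size=1000, as in the cell above, most paragraphs of a speech-length text would be grouped several to a chunk.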

Create the OpenSearch settings.

# Each string below describes the value to supply; replace it with your own setting.
settings = AlibabaCloudOpenSearchSettings(
    endpoint="The endpoint of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    instance_id="The identifier of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    protocol="Communication protocol between the SDK and the server. The default is http.",
    username="The username specified when purchasing the instance.",
    password="The password specified when purchasing the instance.",
    namespace="The instance data is partitioned based on the namespace field. If the namespace is enabled, you need to specify the namespace field name during initialization. Otherwise, queries cannot be executed correctly.",
    table_name="The table name specified during instance configuration.",
    embedding_field_separator="Delimiter specified for writing vector field data. The default is a comma.",
    output_fields="The list of fields returned when invoking OpenSearch. By default it is the list of values in the field name mapping.",
    field_name_mapping={
        "id": "id",  # The id field name mapping of the index document.
        "document": "document",  # The text field name mapping of the index document.
        "embedding": "embedding",  # The embedding field name mapping of the index document.
        "name_of_the_metadata_specified_during_search": "opensearch_metadata_field_name,=",
        # The metadata field name mapping of the index document. You can specify multiple entries.
        # Each value contains the mapped field name and an operator; the operator is used when executing a metadata filter query.
        # Currently supported operators are: > (greater than), < (less than), = (equal to), <= (less than or equal to), >= (greater than or equal to), != (not equal to).
        # Refer to this link: https://help.aliyun.com/zh/open-search/vector-search-edition/filter-expression
    },
)

# For example:

# settings = AlibabaCloudOpenSearchSettings(
#     endpoint='ha-cn-5yd3fhdm102.public.ha.aliyuncs.com',
#     instance_id='ha-cn-5yd3fhdm102',
#     username='instance user name',
#     password='instance password',
#     table_name='test_table',
#     field_name_mapping={
#         "id": "id",
#         "document": "document",
#         "embedding": "embedding",
#         "string_field": "string_field,=",
#         "int_field": "int_field,=",
#         "float_field": "float_field,=",
#         "double_field": "double_field,=",
#     },
# )
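The metadata entries in field_name_mapping pack two things into one string: the OpenSearch field name and the comparison operator, separated by a comma. The following sketch shows one way to split and validate those values before creating the settings. The function name parse_metadata_mapping and the choice to raise ValueError are assumptions made for illustration; they are not part of the LangChain integration.

```python
from typing import Dict, Tuple

# Operators listed as supported in the settings comments above.
SUPPORTED_OPERATORS = {">", "<", "=", "<=", ">=", "!="}

# Keys that map core document fields rather than metadata fields.
RESERVED_KEYS = {"id", "document", "embedding"}


def parse_metadata_mapping(field_name_mapping: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Split each metadata value of the form "field_name,operator" into a pair.

    Hypothetical helper: raises ValueError when a value has no operator or
    uses an operator outside SUPPORTED_OPERATORS.
    """
    parsed: Dict[str, Tuple[str, str]] = {}
    for key, value in field_name_mapping.items():
        if key in RESERVED_KEYS:
            continue
        field, comma, operator = value.partition(",")
        field, operator = field.strip(), operator.strip()
        if not comma or operator not in SUPPORTED_OPERATORS:
            raise ValueError(f"Invalid mapping for {key!r}: {value!r}")
        parsed[key] = (field, operator)
    return parsed


example_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,>=",
}
parsed = parse_metadata_mapping(example_mapping)
print(parsed)
```

Checking the mapping up front surfaces a missing comma or an unsupported operator before any query is sent.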

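Create an OpenSearch access instance from the settings.

# Create an OpenSearch instance and index the docs.
# from_texts expects plain strings, so pass the page content of each split Document.
opensearch = AlibabaCloudOpenSearch.from_texts(
    texts=[doc.page_content for doc in docs], embedding=embeddings, config=settings
)

# Alternatively, create an OpenSearch instance without indexing anything yet.
opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)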

Add texts and build the index.

metadatas = [
    {"string_field": "value1", "int_field": 1, "float_field": 1.0, "double_field": 2.0},
    {"string_field": "value2", "int_field": 2, "float_field": 3.0, "double_field": 4.0},
    {"string_field": "value3", "int_field": 3, "float_field": 5.0, "double_field": 6.0},
]
# The keys of each metadata dict must match field_name_mapping in the settings.
# add_texts expects plain strings, one per metadata entry.
texts = [doc.page_content for doc in docs[: len(metadatas)]]
opensearch.add_texts(texts=texts, ids=[], metadatas=metadatas)
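Because the metadata keys have to match field_name_mapping, a quick check before indexing can catch misspelled keys. The sketch below is a hypothetical helper (unmapped_metadata_keys is not a LangChain function) that reports any metadata keys missing from a mapping.

```python
from typing import Any, Dict, List


def unmapped_metadata_keys(
    metadatas: List[Dict[str, Any]], field_name_mapping: Dict[str, str]
) -> List[str]:
    """Return the sorted metadata keys that have no entry in field_name_mapping.

    Hypothetical helper for illustration; an empty list means every key is mapped.
    """
    known = set(field_name_mapping)
    return sorted({key for metadata in metadatas for key in metadata if key not in known})


sample_mapping = {
    "id": "id",
    "document": "document",
    "embedding": "embedding",
    "string_field": "string_field,=",
    "int_field": "int_field,=",
}
sample_metadatas = [
    {"string_field": "value1", "int_field": 1},
    {"string_field": "value2", "int_feild": 2},  # deliberately misspelled key
]
missing = unmapped_metadata_keys(sample_metadatas, sample_mapping)
print(missing)
```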

Query and retrieve data.

query = "What did the president say about Ketanji Brown Jackson"
docs = opensearch.similarity_search(query)
print(docs[0].page_content)
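Conceptually, a similarity search embeds the query and returns the stored documents whose vectors are closest to it. The sketch below illustrates that idea on toy two-dimensional vectors with a brute-force scan. Cosine similarity is an assumption here; the metric actually used depends on how the OpenSearch index is configured, and the service does not necessarily scan every vector. The names cosine_similarity and top_k are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 4
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(index, cosine_similarity(query, vector)) for index, vector in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([1.0, 0.1], toy_vectors, k=2)
print(results)
```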

Query and retrieve data with metadata.

query = "What did the president say about Ketanji Brown Jackson"
metadata = {
    "string_field": "value1",
    "int_field": 1,
    "float_field": 1.0,
    "double_field": 2.0,
}
docs = opensearch.similarity_search(query, filter=metadata)
print(docs[0].page_content)
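The filter dict is interpreted through field_name_mapping: each key is looked up to find the OpenSearch field name and the operator to apply. The following sketch shows how such a dict could be turned into a single filter expression string. The exact expression the integration sends is not shown on this page, so the output format (clauses joined with AND, string values in double quotes) is an assumption, and build_filter_expression is a hypothetical name.

```python
from numbers import Number
from typing import Any, Dict


def build_filter_expression(
    metadata_filter: Dict[str, Any], field_name_mapping: Dict[str, str]
) -> str:
    """Combine a metadata filter dict into one expression string.

    Hypothetical sketch: looks up "field_name,operator" for each key,
    quotes non-numeric values, and joins the clauses with AND.
    """
    clauses = []
    for key, value in metadata_filter.items():
        field, _, operator = field_name_mapping[key].partition(",")
        field, operator = field.strip(), operator.strip()
        if isinstance(value, Number) and not isinstance(value, bool):
            clauses.append(f"{field} {operator} {value}")
        else:
            clauses.append(f'{field} {operator} "{value}"')
    return " AND ".join(clauses)


filter_mapping = {
    "string_field": "string_field,=",
    "int_field": "int_field,=",
    "float_field": "float_field,=",
}
expression = build_filter_expression(
    {"string_field": "value1", "int_field": 1, "float_field": 1.0}, filter_mapping
)
print(expression)
```

Keeping the operator in the mapping means the caller only supplies values at query time; changing "=" to ">=" in the settings changes how every filter on that field is evaluated.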
