Clarifai
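Clarifai is an AI platform that covers the full AI lifecycle: data exploration, data labeling, model training, evaluation, and inference. Once inputs are uploaded, a Clarifai application can be used as a vector database.
This notebook shows how to use functionality related to the Clarifai vector database. The examples demonstrate text semantic search. Clarifai also supports semantic search over images, video frames, and localized regions (see Rank), as well as attribute search (see Filter).
To use Clarifai, you must have an account and a Personal Access Token (PAT) key. You can get or create a PAT at https://clarifai.com/settings/security.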
Dependencies
# Install required dependencies
%pip install --upgrade --quiet clarifai langchain-community
Imports
Here we set the personal access token. You can find your PAT under Settings / Security on the platform.
# Please login and get your API key from https://clarifai.com/settings/security
from getpass import getpass
CLARIFAI_PAT = getpass()
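One way to make the token available to later cells is to export it as an environment variable. The sketch below assumes the Clarifai client looks up `CLARIFAI_PAT` in the environment when no token is passed explicitly; the helper name `export_pat` is hypothetical and not part of any library.

```python
import os


def export_pat(pat: str, var_name: str = "CLARIFAI_PAT") -> None:
    """Store a PAT in the process environment after basic validation.

    `export_pat` is a hypothetical helper. The variable name CLARIFAI_PAT
    is an assumption about what the Clarifai client reads.
    """
    pat = pat.strip()
    if not pat:
        raise ValueError("The personal access token must not be empty.")
    os.environ[var_name] = pat


# Placeholder value for illustration only; in the notebook you would pass
# the CLARIFAI_PAT variable returned by getpass() instead.
export_pat("  my-example-token  ")
print(os.environ["CLARIFAI_PAT"])
```

Stripping whitespace guards against a stray newline or space picked up when the token is pasted.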
# Import the required modules
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Clarifai
from langchain_text_splitters import CharacterTextSplitter
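Setup
Set the user ID and app ID of the application where the text data will be uploaded. Note: when creating the application, select an appropriate base workflow for indexing your text documents, such as the Language-Understanding workflow.
You must first create an account on Clarifai and then create an application.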
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 2
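From Texts
Create a Clarifai vector store from a list of texts. This section uploads each text, together with its metadata, to a Clarifai application. The application can then be used for semantic search to find relevant texts.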
texts = [
    "I really enjoy spending time with you",
    "I hate spending time with my dog",
    "I want to go for a run",
    "I went to the movies yesterday",
    "I love playing soccer with my friends",
]
metadatas = [
    {"id": i, "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]
Alternatively, you can give the inputs custom input IDs.
idlist = ["text1", "text2", "text3", "text4", "text5"]
metadatas = [
    {"id": idlist[i], "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]
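Because the texts, the custom IDs, and the metadata are parallel lists, a mismatch in length or a repeated ID is easy to miss. The following is a minimal sketch of a check that builds the metadata in one place; `build_metadatas` is a hypothetical helper written for illustration, not part of LangChain or Clarifai.

```python
from typing import Any, Dict, List, Sequence


def build_metadatas(
    texts: Sequence[str],
    ids: Sequence[str],
    source: str,
    category: Sequence[str],
) -> List[Dict[str, Any]]:
    """Build one metadata dict per text, pairing each text with its ID."""
    if len(texts) != len(ids):
        raise ValueError(f"Got {len(texts)} texts but {len(ids)} ids.")
    if len(set(ids)) != len(ids):
        raise ValueError("Input IDs must be unique.")
    return [
        # list(category) gives every entry its own copy of the list.
        {"id": input_id, "text": text, "source": source, "category": list(category)}
        for input_id, text in zip(ids, texts)
    ]


example_texts = ["I want to go for a run", "I went to the movies yesterday"]
example_ids = ["text1", "text2"]
example_metadatas = build_metadatas(
    example_texts, example_ids, source="book 1", category=["books", "modern"]
)
print(example_metadatas[0])
```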
# The Clarifai vector store can also be initialized with the PAT passed as an argument.
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)
Upload the data to the Clarifai app. The two calls below are alternatives.
# Upload with metadata and custom input IDs.
response = clarifai_vector_db.add_texts(texts=texts, ids=idlist, metadatas=metadatas)
# Upload without metadata (not recommended, because you will not be able to filter searches by metadata).
# Custom input IDs are optional.
response = clarifai_vector_db.add_texts(texts=texts)
You can also create a Clarifai vector DB store and ingest all the inputs into your app in a single step:
clarifai_vector_db = Clarifai.from_texts(
    user_id=USER_ID,
    app_id=APP_ID,
    texts=texts,
    metadatas=metadatas,
)
Search for similar texts using the similarity search function.
docs = clarifai_vector_db.similarity_search("I would like to see you")
docs
[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 'text1', 'source': 'book 1', 'category': ['books', 'modern']})]
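The result is a list of documents, each carrying `page_content` and `metadata`, as the output above shows. A minimal sketch for printing a compact summary follows; it uses `types.SimpleNamespace` as a stand-in for the returned documents so that it runs without a Clarifai app, and `summarize_results` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple


def summarize_results(docs: List[Any], max_chars: int = 40) -> List[Tuple[Optional[str], str]]:
    """Return (input id, text preview) pairs for objects that expose
    `page_content` and `metadata` attributes."""
    rows = []
    for doc in docs:
        text = doc.page_content
        # Shorten long passages so that each result fits on one line.
        preview = text if len(text) <= max_chars else text[: max_chars - 3] + "..."
        rows.append((doc.metadata.get("id"), preview))
    return rows


# Stand-in for a search result, mirroring the fields in the output above.
sample_docs = [
    SimpleNamespace(
        page_content="I really enjoy spending time with you",
        metadata={"id": "text1", "source": "book 1"},
    )
]
for input_id, preview in summarize_results(sample_docs):
    print(f"{input_id}: {preview}")
```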
You can also filter your search results by metadata.
# Metadata filters allow a lot of powerful filtering within an app.
# This filter limits the similarity query to texts whose "source" key has the value "book 1".
book1_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"source": "book 1"}
)
# You can also store lists in an input's metadata and select inputs that match an item in the list. This is useful for categories, as below:
book_category_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"category": ["books"]}
)
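To make the two filter shapes concrete, here is a small local model of the matching behavior described in the comments above. It is only an illustration run against plain dictionaries: the actual filtering happens inside Clarifai, and its exact semantics (in particular, how several items in a filter list are combined) are an assumption here. `matches_filter` is a hypothetical name.

```python
from typing import Any, Dict


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Local illustration of metadata filtering, NOT Clarifai's implementation.

    Scalar filter values must equal the stored value. For list filter
    values, every listed item must appear in the stored list (assumption).
    """
    for key, wanted in flt.items():
        if key not in metadata:
            return False
        actual = metadata[key]
        if isinstance(wanted, list):
            actual_items = actual if isinstance(actual, list) else [actual]
            if not all(item in actual_items for item in wanted):
                return False
        elif actual != wanted:
            return False
    return True


example_metadata = {"id": "text1", "source": "book 1", "category": ["books", "modern"]}
print(matches_filter(example_metadata, {"source": "book 1"}))
print(matches_filter(example_metadata, {"category": ["books"]}))
print(matches_filter(example_metadata, {"source": "book 2"}))
```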
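From Documents
Create a Clarifai vector store from a list of documents. This section uploads each document, together with its metadata, to a Clarifai application. The application can then be used for semantic search to find relevant documents.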
loader = TextLoader("your_local_file_path.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4
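The `chunk_size` and `chunk_overlap` arguments control how large each piece is and how much neighboring pieces share. The sketch below illustrates both with a simplified fixed-window chunker. This is a different technique from `CharacterTextSplitter`, which splits on a separator and then merges the pieces up to the size limit; `chunk_text` is a hypothetical function used only to show the effect of the two parameters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut text into fixed-size windows of characters.

    Simplified illustration only: it ignores separators and word
    boundaries, unlike CharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive.")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size).")
    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]


# With no overlap, 2,500 characters yield windows of 1000, 1000 and 500.
print([len(chunk) for chunk in chunk_text("x" * 2500)])
# With an overlap of 2, consecutive windows share two characters.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```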
Create a Clarifai vector DB instance and ingest all the documents into the Clarifai app.
clarifai_vector_db = Clarifai.from_documents(
    user_id=USER_ID,
    app_id=APP_ID,
    documents=docs,
    number_of_docs=NUMBER_OF_DOCS,
)
docs = clarifai_vector_db.similarity_search("Texts related to population")
docs
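From existing App
Within Clarifai we have great tools for adding data to applications (essentially projects) via the API or the UI. Most users will already have done that before interacting with LangChain, so this example uses the data in an existing app to perform searches. Check out our API docs and UI docs. The Clarifai application can then be used for semantic search to find relevant documents.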
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)
docs = clarifai_vector_db.similarity_search(
    "Texts related to ammunition and president wilson"
)
docs[0].page_content
"President Wilson, generally acclaimed as the leader of the world's democracies,\nphrased for civilization the arguments against autocracy in the great peace conference\nafter the war. The President headed the American delegation to that conclave of world\nre-construction. With him as delegates to the conference were Robert Lansing, Secretary\nof State; Henry White, former Ambassador to France and Italy; Edward M. House and\nGeneral Tasker H. Bliss.\nRepresenting American Labor at the International Labor conference held in Paris\nsimultaneously with the Peace Conference were Samuel Gompers, president of the\nAmerican Federation of Labor; William Green, secretary-treasurer of the United Mine\nWorkers of America; John R. Alpine, president of the Plumbers' Union; James Duncan,\npresident of the International Association of Granite Cutters; Frank Duffy, president of\nthe United Brotherhood of Carpenters and Joiners, and Frank Morrison, secretary of the\nAmerican Federation of Labor.\nEstimating the share of each Allied nation in the great victory, mankind will\nconclude that the heaviest cost in proportion to prewar population and treasure was paid\nby the nations that first felt the shock of war, Belgium, Serbia, Poland and France. All\nfour were the battle-grounds of huge armies, oscillating in a bloody frenzy over once\nfertile fields and once prosperous towns.\nBelgium, with a population of 8,000,000, had a casualty list of more than 350,000;\nFrance, with its casualties of 4,000,000 out of a population (including its colonies) of\n90,000,000, is really the martyr nation of the world. Her gallant poilus showed the world\nhow cheerfully men may die in defense of home and liberty. Huge Russia, including\nhapless Poland, had a casualty list of 7,000,000 out of its entire population of\n180,000,000. 
The United States out of a population of 110,000,000 had a casualty list of\n236,117 for nineteen months of war; of these 53,169 were killed or died of disease;\n179,625 were wounded; and 3,323 prisoners or missing."