
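Neo4j Vector Index

Neo4j is an open-source graph database with integrated support for vector similarity search.

It supports:

  • approximate nearest neighbor search
  • Euclidean similarity and cosine similarity
  • hybrid search combining vector and keyword searches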
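
As a rough illustration of the two similarity measures in the list above, the sketch below computes them in plain Python. The helper names are hypothetical, and the score normalizations (cosine rescaled to [0, 1], squared Euclidean distance mapped into (0, 1]) are an assumption about how Neo4j reports index scores; check the Neo4j documentation for the version you run.

```python
import math


def cosine_score(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed normalization)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return (1 + dot / (norm_a * norm_b)) / 2


def euclidean_score(a, b):
    """Squared Euclidean distance mapped into (0, 1] (assumed normalization)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 / (1 + squared_distance)


print(cosine_score([1.0, 0.0], [1.0, 0.0]))     # same direction
print(cosine_score([1.0, 0.0], [0.0, 1.0]))     # orthogonal vectors
print(euclidean_score([0.0, 0.0], [3.0, 4.0]))  # points 5 units apart
```

Under either measure a higher score means a closer match, which is why the results later in this notebook are ordered by descending score.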

This notebook shows how to use the Neo4j vector index (Neo4jVector).

See the installation instructions.

# Pip install necessary package
%pip install --upgrade --quiet neo4j
%pip install --upgrade --quiet langchain-openai langchain-neo4j
%pip install --upgrade --quiet tiktoken

We want to use OpenAIEmbeddings, so we have to get the OpenAI API key.

import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
loader = TextLoader("../../how_to/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
# Neo4jVector requires the Neo4j database credentials

url = "bolt://127.0.0.1:7687"
username = "neo4j"
password = "password"

# You can also use environment variables instead of directly passing named parameters
# os.environ["NEO4J_URI"] = "bolt://127.0.0.1:7687"
# os.environ["NEO4J_USERNAME"] = "neo4j"
# os.environ["NEO4J_PASSWORD"] = "pleaseletmein"
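
If you prefer the environment-variable route, a small helper can gather the settings in one place. This is only a sketch: `neo4j_settings` is a hypothetical helper, not part of langchain-neo4j, and the fallback values are simply the local development values used in this notebook.

```python
import os


def neo4j_settings(env=os.environ):
    """Collect connection settings, falling back to local development defaults."""
    return {
        "url": env.get("NEO4J_URI", "bolt://127.0.0.1:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }


# A plain dict stands in for os.environ so the example runs anywhere.
settings = neo4j_settings({"NEO4J_URI": "bolt://db.example.internal:7687"})
print(settings["url"], settings["username"])
```

The keys match the keyword arguments used throughout this notebook, so the result could be unpacked as `Neo4jVector.from_documents(docs, embeddings, **neo4j_settings())`.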

Similarity search with cosine distance (default)

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, OpenAIEmbeddings(), url=url, username=username, password=password
)
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
--------------------------------------------------------------------------------
Score: 0.9076391458511353
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 0.8912242650985718
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
--------------------------------------------------------------------------------

Working with vectorstore

Above, we created a vectorstore from scratch. Often, however, we want to work with an existing vectorstore. To do that, we can initialize it directly.

index_name = "vector"  # default index name

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
)

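We can also initialize a vectorstore from an existing graph using the from_existing_graph method. This method pulls the relevant text information from the database, calculates the text embeddings, and stores them back in the database.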

# First we create sample data in graph
store.query(
    "CREATE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle', age: 33})"
)
[]
# Now we initialize from existing graph
existing_graph = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    node_label="Person",
    text_node_properties=["name", "location"],
    embedding_node_property="embedding",
)
result = existing_graph.similarity_search("Slovenia", k=1)
result[0]
Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})
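
The shape of the returned document can be reproduced in plain Python. The sketch below only mimics the output shown above (text properties joined as `key: value` lines, remaining properties kept as metadata); it is not the library's actual implementation, and the variable names are ours.

```python
person = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
text_keys = ["name", "location"]  # same role as text_node_properties above

# Properties listed in text_keys become the embedded text ...
page_content = "".join(f"\n{key}: {person[key]}" for key in text_keys)
# ... and every other property ends up in the metadata.
metadata = {key: value for key, value in person.items() if key not in text_keys}

print(repr(page_content))
print(metadata)
```

This split is why `hobby` and `age` appear in the metadata of the result while `name` and `location` appear in `page_content`.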

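Neo4j also supports relationship vector indexes, where an embedding is stored as a relationship property and indexed. A relationship vector index cannot be populated through LangChain, but you can connect LangChain to an existing relationship vector index.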

# First we create sample data and index in graph
store.query(
    "MERGE (p:Person {name: 'Tomaz'}) "
    "MERGE (p1:Person {name:'Leann'}) "
    "MERGE (p1)-[:FRIEND {text:'example text', embedding:$embedding}]->(p)",
    params={"embedding": OpenAIEmbeddings().embed_query("example text")},
)
# Create a vector index
relationship_index = "relationship_vector"
store.query(
    """
    CREATE VECTOR INDEX $relationship_index
    IF NOT EXISTS
    FOR ()-[r:FRIEND]-() ON (r.embedding)
    OPTIONS {indexConfig: {
        `vector.dimensions`: 1536,
        `vector.similarity_function`: 'cosine'
    }}
    """,
    params={"relationship_index": relationship_index},
)
[]
relationship_vector = Neo4jVector.from_existing_relationship_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=relationship_index,
    text_node_property="text",
)
relationship_vector.similarity_search("Example")
[Document(page_content='example text')]

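Metadata filtering

The Neo4j vector store also supports metadata filtering by combining the parallel runtime with exact nearest neighbor search. This requires Neo4j 5.18 or later.

Equality filtering has the following syntax.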

existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": "Bicycle", "name": "Tomaz"},
)
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]

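Metadata filtering also supports the following operators:

  • $eq: equal
  • $ne: not equal
  • $lt: less than
  • $lte: less than or equal
  • $gt: greater than
  • $gte: greater than or equal
  • $in: in a list of values
  • $nin: not in a list of values
  • $between: between two values
  • $like: text contains value
  • $ilike: lowercased text contains value

existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}},
)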
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]

You can also use the OR operator between filters:

existing_graph.similarity_search(
    "Slovenia",
    filter={"$or": [{"hobby": {"$eq": "Bicycle"}}, {"age": {"$gt": 15}}]},
)
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
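
To make the operator semantics concrete, here is a small client-side evaluator for the same filter syntax. It is a sketch for intuition only: Neo4jVector applies filters inside the database, and details such as inclusive `$between` bounds given as a two-element list, and nodes with a missing property never matching, are assumptions made for this example.

```python
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
    # Assumed: bounds are inclusive and passed as [low, high].
    "$between": lambda value, operand: operand[0] <= value <= operand[1],
    "$like": lambda value, operand: operand in value,
    "$ilike": lambda value, operand: operand.lower() in value.lower(),
}


def matches(properties, flt):
    """Return True if the node properties satisfy every condition in flt."""
    for key, condition in flt.items():
        if key == "$or":
            # At least one of the nested filters has to match.
            if not any(matches(properties, option) for option in condition):
                return False
        elif isinstance(condition, dict):
            value = properties.get(key)
            for operator, operand in condition.items():
                # Assumed: a missing property never satisfies a condition.
                if value is None or not OPERATORS[operator](value, operand):
                    return False
        elif properties.get(key) != condition:
            # A bare value is shorthand for an equality check.
            return False
    return True


tomaz = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
print(matches(tomaz, {"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}}))
print(matches(tomaz, {"$or": [{"hobby": {"$eq": "Chess"}}, {"age": {"$gt": 15}}]}))
print(matches(tomaz, {"age": {"$between": [40, 50]}}))
```

Conditions at the top level of the dictionary are combined with AND, while `$or` needs only one of its nested filters to hold.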

Add documents

We can add documents to the existing vectorstore.

store.add_documents([Document(page_content="foo")])
['acbd18db4cc2f85cedef654fccc4a4d8']
docs_with_score = store.similarity_search_with_score("foo")
docs_with_score[0]
(Document(page_content='foo'), 0.9999997615814209)
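
The ID returned by add_documents above happens to equal the MD5 hex digest of the page content. The check below verifies that for this one document. Treat it as an observation about this output rather than a documented guarantee of how IDs are always generated.

```python
import hashlib

# MD5 hex digest of the UTF-8 encoded page content "foo".
doc_id = hashlib.md5("foo".encode("utf-8")).hexdigest()
print(doc_id)  # acbd18db4cc2f85cedef654fccc4a4d8
```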

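Customize response with retrieval query

You can also customize responses by using a custom Cypher snippet that can fetch other information from the graph. Under the hood, the final Cypher statement is constructed like so:

read_query = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
) + retrieval_query

The retrieval query must return the following three columns:

  • text: Union[str, Dict] = value used to populate the page_content of a document
  • score: Float = similarity score
  • metadata: Dict = additional metadata of a document

Learn more in this blog post.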
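
Before running a custom retrieval query against the database, it can help to print the composed statement and sanity-check the returned columns. The sketch below does that with a naive regular-expression check. `missing_columns` is a hypothetical helper, and the sample query, which assumes nodes with `name`, `location`, and `hobby` properties, is ours.

```python
import re

REQUIRED_COLUMNS = ("text", "score", "metadata")


def missing_columns(retrieval_query):
    """Naive check: required column names that never appear in the query text."""
    return [
        column
        for column in REQUIRED_COLUMNS
        if not re.search(rf"\b{column}\b", retrieval_query)
    ]


retrieval_query = """
RETURN node.name + " lives in " + node.location AS text,
       score,
       {hobby: node.hobby} AS metadata
"""

# Same composition as shown above: vector index lookup first, custom RETURN last.
read_query = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
) + retrieval_query

print(read_query)
print(missing_columns(retrieval_query))
print(missing_columns("RETURN node.name AS text, score"))
```

A word-boundary search like this can be fooled by property names or string literals, so it is a quick guard during development rather than a real Cypher parser.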

retrieval_query = """
RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)
[Document(page_content='Name:Tomaz', metadata={'foo': 'bar'})]

Here is an example that passes a map of selected node properties (name, age, and hobby, without the embedding) as a dictionary to the text column:

retrieval_query = """
RETURN node {.name, .age, .hobby} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)
[Document(page_content='name: Tomaz\nage: 33\nhobby: Bicycle\n', metadata={'foo': 'bar'})]
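
Judging from the output above, a dictionary in the text column is flattened into `key: value` lines. The hypothetical helper below reproduces that formatting in plain Python. It mirrors the observed output and is not taken from the library source.

```python
def render_text(value):
    """Return a string unchanged, or flatten a mapping into 'key: value' lines."""
    if isinstance(value, dict):
        return "".join(f"{key}: {item}\n" for key, item in value.items())
    return value


print(repr(render_text({"name": "Tomaz", "age": 33, "hobby": "Bicycle"})))
print(repr(render_text("Name:Tomaz")))
```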

You can also pass Cypher parameters to the retrieval query. Parameters can be used for additional filtering, traversals, and so on.

retrieval_query = """
RETURN node {.*, embedding:Null, extra: $extra} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1, params={"extra": "ParamInfo"})
[Document(page_content='location: Slovenia\nextra: ParamInfo\nname: Tomaz\nage: 33\nhobby: Bicycle\nembedding: None\n', metadata={'foo': 'bar'})]

Hybrid search (vector + keyword)

Neo4j integrates both vector and keyword indexes, which allows you to use a hybrid search approach.

# The Neo4jVector module will connect to Neo4j and create vector and keyword indexes if needed.
hybrid_db = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    search_type="hybrid",
)
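
Vector scores and full-text (keyword) scores live on different scales, so a hybrid search has to bring them onto a common scale before merging. One simple approach, assumed here for illustration and not necessarily what Neo4jVector does internally, is to divide each result list by its top score and keep the best normalized score per document. `merge_hybrid` and the sample scores are hypothetical.

```python
def merge_hybrid(vector_hits, keyword_hits, k):
    """Merge two {doc_id: score} result sets after max-normalizing each one."""

    def normalize(hits):
        top = max(hits.values(), default=0.0)
        return {doc: (score / top if top else 0.0) for doc, score in hits.items()}

    merged = normalize(vector_hits)
    for doc, score in normalize(keyword_hits).items():
        # A document found by both searches keeps its better normalized score.
        merged[doc] = max(score, merged.get(doc, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:k]


vector_hits = {"a": 0.8, "b": 0.4}   # similarity scores from the vector index
keyword_hits = {"b": 4.0, "c": 2.0}  # relevance scores from the keyword index
print(merge_hybrid(vector_hits, keyword_hits, k=2))
```

In this example document `b` ranks low in the vector results but is the top keyword match, so after normalization it ties with `a` at the head of the merged list.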

To load hybrid search from existing indexes, you have to provide both the vector and keyword indexes:

index_name = "vector"  # default index name
keyword_index_name = "keyword"  # default keyword index name

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
    keyword_index_name=keyword_index_name,
    search_type="hybrid",
)

Retriever options

This section shows how to use Neo4jVector as a retriever.

retriever = store.as_retriever()
retriever.invoke(query)[0]
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt'})

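Question answering with sources

This section goes over how to do question answering with sources over an index. It does this by using the RetrievalQAWithSourcesChain, which looks up the documents from an index.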

from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI
chain = RetrievalQAWithSourcesChain.from_chain_type(
    ChatOpenAI(temperature=0), chain_type="stuff", retriever=retriever
)
chain.invoke(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)
{'answer': 'The president honored Justice Stephen Breyer for his service to the country and mentioned his retirement from the United States Supreme Court.\n',
'sources': '../../how_to/state_of_the_union.txt'}
