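Azure Cosmos DB for Apache Gremlin
Azure Cosmos DB for Apache Gremlin is a graph database service for storing massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily.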
Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation.
This notebook shows how to use an LLM to provide a natural language interface to a graph database that you can query with the Gremlin query language.
Setting up
Install the library:
!pip3 install gremlinpython
You will need an Azure Cosmos DB graph database instance. One option is to create a free Cosmos DB graph database instance in Azure.
When you create your Cosmos DB account and graph, use /type as the partition key.
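With /type as the partition key, every vertex written to the graph needs a `type` property. The sketch below shows what a vertex upsert query could look like under that constraint. The helper `vertex_upsert_query` is hypothetical, and filling `type` with the vertex label is an assumption made for illustration, not a statement about what LangChain sends to the server.

```python
def vertex_upsert_query(vertex_id: str, label: str, partition_key: str = "type") -> str:
    """Build a Gremlin string that adds a vertex only if it does not exist yet.

    Hypothetical helper: the partition key property is filled with the label.
    """

    def esc(value: str) -> str:
        # Escape backslashes and single quotes for a single-quoted Gremlin string.
        return value.replace("\\", "\\\\").replace("'", "\\'")

    return (
        f"g.V().has('id', '{esc(vertex_id)}').fold()"
        f".coalesce(unfold(), addV('{esc(label)}')"
        f".property('id', '{esc(vertex_id)}')"
        f".property('{partition_key}', '{esc(label)}'))"
    )


query = vertex_upsert_query("The Matrix", "movie")
print(query)
```

The `fold().coalesce(unfold(), addV(...))` pattern keeps the write idempotent, so running the same cell twice does not create duplicate vertices.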
cosmosdb_name = "mycosmosdb"
cosmosdb_db_id = "graphtesting"
cosmosdb_db_graph_id = "mygraph"
cosmosdb_access_Key = "longstring=="
import nest_asyncio
from langchain_community.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_openai import AzureChatOpenAI
graph = GremlinGraph(
    url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
    username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
    password=cosmosdb_access_Key,
)
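The connection arguments follow a fixed pattern: the account name goes into the WebSocket URL, and the database and graph ids go into the username path. A minimal sketch of that pattern follows. The helper name `gremlin_connection_params` and the environment variable `COSMOSDB_ACCESS_KEY` are hypothetical and only illustrate keeping the key out of the notebook.

```python
import os


def gremlin_connection_params(account: str, database: str, graph: str) -> dict:
    """Assemble the url and username strings used for the Gremlin endpoint."""
    return {
        "url": f"wss://{account}.gremlin.cosmos.azure.com:443/",
        "username": f"/dbs/{database}/colls/{graph}",
        # Assumption: the access key is supplied via an environment variable
        # instead of being hard-coded; the variable name is made up.
        "password": os.environ.get("COSMOSDB_ACCESS_KEY", ""),
    }


params = gremlin_connection_params("mycosmosdb", "graphtesting", "mygraph")
print(params["url"])
print(params["username"])
```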
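Seeding the database
Assuming your database is empty, you can populate it using GraphDocuments.
For Gremlin, always add a property called 'label' to each Node. If no label is set, Node.type is used as the label. For Cosmos DB, using natural ids makes sense, because they are visible in the graph explorer.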
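The label rule described above can be sketched in a few lines. `SketchNode` is a stand-in dataclass, not LangChain's `Node`, and `resolve_label` is a hypothetical helper that only mirrors the stated behavior: prefer the 'label' property, otherwise fall back to the node type.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Stand-in for a graph node, used for illustration only."""

    id: str
    type: str = "Node"
    properties: dict = field(default_factory=dict)


def resolve_label(node: SketchNode) -> str:
    # Use the explicit 'label' property when present, else the node type.
    return node.properties.get("label", node.type)


labelled = SketchNode(id="The Matrix", properties={"label": "movie"})
typed_only = SketchNode(id="Keanu Reeves", type="actor")

print(resolve_label(labelled))
print(resolve_label(typed_only))
```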
source_doc = Document(
    page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(
    id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"}
)
actor3 = Node(
    id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"}
)
rel1 = Relationship(
    id=5, type="ActedIn", source=actor1, target=movie, properties={"label": "ActedIn"}
)
rel2 = Relationship(
    id=6, type="ActedIn", source=actor2, target=movie, properties={"label": "ActedIn"}
)
rel3 = Relationship(
    id=7, type="ActedIn", source=actor3, target=movie, properties={"label": "ActedIn"}
)
rel4 = Relationship(
    id=8,
    type="Starring",
    source=movie,
    target=actor1,
    properties={"label": "Starring"},
)
rel5 = Relationship(
    id=9,
    type="Starring",
    source=movie,
    target=actor2,
    properties={"label": "Starring"},
)
rel6 = Relationship(
    id=10,
    type="Starring",
    source=movie,
    target=actor3,
    properties={"label": "Starring"},
)
graph_doc = GraphDocument(
    nodes=[movie, actor1, actor2, actor3],
    relationships=[rel1, rel2, rel3, rel4, rel5, rel6],
    source=source_doc,
)
# gremlinpython drives its own asyncio event loop, which clashes with the loop a notebook already runs.
# nest_asyncio.apply() patches asyncio to allow nested event loops as a workaround.
nest_asyncio.apply()
# Add the document to the CosmosDB graph.
graph.add_graph_documents([graph_doc])
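Because each relationship carries both a type and a 'label' property, a misspelled label such as "Strarring" creates an edge label that later queries will not match. A small pre-flight check can catch this before anything is written. This is a sketch over plain dictionaries with a hypothetical helper `find_label_mismatches`; it is not part of LangChain.

```python
def find_label_mismatches(relationships: list) -> list:
    """Return (id, type, label) for every relationship whose label differs from its type."""
    mismatches = []
    for rel in relationships:
        label = rel.get("properties", {}).get("label")
        if label is not None and label != rel["type"]:
            mismatches.append((rel["id"], rel["type"], label))
    return mismatches


sample_relationships = [
    {"id": 5, "type": "ActedIn", "properties": {"label": "ActedIn"}},
    {"id": 8, "type": "Starring", "properties": {"label": "Strarring"}},
    {"id": 9, "type": "Starring", "properties": {}},
]

mismatches = find_label_mismatches(sample_relationships)
print(mismatches)  # [(8, 'Starring', 'Strarring')]
```

Relationships without an explicit label are skipped, on the assumption that they fall back to their type.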
Refresh graph schema information
If the schema of the database changes (for example, after updates), you can refresh the schema information.
graph.refresh_schema()
print(graph.schema)
Querying the graph
We can now use the Gremlin QA chain to ask questions of the graph.
chain = GremlinQAChain.from_llm(
    AzureChatOpenAI(
        temperature=0,
        azure_deployment="gpt-4-turbo",
    ),
    graph=graph,
    verbose=True,
)
chain.invoke("Who played in The Matrix?")
chain.invoke("How many people played in The Matrix?")
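To make the two questions concrete, here is an in-memory sketch of the traversal they imply. It assumes the chain generates something along the lines of `g.V().has('id', 'The Matrix').in('ActedIn').values('name')`; the query the LLM actually produces may differ. The edge list and the helper `actors_in` are hypothetical and run without a database.

```python
# (source vertex, edge label, target vertex), mirroring the seeded ActedIn edges
edges = [
    ("Keanu Reeves", "ActedIn", "The Matrix"),
    ("Laurence Fishburne", "ActedIn", "The Matrix"),
    ("Carrie-Anne Moss", "ActedIn", "The Matrix"),
]


def actors_in(movie_id: str, edge_list: list) -> list:
    """Follow incoming 'ActedIn' edges of a movie back to their source vertices."""
    return [
        source
        for source, label, target in edge_list
        if label == "ActedIn" and target == movie_id
    ]


cast = actors_in("The Matrix", edges)
print(cast)       # who played in The Matrix
print(len(cast))  # how many people played in The Matrix
```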