Activeloop Deep Memory
Activeloop Deep Memory is a suite of tools that lets you optimize your vector store for your use case and achieve higher accuracy in your LLM applications.
Retrieval-Augmented Generation (RAG) has recently gained significant attention. As advanced RAG techniques and agents emerge, they expand the potential of what RAG can accomplish. However, several challenges may limit the integration of RAG into production. The primary factors to consider when implementing RAG in production environments are accuracy (recall), cost, and latency. For basic use cases, OpenAI's Ada model paired with a naive similarity search can produce satisfactory results. Yet, for higher accuracy or recall during search, one might need to employ advanced retrieval techniques. These methods might involve varying data chunk sizes, rewriting queries multiple times, and more, potentially increasing latency and costs. Activeloop's Deep Memory, a feature available to Activeloop Deep Lake users, addresses these issues by introducing a tiny neural network layer trained to match user queries with relevant data from a corpus. While this addition incurs minimal latency during search, it can boost retrieval accuracy by up to 27% and remains cost-effective and simple to use, without requiring any additional advanced RAG techniques.
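To give a sense of how lightweight this is in practice, the sketch below shows Deep Memory enabled at query time in LangChain. It is a preview only; the full setup is built step by step below, db stands for the DeepLake vector store created in section 1, and the query string is purely illustrative:
# Minimal sketch (preview): `db` is the DeepLake vector store constructed below.
retriever = db.as_retriever()
retriever.search_kwargs["deep_memory"] = True  # a single flag enables the trained layer
relevant_docs = retriever.get_relevant_documents("How do I create a Deep Lake dataset?")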
In this tutorial, we will parse the DeepLake documentation and build a RAG system that can answer questions based on the docs.
1. Dataset Creation
For this tutorial we will parse activeloop's docs using the BeautifulSoup library and LangChain's document parsers like Html2TextTransformer and AsyncHtmlLoader. So we will need to install the following libraries:
%pip install --upgrade --quiet tiktoken langchain-openai python-dotenv datasets langchain deeplake beautifulsoup4 html2text ragas
Also you'll need to create an Activeloop account.
ORG_ID = "..."
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import DeepLake
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API token: ")
# # activeloop token is needed if you are not signed in using CLI: `activeloop login -u <USERNAME> -p <PASSWORD>`
if "ACTIVELOOP_TOKEN" not in os.environ:
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass(
"Enter your ActiveLoop API token: "
) # Get your API token from https://app.activeloop.ai, click on your profile picture in the top right corner, and select "API Tokens"
token = os.getenv("ACTIVELOOP_TOKEN")
openai_embeddings = OpenAIEmbeddings()
db = DeepLake(
    dataset_path=f"hub://{ORG_ID}/deeplake-docs-deepmemory",  # org_id stands for your username or organization from activeloop
    embedding=openai_embeddings,
    runtime={"tensor_db": True},
    token=token,
    # overwrite=True,  # use the overwrite flag if you want to overwrite the full dataset
    read_only=False,
)
Parsing all links in the webpage using BeautifulSoup:
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
def get_all_links(url):
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to retrieve the page: {url}")
        return []

    soup = BeautifulSoup(response.content, "html.parser")

    # Finding all 'a' tags which typically contain href attribute for links
    links = [
        urljoin(url, a["href"]) for a in soup.find_all("a", href=True) if a["href"]
    ]

    return links
base_url = "https://docs.deeplake.ai/en/latest/"
all_links = get_all_links(base_url)
Loading the data:
from langchain_community.document_loaders.async_html import AsyncHtmlLoader
loader = AsyncHtmlLoader(all_links)
docs = loader.load()
Converting the data into a user-readable format:
from langchain_community.document_transformers import Html2TextTransformer
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)
Now we will chunk the documents further, as some of them contain too much text:
from langchain_text_splitters import RecursiveCharacterTextSplitter
chunk_size = 4096
docs_new = []
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
)

for doc in docs_transformed:
    if len(doc.page_content) < chunk_size:
        docs_new.append(doc)
    else:
        docs = text_splitter.create_documents([doc.page_content])
        docs_new.extend(docs)
Populating the vector store:
docs = db.add_documents(docs_new)
2. Generating synthetic queries and training Deep Memory
The next step would be to train a deep_memory model that will align your user queries with the dataset that you already have. If you don't have any user queries yet, no worries, we will generate them using an LLM!
TODO: Add image
Above we showed how the overall schema of deep_memory works. So as you can see, in order to train it, you need relevance and queries together with corpus data (the data that we want to query). The corpus data was already populated in the previous section; here we will be generating questions and relevance.
questions - is a list of strings, where each string represents a query.
relevance - contains links to the ground truth for each question. There might be several documents that contain the answer to a given question; because of this, relevance is List[List[tuple[str, float]]], where the outer list represents queries and the inner list relevant documents. Each tuple is a str, float pair: the string is the id of the source document (corresponding to the id tensor in the dataset), and the float indicates how relevant that document is to the question.
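To make the expected shapes concrete, here is a small hand-written illustration (the ids and scores below are hypothetical placeholders, not real dataset values):
# Hypothetical illustration of the expected structures (ids are made up):
questions = [
    "How do I create a Deep Lake dataset?",
    "What is a tensor htype?",
]
relevance = [
    [("doc_id_42", 1.0)],                     # one relevant doc for query 1
    [("doc_id_7", 1.0), ("doc_id_13", 0.5)],  # two relevant docs for query 2
]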
Now let's generate synthetic questions and relevance:
from typing import List
from langchain.chains.openai_functions import (
    create_structured_output_chain,
)
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
# fetch dataset docs and ids if they exist (optionally, you can also ingest)
docs = db.vectorstore.dataset.text.data(fetch_chunks=True, aslist=True)["value"]
ids = db.vectorstore.dataset.id.data(fetch_chunks=True, aslist=True)["value"]
# If we pass in a model explicitly, we need to make sure it supports the OpenAI function-calling API.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
class Questions(BaseModel):
    """Questions generated about the provided text."""

    question: str = Field(..., description="Questions about text")
prompt_msgs = [
    SystemMessage(
        content="You are a world class expert for generating questions based on provided context. \
                You make sure the question can be answered by the text."
    ),
    HumanMessagePromptTemplate.from_template(
        "Use the given text to generate a question from the following input: {input}"
    ),
    HumanMessage(content="Tips: Make sure to answer in the correct format"),
]
prompt = ChatPromptTemplate(messages=prompt_msgs)
chain = create_structured_output_chain(Questions, llm, prompt, verbose=True)
text = "# Understanding Hallucinations and Bias ## **Introduction** In this lesson, we'll cover the concept of **hallucinations** in LLMs, highlighting their influence on AI applications and demonstrating how to mitigate them using techniques like the retriever's architectures. We'll also explore **bias** within LLMs with examples."
questions = chain.run(input=text)
print(questions)
import random
from langchain_openai import OpenAIEmbeddings
from tqdm import tqdm
def generate_queries(docs: List[str], ids: List[str], n: int = 100):
    questions = []
    relevances = []
    pbar = tqdm(total=n)
    while len(questions) < n:
        # 1. randomly draw a piece of text and its relevance id
        r = random.randint(0, len(docs) - 1)
        text, label = docs[r], ids[r]

        # 2. generate queries and assign the relevance id
        generated_qs = [chain.run(input=text).question]
        questions.extend(generated_qs)
        relevances.extend([[(label, 1)] for _ in generated_qs])
        pbar.update(len(generated_qs))

        if len(questions) % 10 == 0:
            print(f"q: {len(questions)}")
    return questions[:n], relevances[:n]
chain = create_structured_output_chain(Questions, llm, prompt, verbose=False)
questions, relevances = generate_queries(docs, ids, n=200)
train_questions, train_relevances = questions[:100], relevances[:100]
test_questions, test_relevances = questions[100:], relevances[100:]
Now we have created 100 training queries as well as 100 queries for testing. Now let's train deep_memory:
job_id = db.vectorstore.deep_memory.train(
    queries=train_questions,
    relevance=train_relevances,
)
Let's keep track of the training progress:
db.vectorstore.deep_memory.status(job_id)
--------------------------------------------------------------
|                  6538e02ecda4691033a51c5b                  |
--------------------------------------------------------------
| status                     | completed                     |
--------------------------------------------------------------
| progress                   | eta: 1.4 seconds              |
|                            | recall@10: 79.00% (+34.00%)   |
--------------------------------------------------------------
| results                    | recall@10: 79.00% (+34.00%)   |
--------------------------------------------------------------
3. Evaluating Deep Memory performance
Great, we've trained the model! It's showing a substantial improvement in recall, but how can we use it now and evaluate it on unseen new data? In this section we will delve into model evaluation and inference and see how it can be used with LangChain in order to increase retrieval accuracy.
3.1 Deep Memory evaluation
For a start we can use deep_memory's built-in evaluation method. It calculates several recall metrics, and it can be done easily in a few lines of code:
recall = db.vectorstore.deep_memory.evaluate(
    queries=test_questions,
    relevance=test_relevances,
)
Embedding queries took 0.81 seconds
---- Evaluating without model ----
Recall@1: 9.0%
Recall@3: 19.0%
Recall@5: 24.0%
Recall@10: 42.0%
Recall@50: 93.0%
Recall@100: 98.0%
---- Evaluating with model ----
Recall@1: 19.0%
Recall@3: 42.0%
Recall@5: 49.0%
Recall@10: 69.0%
Recall@50: 97.0%
Recall@100: 97.0%
It is showing quite a substantial improvement on the unseen test dataset as well!
3.2 Deep Memory + RAGas
from ragas.langchain import RagasEvaluatorChain
from ragas.metrics import (
    context_recall,
)
Let's convert the relevance ids into ground truth documents:
def convert_relevance_to_ground_truth(docs, ids, relevance):
    # relevance stores string ids, so map each dataset id to its document text
    id_to_doc = dict(zip(ids, docs))

    ground_truths = []
    for rel in relevance:
        ground_truth = []
        for doc_id, _ in rel:
            ground_truth.append(id_to_doc[doc_id])
        ground_truths.append(ground_truth)
    return ground_truths

ground_truths = convert_relevance_to_ground_truth(docs, ids, test_relevances)
for deep_memory in [False, True]:
    print("\nEvaluating with deep_memory =", deep_memory)
    print("===================================")

    retriever = db.as_retriever()
    retriever.search_kwargs["deep_memory"] = deep_memory

    qa_chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
    )

    metrics = {
        "context_recall_score": 0,
    }

    eval_chains = {m.name: RagasEvaluatorChain(metric=m) for m in [context_recall]}

    for question, ground_truth in zip(test_questions, ground_truths):
        result = qa_chain({"query": question})
        result["ground_truths"] = ground_truth
        for name, eval_chain in eval_chains.items():
            score_name = f"{name}_score"
            metrics[score_name] += eval_chain(result)[score_name]

    for metric in metrics:
        metrics[metric] /= len(test_questions)
        print(f"{metric}: {metrics[metric]}")

    print("===================================")
Evaluating with deep_memory = False
===================================
context_recall_score = 0.3763423145
===================================
Evaluating with deep_memory = True
===================================
context_recall_score = 0.5634545323
===================================
3.3 Deep Memory Inference
TODO: Add image
With deep_memory:
retriever = db.as_retriever()
retriever.search_kwargs["deep_memory"] = True
retriever.search_kwargs["k"] = 10
query = "Deamination of cytidine to uridine on the minus strand of viral DNA results in catastrophic G-to-A mutations in the viral genome."
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"), chain_type="stuff", retriever=retriever
)
print(qa.run(query))
The base htype of the 'video_seq' tensor is 'video'.
Without deep_memory:
retriever = db.as_retriever()
retriever.search_kwargs["deep_memory"] = False
retriever.search_kwargs["k"] = 10
query = "Deamination of cytidine to uridine on the minus strand of viral DNA results in catastrophic G-to-A mutations in the viral genome."
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"), chain_type="stuff", retriever=retriever
)
qa.run(query)
The text does not provide information on the base htype of the 'video_seq' tensor.
3.4 Deep Memory cost savings
Deep Memory increases retrieval accuracy without altering your existing workflow. Additionally, by reducing the top_k input into the LLM, you can significantly cut inference costs via lower token usage.
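As a rough sketch of what this can look like, assuming the trained model reaches comparable recall at a smaller k (the k values below are illustrative assumptions, not measured results):
# Hypothetical cost-saving setup: retrieve fewer chunks with deep_memory enabled.
# k=3 vs. a baseline of k=10 is an illustrative assumption, not a benchmark.
retriever = db.as_retriever()
retriever.search_kwargs["deep_memory"] = True
retriever.search_kwargs["k"] = 3  # fewer chunks stuffed into the prompt -> fewer tokens

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"), chain_type="stuff", retriever=retriever
)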