UpTrain

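UpTrain [github || website || docs] is an open-source platform for evaluating and improving LLM applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and gives guidance for resolving them.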

UpTrain Callback Handler

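This notebook shows how the UpTrain callback handler integrates into your pipeline to run a range of evaluations. We have chosen a few evaluations that we consider apt for evaluating the chains. They run automatically, and their results are displayed in the output. More details on UpTrain's evaluations can be found here.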

Selected retrievers from LangChain are highlighted for demonstration:

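1. Vanilla RAG:

RAG plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:

Context Relevance: Determines whether the context retrieved for the query is relevant to the query.
Factual Accuracy: Assesses whether the LLM is hallucinating or providing incorrect information.
Response Completeness: Checks whether the response contains all the information requested by the query.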

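2. Multi Query Generation:

MultiQueryRetriever creates multiple variants of a question that have a similar meaning to the original question. Given this added complexity, we include the previous evaluations and add:

Multi Query Accuracy: Ensures that the generated queries mean the same as the original query.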

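3. Context Compression and Reranking:

Reranking involves reordering nodes based on their relevance to the query and choosing the top n nodes. Since the number of nodes can decrease once reranking is complete, we perform the following evaluations:

Context Reranking: Checks whether the reranked order of the nodes is more relevant to the query than the original order.
Context Conciseness: Examines whether the reduced set of nodes still provides all the required information.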

Together, these evaluations help ensure the robustness and effectiveness of the RAG, MultiQueryRetriever, and reranking steps in the chain.

Install Dependencies

%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank


Note: you can also install faiss-gpu instead of faiss-cpu if you want to use the GPU-enabled version of the library.

Import Libraries

from getpass import getpass

from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
)

Load the Documents

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

Split the Document into Chunks

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
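To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width character splitter. It is a simplification for illustration only. The real RecursiveCharacterTextSplitter first tries to split on separators such as paragraphs, lines, and words, and this sketch does none of that. The function name `split_fixed` is hypothetical.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified sketch:
    unlike RecursiveCharacterTextSplitter, it ignores separators entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(split_fixed(sample, chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With `chunk_overlap=0`, as in the cell above, chunks never share text. A positive overlap repeats the tail of one chunk at the head of the next.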

Create the Retriever

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()
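`db.as_retriever()` embeds the incoming query and returns the stored chunks whose embeddings lie closest to it. As far as we know, LangChain's FAISS wrapper uses Euclidean (L2) distance by default, and the retriever returns 4 documents unless `search_kwargs` sets a different `k`. The plain-Python sketch below shows nearest-neighbour search on toy 2-D vectors. It is not FAISS's indexing, and `top_k_by_l2` is a hypothetical name.

```python
import math


def top_k_by_l2(query_vec, doc_vecs, k=4):
    """Return indices of the k document vectors closest to query_vec (L2 distance)."""
    distances = [
        (math.dist(query_vec, vec), idx)  # Euclidean distance to each stored vector
        for idx, vec in enumerate(doc_vecs)
    ]
    distances.sort()  # smallest distance first; ties broken by index
    return [idx for _, idx in distances[:k]]


# Toy 2-D "embeddings"; real OpenAI embeddings have far more dimensions.
docs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2), (-3.0, 0.5)]
print(top_k_by_l2((1.0, 1.0), docs, k=2))  # [1, 3]
```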

Define the LLM

llm = ChatOpenAI(temperature=0, model="gpt-4")

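Setup

UpTrain provides you with:

Dashboards with advanced drill-down and filtering options
Insights and common topics among failing cases
Observability and real-time monitoring of production data
Regression testing via seamless integration with your CI/CD pipelines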

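You can choose between the following options for evaluating with UpTrain:

1. UpTrain's Open-Source Software (OSS):

You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key, because UpTrain uses GPT models to evaluate the responses generated by the LLM. You can get your key here.

To view your evaluations in the UpTrain dashboard, set it up by running the following commands in your terminal: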

git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh

This will start the UpTrain dashboard on your local machine. You can access it at https://127.0.0.1:3000/dashboard.

Parameters:

key_type="openai"
api_key="OPENAI_API_KEY"
project_name="PROJECT_NAME"

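2. UpTrain Managed Service and Dashboards:

Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account here and get free trial credits. If you want more trial credits, book a call with the UpTrain maintainers here.

The benefits of using the managed service are:

No need to set up the UpTrain dashboard on your local machine.
Access to many LLMs without needing their API keys.

Once you run the evaluations, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard

Parameters: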

key_type="uptrain"
api_key="UPTRAIN_API_KEY"
project_name="PROJECT_NAME"

Note: project_name is the name of the project under which your evaluations will be shown in the UpTrain dashboard.
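The two options differ only in `key_type` and the key you pass. A minimal sketch of switching between them is shown below. It assumes that the key is stored in an environment variable named `OPENAI_API_KEY` or `UPTRAIN_API_KEY`, a convention chosen for this sketch only. The helper `uptrain_handler_kwargs` is hypothetical and not part of UpTrain or LangChain. It only assembles the three parameters listed above.

```python
import os

VALID_KEY_TYPES = {"openai", "uptrain"}


def uptrain_handler_kwargs(key_type: str, project_name: str = "PROJECT_NAME") -> dict:
    """Build keyword arguments for UpTrainCallbackHandler for either setup option.

    Hypothetical helper: reads the key from OPENAI_API_KEY or UPTRAIN_API_KEY
    (an assumed environment-variable convention).
    """
    if key_type not in VALID_KEY_TYPES:
        raise ValueError(f"key_type must be one of {sorted(VALID_KEY_TYPES)}")
    env_var = "OPENAI_API_KEY" if key_type == "openai" else "UPTRAIN_API_KEY"
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before creating the handler")
    return {"key_type": key_type, "api_key": api_key, "project_name": project_name}
```

You would then pass the result through as keyword arguments, e.g. `UpTrainCallbackHandler(**uptrain_handler_kwargs("uptrain", project_name="my-project"))`. Reading the key from the environment keeps it out of the notebook, unlike hard-coding it.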

Set the API Key

The notebook will prompt you to enter your API key. You can choose between an OpenAI API key and an UpTrain API key by changing the KEY_TYPE variable in the cell below.

KEY_TYPE = "openai"  # or "uptrain"
API_KEY = getpass()

1. Vanilla RAG

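The UpTrain callback handler automatically captures the query, context, and response once they are generated. It then runs the following three evaluations on the response, each graded from 0 to 1:

Context Relevance: Checks whether the context retrieved for the query is relevant to the query.
Factual Accuracy: Checks whether the response is factually accurate.
Response Completeness: Checks whether the response contains all the information requested by the query.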
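In the chain built in the next cell, the dict at its head sends the same input to both entries. The retriever turns the query into documents, and `RunnablePassthrough()` forwards the query unchanged. The prompt template then fills `{context}` and `{question}`. The sketch below reproduces that data flow in plain Python, stopping before the LLM call. `fake_retrieve` is a hypothetical stand-in for the FAISS retriever that returns fixed strings.

```python
TEMPLATE = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""


def fake_retrieve(query: str) -> list[str]:
    """Hypothetical stand-in for the FAISS retriever: returns fixed snippets."""
    return ["Snippet about the Supreme Court nomination.", "Snippet about support."]


def build_prompt(query: str) -> str:
    # Step 1: fan the same input out, like {"context": retriever, "question": RunnablePassthrough()}
    inputs = {"context": fake_retrieve(query), "question": query}
    # Step 2: fill the template; the list is inserted via str(), much as the real
    # chain inserts the stringified list of retrieved Document objects.
    return TEMPLATE.format(**inputs)


print(build_prompt("What did the president say about Ketanji Brown Jackson"))
```

The question, the retrieved context, and the LLM's answer are the three pieces the callback handler captures for evaluation.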

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

# Create the chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Create the uptrain callback handler
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(query, config=config)

[32m2024-04-17 17:03:44.969[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate_on_server[0m:[36m378[0m - [1mSending evaluation request for rows 0 to <50 to the Uptrain[0m
[32m2024-04-17 17:04:05.809[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate[0m:[36m367[0m - [1mLocal server not running, start the server to log data and visualize in the dashboard![0m
``````output

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that she is a former top litigator in private practice, a former federal public defender, and comes from a family of public school educators and police officers. He described her as a consensus builder and noted that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by both Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

2. Multi Query Generation

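The MultiQueryRetriever addresses the problem that a RAG pipeline might not return the best set of documents for a query. It generates multiple queries with the same meaning as the original query and then fetches documents for each one.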
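The "fetch for each query, then merge" step can be sketched as below. As we understand it, LangChain's MultiQueryRetriever merges the per-query results into a unique union, so a document found by several variants appears only once. This sketch does that with plain strings. The query variants are taken from the output further down, but the document IDs and the `toy_index` mapping are hypothetical, and `multi_query_retrieve` is not a LangChain function.

```python
def multi_query_retrieve(queries, retrieve):
    """Fetch documents for every query variant and keep each document once.

    Order follows first appearance, so documents found by earlier queries come first.
    """
    seen = set()
    merged = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical toy index: each query variant maps to the documents it would retrieve.
toy_index = {
    "How did the president comment on Ketanji Brown Jackson?": ["doc_a", "doc_b"],
    "What were the president's remarks regarding Ketanji Brown Jackson?": ["doc_b", "doc_c"],
    "What statements has the president made about Ketanji Brown Jackson?": ["doc_a", "doc_d"],
}
print(multi_query_retrieve(list(toy_index), lambda q: toy_index[q]))  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Because the merged set can contain more documents than any single query retrieves, checking that the variants really mean the same thing as the original query (Multi Query Accuracy) matters.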

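To evaluate this retriever, UpTrain will run the following evaluation:

Multi Query Accuracy: Checks whether the generated queries mean the same as the original query.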

# Create the retriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

chain = (
    {"context": multi_query_retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Invoke the chain with a query
question = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(question, config=config)

[32m2024-04-17 17:04:10.675[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate_on_server[0m:[36m378[0m - [1mSending evaluation request for rows 0 to <50 to the Uptrain[0m
[32m2024-04-17 17:04:16.804[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate[0m:[36m367[0m - [1mLocal server not running, start the server to log data and visualize in the dashboard![0m
``````output

Question: What did the president say about Ketanji Brown Jackson
Multi Queries:
  - How did the president comment on Ketanji Brown Jackson?
  - What were the president's remarks regarding Ketanji Brown Jackson?
  - What statements has the president made about Ketanji Brown Jackson?

Multi Query Accuracy Score: 0.5

2024-04-17 17:04:22.027 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:44.033 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

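3. Context Compression and Reranking

The reranking process reorders nodes based on their relevance to the query and keeps the top n nodes. Since the number of nodes can decrease once reranking is complete, we perform the following evaluations:

Context Reranking: Checks whether the reranked order of the nodes is more relevant to the query than the original order.
Context Conciseness: Checks whether the reduced set of nodes still provides all the required information.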
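The reorder-and-truncate step can be sketched generically: score each node against the query, sort by score, and keep the best `top_n`. FlashrankRerank scores with a reranking model. The word-overlap scorer below is a deliberately crude stand-in, swapped in so the sketch runs without any model, and `rerank_top_n` and `overlap_score` are hypothetical names.

```python
def rerank_top_n(query, nodes, score, top_n=3):
    """Reorder nodes by score(query, node), highest first, and keep the top_n."""
    ranked = sorted(nodes, key=lambda node: score(query, node), reverse=True)
    return ranked[:top_n]


def overlap_score(query, node):
    """Toy relevance: number of distinct query words that also appear in the node."""
    query_words = set(query.lower().split())
    return len(query_words & set(node.lower().split()))


nodes = [
    "economy grew last year",
    "the president nominated ketanji brown jackson",
    "jackson is a former public defender",
    "gas prices rose",
]
query = "what did the president say about ketanji brown jackson"
print(rerank_top_n(query, nodes, overlap_score, top_n=2))
# ['the president nominated ketanji brown jackson', 'jackson is a former public defender']
```

Going from four nodes to two is exactly the reduction that Context Conciseness is meant to check. A scorer that drops a needed node would keep the answer's context short but incomplete.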

# Create the retriever
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Create the chain
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)

[32m2024-04-17 17:04:46.462[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate_on_server[0m:[36m378[0m - [1mSending evaluation request for rows 0 to <50 to the Uptrain[0m
[32m2024-04-17 17:04:53.561[0m | [1mINFO    [0m | [36muptrain.framework.evalllm[0m:[36mevaluate[0m:[36m367[0m - [1mLocal server not running, start the server to log data and visualize in the dashboard![0m
``````output

Question: What did the president say about Ketanji Brown Jackson

Context Conciseness Score: 0.0
Context Reranking Score: 1.0

2024-04-17 17:04:56.947 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:05:16.551 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5
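Every check is graded from 0 to 1, so one way to act on runs like this one, for example in a CI regression test, is to flag any score below a threshold. The sketch below assumes that you have copied the printed scores into a dict. It does not read them from any UpTrain API. The 0.8 threshold is an arbitrary assumption, and `failing_checks` is a hypothetical helper.

```python
def failing_checks(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return names of checks whose 0-to-1 score falls below threshold, sorted by name."""
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    return sorted(name for name, value in scores.items() if value < threshold)


# Scores printed by the context compression and reranking run above.
reranking_run = {
    "Context Conciseness": 0.0,
    "Context Reranking": 1.0,
    "Context Relevance": 1.0,
    "Factual Accuracy": 1.0,
    "Response Completeness": 0.5,
}
print(failing_checks(reranking_run))  # ['Context Conciseness', 'Response Completeness']
```

Here both flagged checks point at the same cause: the reranked context dropped information, so the shorter response omitted details that the other two runs included.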

UpTrain's Dashboard and Insights

Here's a short video showcasing the dashboard and the insights:
