How to reorder retrieved results to mitigate the "lost in the middle" effect

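Substantial performance degradation in RAG applications has been documented as the number of retrieved documents grows (e.g., beyond ten). In brief: models are liable to miss relevant information in the middle of long contexts.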

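By contrast, queries against vector stores typically return documents in descending order of relevance (e.g., as measured by cosine similarity of embeddings).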
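As a rough illustration of what "descending order of relevance" means here, the following sketch ranks a few documents by cosine similarity to a query vector. The 2-dimensional vectors and the document names are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a real vector store may compute similarity differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for this example only.
query_vec = [1.0, 0.0]
doc_vecs = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.5, 0.5],
}

# Sort document names from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

The most similar document comes first, so if this list were placed into a prompt as-is, the moderately relevant documents would land in the middle of the context.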

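To mitigate the "lost in the middle" effect, you can reorder documents after retrieval so that the most relevant documents sit at the extremes (e.g., the first and last pieces of the context) and the least relevant documents sit in the middle. In some cases this can help surface the most relevant information to LLMs.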
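One way to picture this reordering is the minimal sketch below. It is an illustration in plain Python, not the library's code: the function name `reorder_for_long_context` is hypothetical, and the input is assumed to be sorted from most to least relevant.

```python
def reorder_for_long_context(ranked_items):
    """Return a new list with the most relevant items at both ends.

    `ranked_items` is assumed to be sorted from most to least relevant.
    """
    reordered = []
    # Walk from least to most relevant. Each new item is placed outside
    # the items already placed, alternating between the front and the
    # back, so the least relevant items end up in the middle.
    for i, item in enumerate(reversed(ranked_items)):
        if i % 2 == 1:
            reordered.append(item)
        else:
            reordered.insert(0, item)
    return reordered


# Use relevance ranks (1 = most relevant) as stand-ins for documents.
ranks = list(range(1, 11))
result = reorder_for_long_context(ranks)
print(result)  # [2, 4, 6, 8, 10, 9, 7, 5, 3, 1]
```

With ten items, ranks 1 and 2 end up at the two ends and ranks 9 and 10 in the middle. The example output later on this page follows the same pattern.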

The LongContextReorder document transformer implements this reordering procedure. Below we demonstrate an example.

%pip install -qU langchain langchain-community langchain-openai

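First, we embed some artificial documents and index them in a basic in-memory vector store. We will use OpenAI embeddings, but any LangChain vector store or embeddings model will work.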

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

# Get embeddings.
embeddings = OpenAIEmbeddings()

texts = [
    "Basquetball is a great sport.",
    "Fly me to the moon is one of my favourite songs.",
    "The Celtics are my favourite team.",
    "This is a document about the Boston Celtics",
    "I simply love going to the movies",
    "The Boston Celtics won the game by 20 points",
    "This is just a random text.",
    "Elden Ring is one of the best games in the last 15 years.",
    "L. Kornet is one of the best Celtics players.",
    "Larry Bird was an iconic NBA player.",
]

# Create a retriever
retriever = InMemoryVectorStore.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)
query = "What can you tell me about the Celtics?"

# Get relevant documents ordered by relevance score
docs = retriever.invoke(query)
for doc in docs:
    print(f"- {doc.page_content}")

API Reference: InMemoryVectorStore | OpenAIEmbeddings

- The Celtics are my favourite team.
- This is a document about the Boston Celtics
- The Boston Celtics won the game by 20 points
- L. Kornet is one of the best Celtics players.
- Basquetball is a great sport.
- Larry Bird was an iconic NBA player.
- This is just a random text.
- I simply love going to the movies
- Fly me to the moon is one of my favourite songs.
- Elden Ring is one of the best games in the last 15 years.

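Note that the documents are returned in descending order of relevance to the query. The LongContextReorder document transformer implements the reordering described above.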

from langchain_community.document_transformers import LongContextReorder

# Reorder the documents:
# Less relevant documents will be in the middle of the list and more
# relevant documents at the beginning / end.
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Confirm that the 4 relevant documents are at the beginning and end.
for doc in reordered_docs:
    print(f"- {doc.page_content}")

API Reference: LongContextReorder

- This is a document about the Boston Celtics
- L. Kornet is one of the best Celtics players.
- Larry Bird was an iconic NBA player.
- I simply love going to the movies
- Elden Ring is one of the best games in the last 15 years.
- Fly me to the moon is one of my favourite songs.
- This is just a random text.
- Basquetball is a great sport.
- The Boston Celtics won the game by 20 points
- The Celtics are my favourite team.

Below, we show how to incorporate the reordered documents into a simple question-answering chain:

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt_template = """
Given these texts:
-----
{context}
-----
Please answer the following question:
{query}
"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "query"],
)

# Create and invoke the chain:
chain = create_stuff_documents_chain(llm, prompt)
response = chain.invoke({"context": reordered_docs, "query": query})
print(response)

API Reference: create_stuff_documents_chain | PromptTemplate | ChatOpenAI

The Boston Celtics are a professional basketball team known for their rich history and success in the NBA. L. Kornet is recognized as one of the best players on the team, and the Celtics recently won a game by 20 points. The Celtics are favored by some fans, as indicated by the statement, "The Celtics are my favourite team." Overall, they have a strong following and are considered a significant part of basketball culture.
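To see why document order matters in a chain like the one above, it can help to picture "stuffing" as plain string assembly: the document texts are concatenated in list order and substituted into the prompt. The sketch below is a simplified stand-in, not the implementation of create_stuff_documents_chain. The helper `build_prompt` is hypothetical, and joining documents with a blank line is an assumption made for illustration.

```python
PROMPT_TEMPLATE = """
Given these texts:
-----
{context}
-----
Please answer the following question:
{query}
"""


def build_prompt(texts, query, separator="\n\n"):
    """Join the texts in list order and fill in the prompt template."""
    context = separator.join(texts)
    return PROMPT_TEMPLATE.format(context=context, query=query)


# A shortened, already-reordered list: relevant texts first and last.
reordered_texts = [
    "This is a document about the Boston Celtics",
    "This is just a random text.",
    "The Celtics are my favourite team.",
]
prompt_text = build_prompt(
    reordered_texts, "What can you tell me about the Celtics?"
)
print(prompt_text)
```

Because the context is built in list order, the position of each document in the reordered list carries over directly to its position in the text the model reads.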
