Boosting Retrieval in RAG for LLMs: The Power of BM25 and RRF

Karthikeyan Dhanakotti
3 min read · Aug 11, 2024

BM25 (Best Matching 25) and RRF (Reciprocal Rank Fusion) are two techniques that can be used to improve the retrieval step in Retrieval-Augmented Generation (RAG) for large language models (LLMs). Here’s an explanation of how each can be used and their role in the retrieval process for RAG:

BM25 (Best Matching 25)

BM25 is a probabilistic information retrieval model that ranks documents based on their relevance to a given query. It is an extension of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) model and is widely used due to its effectiveness in various retrieval tasks.

How BM25 Works:

  1. Term Frequency (TF): The number of times a term appears in a document. BM25 modifies this by considering document length, penalizing longer documents where terms might appear more frequently just by chance.
  2. Inverse Document Frequency (IDF): A measure of how common or rare a term is across all documents. Rare terms are given more weight.
  3. Normalization: Adjusts for the length of documents to ensure fair comparison between documents of different lengths.
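The three components above can be sketched as a single scoring function. This is a minimal illustration of the common BM25 variant, not a production implementation; the function name, arguments, and the defaults `k1=1.5`, `b=0.75` are illustrative choices:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.5, b=0.75):
    """Score a single (term, document) pair with BM25.

    tf          -- raw count of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length across the collection
    n_docs      -- total number of documents
    doc_freq    -- number of documents containing the term
    k1, b       -- BM25 hyperparameters (illustrative defaults)
    """
    # IDF: rare terms get more weight (the common "plus one" variant).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # TF with saturation, plus length normalization that penalizes long documents.
    norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf
```

A document's full BM25 score for a query is the sum of this term score over every query term it contains. Note how the `b` parameter controls the strength of the length penalty: with `b=0`, document length is ignored entirely.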

Using BM25 in RAG:

  1. Indexing: The collection of documents is indexed using BM25, which pre-computes the term frequencies and other statistics.
  2. Query Processing: When a query is issued (e.g., from an LLM seeking additional context), BM25 scores each document based on its relevance to the query.
  3. Ranking: Documents are ranked according to their BM25 scores, and the top-ranked documents are retrieved as relevant context for the LLM.
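The indexing, query processing, and ranking steps can be combined into a small in-memory retriever. This is a sketch under simplifying assumptions (whitespace tokenization, no stemming or stopwords); the class and method names are hypothetical, not a real library API:

```python
import math
from collections import Counter

class BM25Index:
    """Minimal in-memory BM25 index (a sketch, not production code)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        # Indexing: tokenize naively and pre-compute collection statistics.
        self.docs = [doc.lower().split() for doc in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avg_len = sum(self.doc_lens) / len(self.docs)
        self.df = Counter(t for d in self.docs for t in set(d))  # document frequency
        self.n = len(self.docs)

    def score(self, query, i):
        """Query processing: BM25 score of document i against the query."""
        tfs = Counter(self.docs[i])
        s = 0.0
        for term in query.lower().split():
            if term not in tfs:
                continue
            idf = math.log(1 + (self.n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            tf = tfs[term]
            denom = tf + self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avg_len)
            s += idf * tf * (self.k1 + 1) / denom
        return s

    def search(self, query, top_k=3):
        """Ranking: return the top_k documents as (score, doc_index) pairs."""
        scores = [(self.score(query, i), i) for i in range(self.n)]
        return sorted(scores, reverse=True)[:top_k]
```

The top-ranked document texts returned by `search` are what gets passed to the LLM as retrieved context. In practice, libraries such as `rank_bm25` or a search engine like Elasticsearch would replace this hand-rolled index.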

RRF (Reciprocal Rank Fusion)

RRF is an ensemble technique used to combine the results from multiple retrieval models. It is particularly useful when you have different retrieval models that might capture different aspects of relevance.

How RRF Works:

Rank Combination: Each retrieval model produces a ranked list of documents. RRF combines these rankings by assigning a score to each document based on its position in each ranked list.

Score Calculation: The score for a document d is calculated by summing the reciprocal of its rank in each list:

RRF_score(d) = Σ over models m of 1 / (k + rank_m(d))

where rank_m(d) is the position of d in model m’s ranked list and k is a smoothing constant (commonly k = 60) that prevents any single top-ranked placement from dominating the fused score.

Fusion: Documents are re-ranked based on their combined RRF scores, leading to a final list of documents that ideally captures the best of all models.
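The formula and fusion step translate directly into a few lines of Python. This is a minimal sketch; the function name is illustrative, and `k=60` follows the value commonly used for RRF:

```python
def rrf_fuse(rankings, k=60):
    """Combine several ranked lists with Reciprocal Rank Fusion.

    rankings -- a list of ranked lists, each an ordered sequence of document ids
    k        -- smoothing constant (60 is the commonly used value)
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Re-rank: highest combined RRF score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])` ranks `b` first, since it places highly in both lists, even though neither retriever ranked `a` and `b` identically.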

Using RRF in RAG:

  1. Multiple Models: Use multiple retrieval models (e.g., BM25, neural retrieval models) to independently retrieve and rank documents for a given query.
  2. Rank Fusion: Apply RRF to combine the ranked lists from these models, producing a final, more robust ranked list of relevant documents.
  3. Contextual Retrieval: Provide the top-ranked documents from the RRF-combined list as context to the LLM for generating more accurate and relevant responses.

Combining BM25 and RRF in RAG

  1. Initial Retrieval: Use BM25 to perform initial retrieval, leveraging its efficiency and effectiveness in ranking documents based on term relevance.
  2. Ensemble Approach: Incorporate other retrieval models (e.g., neural retrieval models) alongside BM25.
  3. Rank Fusion: Apply RRF to combine the ranked outputs from BM25 and the other retrieval models, producing a final list that benefits from multiple perspectives on relevance.
  4. Augmented Generation: Feed the top-ranked documents from the RRF-fused list into the LLM, enhancing its ability to generate accurate and contextually relevant responses.
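The four steps above can be sketched end to end. The document ids and the two ranked lists here are hypothetical stand-ins; in a real pipeline the first would come from a BM25 index and the second from an embedding-based (dense) retriever:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of document ids."""
    scores = {}
    for ranked in rankings:
        for r, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r)
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: -kv[1])]

# Illustrative output of two independent retrievers for the same query
# (hard-coded here; normally produced by BM25 and a neural retriever).
bm25_ranking  = ["doc3", "doc1", "doc7", "doc2"]
dense_ranking = ["doc1", "doc3", "doc9", "doc7"]

fused = rrf([bm25_ranking, dense_ranking])
context_docs = fused[:3]  # top-ranked documents passed to the LLM as context
```

Documents that both retrievers rank highly (`doc1`, `doc3`) float to the top of the fused list, while documents seen by only one retriever (`doc2`, `doc9`) are kept but demoted, which is exactly the "multiple perspectives on relevance" benefit described above.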

By combining BM25 and RRF in the retrieval step of RAG, you can leverage the strengths of different retrieval models to improve the overall quality and relevance of the documents retrieved, thereby enhancing the performance of the LLM in generating well-informed responses.

Written by Karthikeyan Dhanakotti

AI/ML & Data Science Leader @ Microsoft , Mentor/Speaker, AI/ML Enthusiast | Microsoft Certified.
