BM25 (Best Match 25) is a probabilistic ranking function used in search engines to score documents by relevance to a query. It improves on simple keyword counting by applying TF saturation (diminishing returns for repeated words), inverse document frequency (weighting rare words higher), and document length normalisation. It is the default ranking algorithm in Elasticsearch, Apache Lucene, and Solr.

How is BM25 different from normal keyword search?

Normal keyword search ranks documents by raw word count — more matches equals higher rank, which can be gamed by keyword stuffing and unfairly favours long documents. BM25 fixes all three problems: it caps the benefit of repeated words (TF saturation), penalises documents that are longer than average (length normalisation), and gives higher weight to rare words (IDF).

How do I use BM25 in Python?

Install the rank-bm25 library with 'pip install rank-bm25'. Then tokenize your documents, build a BM25Okapi index, and call get_scores() or get_top_n() with your query tokens. For production use in a RAG pipeline, build the index once on your knowledge base and call get_top_n() at query time to retrieve the most relevant documents before passing them to your LLM.

Is BM25 still used in 2025/2026?

Yes. BM25 is the default ranking algorithm in Elasticsearch, OpenSearch, Apache Lucene, and Solr. It is also commonly used as the lexical component in hybrid search systems that combine BM25 with dense vector embeddings, giving better recall than either method alone.

Blog · Search Systems

BM25 vs Normal Search

Why counting words is a bad idea — and how BM25 judges quality, not quantity. A beginner-friendly guide with Python code and real-world analogies.

See the Formula Python Code

🔎 Part 1 — Normal Search (Basic Keyword Count)

Early search engines (and many simple implementations today) rank documents by one rule: "Count how many times the query words appear in the document." More matches = higher rank. Simple as that.

Example: Search for "python developer"

Given these three documents:

Doc A

"python python python developer"

Normal: 4 matches 🥇

Doc B

"python developer roadmap guide"

Normal: 2 matches

Doc C

"developer"

Normal: 1 match

Normal search ranking: Doc A → Doc B → Doc C. Doc A wins just because it repeats "python" three times — even though it is clearly spammy content.

Problems with normal search

Problem	What happens	Impact
Keyword stuffing	Repeat a word 100 times → rank #1	Spammy content wins ❌
Ignores document length	A 10,000-word doc will almost always outscore a 50-word doc	Long docs dominate ❌
Ignores word rarity	"the" and "python" treated equally	Common words pollute score ❌
No diminishing returns	10th mention of "python" adds as much as the 1st	No quality judgment ❌

Normal search = quantity-based. More words = better score.

⚖️ Part 2 — BM25 (Smart Ranking)

BM25 (Best Match 25) is a ranking function used in Elasticsearch Apache Lucene Solr and many production search systems. It was first formalised in the 1990s and described in detail in Robertson & Zaragoza (2009) — The Probabilistic Relevance Framework: BM25 and Beyond. Despite being decades old, it remains one of the most effective lexical search algorithms available.

BM25 says: "Let me judge quality, not just quantity." It fixes every problem listed above by adding three ingredients.

The three key ideas

①
Term Frequency Saturation (TF)The first mention of "python" adds a lot. The 5th adds a little. The 20th adds almost nothing. Repetition is controlled — not rewarded infinitely.
②
Document Length NormalisationA short, focused document that mentions "python" twice scores better than a 5000-word document that also mentions it twice. Long documents are penalised proportionally.
③
Inverse Document Frequency (IDF)Rare words (like "BM25") get a high weight. Common words (like "the", "is") get a low weight. Matching a rare word matters more than matching a common one.

Same example — BM25 ranking

Search: "python developer"

🥇

Doc B — BM25 Winner

"python developer roadmap guide"

BM25: high ✅

Balanced usage, clear context, short & relevant.

Doc A — Penalised

"python python python developer"

BM25: medium ⚠️

High frequency but repetitive → saturation kicks in.

Doc C — Low Score

"developer"

BM25: low ❌

Missing "python" entirely.

📐 The BM25 Formula (Plain English)

BM25 Score for a single query term in a document

score(D, q) = IDF(q) × [ (TF(q,D) × (k1 + 1)) / (TF(q,D) + k1 × (1 - b + b × |D|/avgdl)) ] Where: TF(q, D) — how many times query word q appears in document D IDF(q) — how rare the word q is across all documents |D| — length of document D (word count) avgdl — average document length in the collection k1 — controls term frequency saturation (typical: 1.2 – 2.0) b — controls document length penalty (typical: 0.75)

Don't panic. You don't need to memorise this formula. What matters: the score grows with TF but flattens (saturates), shrinks for long documents, and is boosted for rare words (IDF).

What the parameters do

Parameter	Controls	Higher value means
k1	TF saturation speed	Repetition matters more before flattening
b = 1.0	Full length normalisation	Long docs heavily penalised
b = 0.0	No length normalisation	Same as raw TF (length ignored)
b = 0.75	Default balance	Moderate length penalty (recommended)

🔍 Head-to-Head Comparison

Feature	Normal Search	BM25
Word counting	Raw count only	Saturated (diminishing returns) ✅
Keyword repetition	Over-rewarded ❌	Controlled ✅
Document length	Ignored ❌	Normalised ✅
Rare vs common words	Treated equally ❌	IDF weighting ✅
Used in production	Only in very basic systems	Elasticsearch, Lucene, Solr ✅
Works without ML models	Yes ✅	Yes ✅
Understands meaning	No (lexical only) ❌	No (lexical only) ❌

Note for beginnersNeither normal search nor BM25 understands meaning or synonyms. That's what semantic search (embeddings) solves. BM25 is still great for exact term matching and is often used alongside embeddings in hybrid search.

🎯 Real-Life Analogy

❌ Normal Search

Bad teacher grading an exam:
"You wrote the word 'photosynthesis' 15 times — you get full marks!"

Doesn't matter if the answer makes sense. Just counting keywords.

✅ BM25

Smart teacher grading an exam:
"You explained the concept clearly and concisely, without padding. Rare technical terms used correctly. No unnecessary repetition."

Quality and relevance over raw quantity.

🐍 Python Code — Build It from Scratch

Let's implement a simple version step by step. You do not need any special library for the scratch version — just Python's standard library.

Step 1 — Normal (keyword count) search

python · normal_search.py

# ── Normal Search: just count matching words ──────────────────

def tokenize(text):
    """Lowercase and split into words."""
    return text.lower().split()


def normal_score(document, query_tokens):
    """Count how many query words appear in the document."""
    doc_tokens = tokenize(document)
    score = 0
    for word in query_tokens:
        score += doc_tokens.count(word)   # raw count — no limit
    return score


# ── Example ───────────────────────────────────────────────────
documents = [
    "python python python developer",          # Doc A
    "python developer roadmap guide",           # Doc B
    "developer",                                # Doc C
]

query = "python developer"
query_tokens = tokenize(query)

results = [
    (doc, normal_score(doc, query_tokens))
    for doc in documents
]
results.sort(key=lambda x: x[1], reverse=True)

for rank, (doc, score) in enumerate(results, 1):
    print(f"{rank}. Score={score}  →  {doc}")

## Output:
## 1. Score=4  →  python python python developer   ← spammy doc wins 😤
## 2. Score=2  →  python developer roadmap guide
## 3. Score=1  →  developer

Step 2 — BM25 from scratch

python · bm25_scratch.py

import math
from collections import Counter

# ── BM25 from scratch ─────────────────────────────────────────

def tokenize(text):
    return text.lower().split()


class BM25:
    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1 = k1
        self.b  = b
        self.tokenized_corpus = [tokenize(doc) for doc in corpus]
        self.doc_count  = len(corpus)
        self.avg_dl     = sum(len(d) for d in self.tokenized_corpus) / self.doc_count
        self.tf         = [Counter(d) for d in self.tokenized_corpus]

    def _idf(self, word):
        docs_with_word = sum(1 for tf in self.tf if word in tf)
        if docs_with_word == 0:
            return 0.0
        return math.log(
            (self.doc_count - docs_with_word + 0.5) /
            (docs_with_word + 0.5) + 1
        )

    def score(self, doc_index, query_tokens):
        tf   = self.tf[doc_index]
        dl   = len(self.tokenized_corpus[doc_index])
        total = 0.0
        for word in query_tokens:
            freq = tf.get(word, 0)
            idf  = self._idf(word)
            numerator   = freq * (self.k1 + 1)
            denominator = freq + self.k1 * (1 - self.b + self.b * dl / self.avg_dl)
            total += idf * (numerator / denominator)
        return round(total, 4)

    def rank(self, query):
        tokens  = tokenize(query)
        scores  = [(i, self.score(i, tokens)) for i in range(self.doc_count)]
        return sorted(scores, key=lambda x: x[1], reverse=True)


# ── Example ───────────────────────────────────────────────────
documents = [
    "python python python developer",
    "python developer roadmap guide",
    "developer",
]

bm25    = BM25(documents)
results = bm25.rank("python developer")

for rank, (idx, score) in enumerate(results, 1):
    print(f"{rank}. BM25={score}  →  {documents[idx]}")

## Output:
## 1. BM25=1.0986  →  python developer roadmap guide  ← quality wins 🏆
## 2. BM25=0.8109  →  python python python developer
## 3. BM25=0.4055  →  developer

Step 3 — Use the `rank_bm25` library (recommended)

Install once: pip install rank-bm25 · PyPI · GitHub

python · rank_bm25_library.py

from rank_bm25 import BM25Okapi

# Tokenise each document into a list of words
documents = [
    "python python python developer",
    "python developer roadmap guide",
    "developer",
]
tokenized_docs = [doc.split() for doc in documents]

# Build the BM25 index
bm25 = BM25Okapi(tokenized_docs)

# Search
query   = "python developer"
scores  = bm25.get_scores(query.split())

for idx, score in enumerate(scores):
    print(f"Doc {idx}: score={score:.4f}  →  {documents[idx]}")

## Doc 0: score=0.8109  →  python python python developer
## Doc 1: score=1.0986  →  python developer roadmap guide  ← still wins ✅
## Doc 2: score=0.4055  →  developer

# Get top-N results
top_docs = bm25.get_top_n(query.split(), documents, n=2)
print("
Top 2 results:")
for doc in top_docs:
    print(" -", doc)

Step 4 — Use BM25 in a RAG pipeline

python · bm25_rag_example.py

from rank_bm25 import BM25Okapi

# Your knowledge base (replace with real documents)
knowledge_base = [
    "Python is a popular programming language for data science.",
    "BM25 is a ranking function used in information retrieval.",
    "RAG stands for Retrieval-Augmented Generation.",
    "Elasticsearch uses BM25 as its default ranking algorithm.",
    "Vector embeddings capture semantic meaning of text.",
]

def build_bm25_index(docs):
    tokenized = [doc.lower().split() for doc in docs]
    return BM25Okapi(tokenized)

def retrieve(query, docs, bm25_index, top_k=3):
    """Retrieve the top-k most relevant documents for the query."""
    tokens = query.lower().split()
    return bm25_index.get_top_n(tokens, docs, n=top_k)

# Build index once
index = build_bm25_index(knowledge_base)

# Retrieve at query time
user_query   = "How does BM25 work in search?"
top_contexts = retrieve(user_query, knowledge_base, index)

print("Retrieved context for LLM:")
for i, ctx in enumerate(top_contexts, 1):
    print(f"  {i}. {ctx}")

# Then pass top_contexts to your LLM (OpenAI, Ollama, etc.) as context

🎮 Interactive Demo

Type a query and see Normal Search vs BM25 ranking difference live in your browser.

Your search query:

Documents in corpus:

🚀 Why This Matters for AI Developers

You might think: "We have powerful embedding models now, why care about BM25?" Here's why it is still essential:

BM25 is inside Elasticsearch and OpenSearch by default. Most production search stacks still use it as the primary retrieval layer.
Hybrid search = BM25 + Embeddings. The best RAG systems combine both — BM25 handles exact keyword matches, embeddings handle semantic/conceptual matches. Together they give better recall than either alone.
BM25 needs no GPU and no model inference. It is instant, cheap, and scales to millions of documents easily.
BM25 works well for rare technical terms. Embeddings sometimes struggle with domain-specific jargon ("BM25Okapi", "HNSW"). BM25 finds exact matches reliably.

Recommended stack for beginners building RAGStart with BM25-only retrieval (rank-bm25). Once working, add embeddings for semantic recall and combine the scores. That hybrid pipeline is what most production systems use.

Learning path

→
You are hereBM25 vs Normal Search (this article)
→
Next: Semantic SearchEmbeddings, cosine similarity, vector retrieval →
→
Coming soon: Hybrid SearchCombining BM25 + vectors with score fusion and reranking.

Next: Semantic Search →← All Articles

Comments

Ask questions, share feedback, or reply to other readers. Sign in with Google to join the discussion.

Loading comments...

BM25 vs Normal Search

🔎 Part 1 — Normal Search (Basic Keyword Count)

Example: Search for "python developer"

Problems with normal search

⚖️ Part 2 — BM25 (Smart Ranking)

The three key ideas

Same example — BM25 ranking

📐 The BM25 Formula (Plain English)

What the parameters do

🔍 Head-to-Head Comparison

🎯 Real-Life Analogy

🐍 Python Code — Build It from Scratch

Step 1 — Normal (keyword count) search

Step 2 — BM25 from scratch

Step 3 — Use the rank_bm25 library (recommended)

Step 4 — Use BM25 in a RAG pipeline

🎮 Interactive Demo

🚀 Why This Matters for AI Developers

Learning path

Comments

Step 3 — Use the `rank_bm25` library (recommended)