@DailyDoseOfDS_: Stop using vector search everywhere! A 30-year-old algorithm with zero training, zero embeddings, and zero fine-tuning …

X AI KOLs Timeline 05/07/26, 09:30 AM News

bm25 search keyword-matching hybrid-search rag elasticsearch opensearch

Summary

The article argues against overusing vector search, highlighting BM25's effectiveness for exact keyword matching and its role in hybrid search systems.

Stop using vector search everywhere!

A 30-year-old algorithm with zero training, zero embeddings, and zero fine-tuning still powers Elasticsearch, OpenSearch, and most production search systems today. It's called BM25. Let us explain what makes it so powerful.

Imagine you're searching for "transformer attention mechanism" in a library of ML papers. BM25 asks three simple questions:

"How rare is this word?"
Every paper contains "the" and "is", which makes those words useless for ranking. But "transformer" is specific and informative. BM25 boosts rare words and ignores the noise.
→ This is IDF(qᵢ) in the formula.

"How many times does it appear?"
If "attention" appears 10 times in a paper, that's a good sign. But 10 vs 100 occurrences won't make much difference. BM25 applies diminishing returns.
→ This is f(qᵢ, D), combined with k₁, which controls saturation.

"Is this document unusually long?"
A 50-page paper will naturally contain more keywords than a 5-page paper. BM25 levels the playing field so longer documents don't cheat their way to the top.
→ This is |D|/avgdl, controlled by the parameter b.

Three questions. No neural networks. No training data. Just elegant math (see the image in the original post).

The best part: BM25 excels at exact keyword matching - something embeddings often struggle with. If your user searches for "error code 5012," embeddings might return results that are semantically similar but about a different error code. BM25 will find the exact match.

This is why hybrid search exists. Top RAG systems today combine BM25 with vector search. You get the best of both worlds: semantic understanding AND precise keyword matching.

So before you throw GPUs at every search problem, consider BM25. It might already solve your problem, or make your semantic search even better when combined.
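To make the three questions concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration under stated assumptions, not how Elasticsearch or OpenSearch implement it: the whitespace tokenizer, the defaults k1=1.5 and b=0.75, and the smoothed IDF variant (with +1 inside the log) are choices made for this example. The post does not say how hybrid systems merge BM25 with vector results. The `rrf` helper shows reciprocal rank fusion as one common option, and the vector ranking fed into it is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; real engines use analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document.

    score(D, Q) = sum over query terms q of
        IDF(q) * f(q, D) * (k1 + 1) / (f(q, D) + k1 * (1 - b + b * |D| / avgdl))
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # "Is this document unusually long?" -> |D| / avgdl, weighted by b.
        length_norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for q in tokenize(query):
            f = tf[q]
            if f == 0:
                continue
            # "How rare is this word?" -> smoothed IDF (assumed variant).
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # "How many times does it appear?" -> saturating in f via k1.
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores


def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of document ids.

    Assumption: RRF is one common fusion method; the post does not name one.
    """
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1 / (k + rank)
    return sorted(fused, key=lambda doc_id: -fused[doc_id])


# Toy corpus (made up for this example).
docs = [
    "restart the service after error code 5012 appears in the log",
    "the service failed with error code 4011 during startup",
    "how to read the log when the service is slow",
]
scores = bm25_scores("error code 5012", docs)
bm25_ranking = sorted(range(len(docs)), key=lambda i: -scores[i])

# Hypothetical ranking from a vector index, standing in for semantic search.
vector_ranking = [1, 0, 2]
hybrid_ranking = rrf([bm25_ranking, vector_ranking, bm25_ranking])
print(bm25_ranking, hybrid_ranking)
```

Only the first document contains the rare token "5012", so it outranks the document about a different error code, and the document sharing no query terms scores zero. Setting b=0 switches off length normalization, which is a convenient way to observe the saturation effect of k₁ in isolation.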

Similar Articles

@Al_Grigor: Don't start a RAG project with vector search by default. Start with a text search. It is simpler: - No embedding model …

Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]

is [ BM25 + vector ]+ RRF really worth it?

@mixedbreadai: By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain t…

Mongo with vector search performance
