bm25

#bm25

Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval

arXiv cs.LG · 3d ago Cached

This paper proposes a training-free, CPU-only retrieval method that fuses BM25 lexical scores with late-interaction dense scores for conversational-memory retrieval, improving LoCoMo Hit@1 by up to 17.2 points over late interaction alone across six encoders. The study provides controlled ablations on pooling operators, reranker effects, and benchmark robustness, and frames the gain as a division of labor between dense and lexical signals.
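
As a rough illustration of what training-free lexical-dense score fusion can look like, here is a minimal sketch. The summary does not state the paper's fusion rule, so the min-max normalization, the `alpha` weight, and the function names `min_max` and `fuse_scores` are all assumptions made for this example. The dense scores are plain numbers standing in for late-interaction scores.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(lexical, dense, alpha=0.5):
    """Convex combination of normalized lexical and dense scores.

    Hypothetical helper: `alpha` weights the lexical (BM25) side and
    `1 - alpha` weights the dense side. No training is involved.
    """
    if len(lexical) != len(dense):
        raise ValueError("score lists must cover the same candidates")
    lex_n, den_n = min_max(lexical), min_max(dense)
    return [alpha * l + (1 - alpha) * d for l, d in zip(lex_n, den_n)]


# Made-up scores for three candidate memories.
lexical = [4.0, 0.0, 3.0]   # e.g. BM25 scores
dense = [0.2, 0.8, 0.65]    # e.g. dense similarity scores
fused = fuse_scores(lexical, dense, alpha=0.5)

# Candidate indices ordered from best to worst fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)
```

In this toy input, the candidate that is second-best under both signals ends up ranked first after fusion, which is the kind of complementary effect a hybrid score is meant to capture.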

is [ BM25 + vector ]+ RRF really worth it?

Reddit r/AI_Agents · 4d ago

This post questions whether combining BM25 and vector search via reciprocal rank fusion (RRF) improves hit rates in agentic memory retrieval, suggesting BM25 alone may suffice.
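
For readers unfamiliar with RRF, the sketch below shows the usual reciprocal rank fusion formula, where each document gets the sum of `1 / (k + rank)` over the rankings it appears in. The constant `k=60` is a commonly used default, and the document IDs and the function name `reciprocal_rank_fusion` are made up for illustration. This is not the setup described in the post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with RRF.

    Each ranking is ordered best-first. Ranks are 1-indexed, so the top
    document in a list contributes 1 / (k + 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical best-first results from two retrievers.
bm25_ranking = ["m3", "m1", "m2"]
vector_ranking = ["m1", "m4", "m3"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization, but it also discards how confident each retriever was. That trade-off is one reason the question of whether it beats BM25 alone has to be settled empirically.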

@mixedbreadai: By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain t…

X AI KOLs Following · 5d ago Cached

Single-vector embedding models can be used to extract sparse latent terms, and BM25 can turn this vocabulary into a strong retriever.

spent way too long debugging RAG before realizing the chunking was the problem the whole time

Reddit r/ArtificialInteligence · 6d ago

A developer recounts debugging RAG systems and discovering that fixed-size chunking breaks sentence boundaries, that vector search fails on exact identifiers (solved with BM25), and that stale indexes cause confidently wrong answers.
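
One simple alternative to fixed-size chunking is to pack whole sentences into chunks up to a size limit. The sketch below assumes a naive regex sentence splitter and a character budget. The function name `chunk_by_sentence` and the sample text are hypothetical, and the post does not say which fix the developer adopted.

```python
import re


def chunk_by_sentence(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    Sentences are split on whitespace that follows '.', '!' or '?'.
    A single sentence longer than max_chars becomes its own oversized
    chunk rather than being cut in the middle.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "Order A-1042 shipped Monday. It arrived Wednesday. "
    "The customer asked for a refund."
)
chunks = chunk_by_sentence(text, max_chars=50)
for chunk in chunks:
    print(repr(chunk))
```

The regex splitter will mis-split abbreviations such as "e.g.", so a real pipeline would likely use a proper sentence tokenizer. The point here is only that chunk boundaries follow sentence boundaries.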

@bclavie: Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming ve…

X AI KOLs Timeline · 2026-05-30 Cached

Researchers extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained sparse autoencoders.

@_reachsumit: Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract in…

X AI KOLs Following · 2026-05-29 Cached

The paper proposes Latent Terms, a method using Sparse Autoencoders to extract BM25-ready sparse features from frozen dense retrievers, achieving competitive performance without retrieval-specific training.

@jerryjliu0: Real question: what is the actual latest state-of-the-art for file search and retrieval? - Actual grep over filesystem …

X AI KOLs Following · 2026-05-18 Cached

Jerry Liu asks about the current state-of-the-art for file search and retrieval, listing options from grep to hybrid search over a database.

@rwayne: Context Mode solves the other half of AI Agent context issues: sandboxed tool outputs + persistent sessions. A 56 KB Playwright snapshot compressed to 299 bytes, 98% of data never entering the context. Every file edit, Git operation, task decision is stored into…

X AI KOLs Timeline · 2026-05-12 Cached

Context Mode is a tool that addresses AI agent context problems by sandboxing tool outputs and persisting sessions. It achieves up to 98% compression of Playwright snapshots and uses BM25 retrieval to reduce context-window usage. It supports 15 platforms, including Claude Code, Gemini CLI, and VS Code Copilot, and is used by major tech companies.

Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

Hugging Face Daily Papers · 2026-05-11 Cached

This paper introduces Pi-Serini, a BM25-based agentic search system that demonstrates lexical retrieval can suffice for deep search when agents refine queries, achieving high accuracy and reducing costs compared to default settings.

@DailyDoseOfDS_: Stop using vector search everywhere! A 30-year-old algorithm with zero training, zero embeddings, and zero fine-tuning …

X AI KOLs Timeline · 2026-05-07

The article argues against overusing vector search, highlighting BM25's effectiveness for exact keyword matching and its role in hybrid search systems.
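
To make the exact-keyword behavior concrete, here is a minimal BM25 scorer. It assumes whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))`. The function name `bm25_scores` and the sample documents are made up for illustration and do not come from the post.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi-style BM25.

    Assumes `docs` is non-empty and uses lowercase whitespace tokens.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "error code E4521 raised during checkout",
    "checkout page loads slowly",
    "payment failures are rare",
]
scores = bm25_scores("e4521 checkout", docs)
print(scores)
```

The rare identifier `E4521` gets a high IDF weight, so the only document containing it scores well above the one that merely mentions "checkout". A document with no query terms scores exactly zero, which is the exact-match behavior that dense embeddings do not guarantee.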
