@bclavie: Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming ve…
Summary
Researchers extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained sparse autoencoders.
Cached at: 05/30/26, 08:46 PM
Very excited to finally share this one after sitting on it for far too long! It’s very topical now. Blog post coming very soon :)
Sumit (@_reachsumit): Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies
@bclavie et al. extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained Sparse Autoencoders.
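As a rough illustration of the idea in the summary, the sketch below encodes dense vectors with a sparse-autoencoder-style encoder, treats the strongest active latents as "term" ids, and ranks documents with standard Okapi BM25. It is a toy under stated assumptions, not the authors' implementation: `topk_latents` and `bm25_scores` are hypothetical helper names, the encoder weights are random and untrained, the embeddings are random stand-ins for a frozen retriever's output, and using binary term presence (rather than activation strength) is an assumption.

```python
import math
from collections import Counter

import numpy as np


def topk_latents(embedding, w_enc, b_enc, k):
    """Apply a ReLU encoder to one dense vector and keep the k strongest latents."""
    acts = np.maximum(embedding @ w_enc + b_enc, 0.0)
    top = np.argsort(-acts)[:k]
    # Keep only latents that actually fired; each index serves as a "term" id.
    return [int(i) for i in top if acts[i] > 0.0]


def bm25_scores(query_terms, docs_terms, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    n_docs = len(docs_terms)
    avg_len = sum(len(d) for d in docs_terms) / n_docs
    # Document frequency: in how many documents each latent term appears.
    df = Counter(t for d in docs_terms for t in set(d))
    scores = []
    for d in docs_terms:
        tf = Counter(d)
        score = 0.0
        for t in set(query_terms):
            if tf[t] == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


rng = np.random.default_rng(0)
dim, n_latents, k = 32, 256, 8

# Toy stand-ins (assumptions): random "frozen" embeddings and an untrained encoder.
w_enc = rng.normal(size=(dim, n_latents))
b_enc = np.zeros(n_latents)
doc_embeddings = rng.normal(size=(5, dim))
query_embedding = doc_embeddings[2] + 0.05 * rng.normal(size=dim)

docs_terms = [topk_latents(e, w_enc, b_enc, k) for e in doc_embeddings]
query_terms = topk_latents(query_embedding, w_enc, b_enc, k)
scores = bm25_scores(query_terms, docs_terms)
print(scores)
```

Because each document is reduced to a short list of integer ids, the same lists could be fed to any off-the-shelf inverted index instead of the brute-force loop used here.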
Similar Articles
@_reachsumit: Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract in…
The paper proposes Latent Terms, a method using Sparse Autoencoders to extract BM25-ready sparse features from frozen dense retrievers, achieving competitive performance without retrieval-specific training.
@mixedbreadai: By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain t…
Single-vector embedding models can be used to extract sparse latent terms, and BM25 can turn this vocabulary into a strong retriever.
@lateinteraction: Late-interaction sparse retrieval? With neuron-level inverted indexing, on top of unsupervised sparse autoencoders. Wor…
This paper presents a single-stage sparse coding method using unsupervised sparse autoencoders and natural inverted indexing to accelerate multi-vector retrieval, outperforming traditional k-means based approaches.
@yifeiwang77: Thanks for sharing our work @lateinteraction @sum! The idea is extremely simple: - multi-vector retrieval is so costly …
The author shares their work on reducing the cost of multi-vector retrieval by using k-means as top-1 sparse coding. Omar Khattab adds that late-interaction sparse retrieval with neuron-level inverted indexing on unsupervised sparse autoencoders works well.
@_reachsumit: No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval @Veritas2026 et al. replace vector clus…
This paper proposes Single-stage Sparse Retrieval (SSR), which replaces K-means clustering with sparse autoencoders and inverted indexing, achieving 15x faster indexing and halved retrieval latency while improving accuracy on the BEIR benchmark.