Tag
This paper identifies a vocabulary gap as the root cause why advanced encoders like ModernBERT underperform in learned sparse retrieval, and proposes Vocabulary Transfer (VT), a model-agnostic framework that migrates encoders to sparse-friendly vocabularies, achieving state-of-the-art on the BEIR benchmark.
This paper presents a single-stage sparse coding method using unsupervised sparse autoencoders and natural inverted indexing to accelerate multi-vector retrieval, outperforming traditional k-means based approaches.
The paper proposes Latent Terms, a method using Sparse Autoencoders to extract BM25-ready sparse features from frozen dense retrievers, achieving competitive performance without retrieval-specific training.