Tag
This paper identifies a vocabulary gap as the root cause of why advanced encoders such as ModernBERT underperform in learned sparse retrieval, and proposes Vocabulary Transfer (VT), a model-agnostic framework that migrates encoders to sparse-friendly vocabularies, achieving state-of-the-art results on the BEIR benchmark.
This paper introduces DiffRetriever, a method that uses diffusion language models to generate multiple representative tokens in parallel for efficient information retrieval, outperforming autoregressive baselines in speed and accuracy.