beir-benchmark

Why Advanced Encoders Lag on Sparse Retrieval? The Answer and an Approach to Bridging Vocabulary Gaps

arXiv cs.AI · 2d ago

This paper identifies a vocabulary gap as the root cause of why advanced encoders such as ModernBERT underperform in learned sparse retrieval. It proposes Vocabulary Transfer (VT), a model-agnostic framework that migrates encoders to sparse-friendly vocabularies and achieves state-of-the-art results on the BEIR benchmark.

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Hugging Face Daily Papers · 2026-05-08

This paper introduces DiffRetriever, a method that uses diffusion language models to generate multiple representative tokens in parallel for efficient information retrieval, outperforming autoregressive baselines in both speed and accuracy.
