DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
Summary
This paper introduces DiffRetriever, a method that uses diffusion language models to generate multiple representative tokens in parallel for efficient information retrieval, outperforming autoregressive baselines in speed and accuracy.
Source: https://huggingface.co/papers/2605.07210
Abstract
DiffRetriever enables efficient multi-token retrieval using diffusion language models by generating representations in parallel rather than sequentially, achieving superior performance over autoregressive methods.
PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.
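The core mechanism in the abstract (append K masked positions, read all K from one bidirectional pass) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the `[MASK]` string, the hash-based `toy_forward` stand-in for a model, the whitespace tokenizer, and the max-over-pairs `score` aggregation are all hypothetical.

```python
import hashlib
import math

MASK = "[MASK]"  # hypothetical mask token; real diffusion LMs define their own


def build_input(prompt_tokens, k):
    """Append K masked positions to the prompt."""
    return list(prompt_tokens) + [MASK] * k


def toy_forward(tokens, dim=8):
    """Stand-in for ONE bidirectional forward pass (not a real model).

    Returns one unit-length vector per position. Each vector depends on the
    whole sequence and on the position index, mimicking bidirectional context.
    """
    context = " ".join(tokens)
    hidden = []
    for i in range(len(tokens)):
        digest = hashlib.sha256(f"{context}|{i}".encode()).digest()
        vec = [b / 255.0 - 0.5 for b in digest[:dim]]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        hidden.append([x / norm for x in vec])
    return hidden


def encode(text, k):
    """Return K representative vectors for a query or passage."""
    tokens = build_input(text.split(), k)  # whitespace split is an assumption
    hidden = toy_forward(tokens)           # a single pass, regardless of k
    return hidden[-k:]                     # read all K masked positions at once


def score(query_reps, passage_reps):
    """Assumed aggregation: best dot product over all representative pairs."""
    return max(
        sum(a * b for a, b in zip(q, p))
        for q in query_reps
        for p in passage_reps
    )


def decoding_steps(k, parallel):
    """Sequential steps needed to obtain K representatives.

    Parallel (diffusion-style) readout needs one pass; sequential
    (autoregressive-style) generation needs one step per token.
    """
    return 1 if parallel else k


if __name__ == "__main__":
    q = encode("what is a diffusion language model", k=4)
    p = encode("diffusion language models denoise masked tokens", k=4)
    print(len(q), "query representatives, score:", score(q, p))
    print("steps:", decoding_steps(4, parallel=True), "vs", decoding_steps(4, parallel=False))
```

The `decoding_steps` helper only restates the latency argument from the abstract: the sequential cost grows with K, while the parallel readout stays at one pass. A real system would replace `toy_forward` with an actual diffusion language model and would use whatever scoring rule the paper specifies.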
Similar Articles
Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding
Set Diffusion introduces a new class of language models that interpolates between autoregressive and diffusion models by factorizing token generation over flexible-position, flexible-length token sets. This enables faster decoding and flexible token ordering, achieving better speed-quality tradeoffs on reasoning, summarization, and unconditional generation tasks.
DARE: Diffusion Language Model Activation Reuse for Efficient Inference
This paper introduces DARE, a method for improving the inference efficiency of Diffusion Large Language Models by reusing cached key-value and output activations to reduce computational redundancy with negligible quality loss.
Residual Context Diffusion Language Models
This paper introduces Residual Context Diffusion (RCD), a module that recycles discarded token representations in diffusion language models to improve efficiency and accuracy, achieving 5–10% better accuracy and up to 4–5x fewer denoising steps on challenging reasoning tasks.
Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
This paper introduces Repr-Align, a method to adapt autoregressive language models into diffusion language models via representation alignment, achieving up to 4x training acceleration without retraining representations from scratch.
Discrete Diffusion Language Models for Interactive Radiology Report Drafting
This paper adapts a diffusion language model for interactive radiology report drafting, showing it matches autoregressive models in accuracy while offering unique infill capabilities that allow radiologists to fix report fragments and have the model fill in the text between them.