DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Hugging Face Daily Papers

Summary

This paper introduces DiffRetriever, a method that uses diffusion language models to generate multiple representative tokens in parallel for efficient information retrieval, outperforming autoregressive baselines in speed and accuracy.

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.

Cached at: 05/12/26, 02:50 AM


Source: https://huggingface.co/papers/2605.07210

Abstract

DiffRetriever enables efficient multi-token retrieval using diffusion language models by generating representations in parallel rather than sequentially, achieving superior performance over autoregressive methods.

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.
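
The contrast between reading K masked positions in one bidirectional pass and generating K tokens sequentially can be illustrated with a toy sketch. This is not the authors' implementation: `ToyLM` is a random-weight stand-in for a language model, and `MASK_ID`, `parallel_reps`, and `sequential_reps` are hypothetical names introduced here. It only shows how the number of forward passes depends on K in each setting; how the paper scores the K representations is not covered.

```python
import numpy as np

MASK_ID = 0  # hypothetical id of the mask token


class ToyLM:
    """Random-weight stand-in for a language model; counts forward passes."""

    def __init__(self, vocab=50, dim=16, max_len=64, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(size=(vocab, dim))
        self.pos = rng.normal(size=(max_len, dim))
        self.calls = 0

    def forward(self, ids, causal):
        self.calls += 1
        x = self.embed[ids] + self.pos[: len(ids)]
        scores = x @ x.T / np.sqrt(x.shape[1])
        if causal:  # autoregressive: position i sees only positions <= i
            scores = np.where(np.tri(len(ids), dtype=bool), scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ x  # hidden states, shape (len(ids), dim)


def parallel_reps(model, prompt_ids, k):
    """Append k masked positions and read them all from one bidirectional pass."""
    ids = np.concatenate([prompt_ids, np.full(k, MASK_ID)])
    return model.forward(ids, causal=False)[-k:]


def sequential_reps(model, prompt_ids, k):
    """Autoregressive counterpart: one causal pass per representative token."""
    ids, reps = np.array(prompt_ids), []
    for _ in range(k):
        last = model.forward(ids, causal=True)[-1]
        reps.append(last)
        next_id = int(np.argmax(model.embed @ last))  # greedy next token
        ids = np.append(ids, next_id)
    return np.stack(reps)


model = ToyLM()
prompt = np.array([3, 7, 11, 5])

par = parallel_reps(model, prompt, k=4)
par_calls = model.calls

model.calls = 0
seq = sequential_reps(model, prompt, k=4)
seq_calls = model.calls

print(par.shape, par_calls)  # K vectors from a single pass
print(seq.shape, seq_calls)  # K vectors from K passes
```

In this toy setting both functions return K vectors, but the parallel variant calls the model once regardless of K, while the sequential variant calls it K times, which mirrors the latency argument in the abstract.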

