DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Hugging Face Daily Papers 05/08/26, 12:00 AM Papers

Summary

This paper introduces DiffRetriever, a method that uses diffusion language models to generate multiple representative tokens in parallel for efficient information retrieval, outperforming autoregressive baselines in speed and accuracy.

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.

Original Article

Cached at: 05/12/26, 02:50 AM

Paper page - DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

Source: https://huggingface.co/papers/2605.07210

Abstract

DiffRetriever enables efficient multi-token retrieval using diffusion language models by generating representations in parallel rather than sequentially, achieving superior performance over autoregressive methods.

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.
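The contrast the abstract draws between reading K masked positions in one pass and generating K tokens one at a time can be illustrated with a toy sketch. Nothing below comes from the DiffRetriever code: `MASK`, `toy_forward`, `CountingModel`, the prompt wording, and both helper functions are hypothetical stand-ins, and the "model" just returns a small vector per position so that forward passes can be counted.

```python
from typing import Callable, List

MASK = "[MASK]"  # hypothetical mask token; real diffusion LMs define their own

Vector = List[float]


def toy_forward(tokens: List[str]) -> List[Vector]:
    """Stand-in for a forward pass: one small vector per input position.

    A real model would return hidden states (dense) and vocabulary
    logits (sparse) here; this toy only mimics the output shape.
    """
    total = sum(len(t) for t in tokens)  # each position "sees" the whole input
    return [[float(i), float(len(t)), float(total)] for i, t in enumerate(tokens)]


class CountingModel:
    """Wraps a forward function and counts how often it is called."""

    def __init__(self, forward: Callable[[List[str]], List[Vector]]) -> None:
        self.forward = forward
        self.calls = 0

    def __call__(self, tokens: List[str]) -> List[Vector]:
        self.calls += 1
        return self.forward(tokens)


def diffusion_representatives(model: CountingModel, prompt: List[str], k: int) -> List[Vector]:
    """Append k masked positions and read all of them from ONE forward pass."""
    tokens = prompt + [MASK] * k
    states = model(tokens)
    return states[len(prompt):]  # outputs at the k masked positions


def autoregressive_representatives(model: CountingModel, prompt: List[str], k: int) -> List[Vector]:
    """Produce k representatives sequentially: one forward pass per token."""
    tokens = list(prompt)
    reps = []
    for step in range(k):
        states = model(tokens)
        reps.append(states[-1])          # read the last position
        tokens.append(f"<rep{step}>")    # placeholder for the decoded token
    return reps


# Hypothetical prompt wording, for illustration only.
prompt = "Passage: diffusion models denoise in parallel . Representative words :".split()
diff_model = CountingModel(toy_forward)
ar_model = CountingModel(toy_forward)
diff_reps = diffusion_representatives(diff_model, prompt, k=4)
ar_reps = autoregressive_representatives(ar_model, prompt, k=4)
print(len(diff_reps), diff_model.calls)  # 4 1
print(len(ar_reps), ar_model.calls)      # 4 4
```

The call counts show the scaling argument in miniature: the masked-position variant stays at one pass for any K, while the sequential variant needs K passes. In practice, KV caching makes each autoregressive step cheaper than a full pass, but the steps still have to run one after another.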

Get this paper in your agent:

hf papers read 2605.07210

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

Residual Context Diffusion Language Models (2 minute read)

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Discrete Diffusion Language Models for Interactive Radiology Report Drafting
