Discrete Diffusion Language Models for Interactive Radiology Report Drafting
Summary
This paper adapts a mixture-of-experts diffusion language model, DiffusionGemma-26B, for interactive radiology report drafting, showing it matches or exceeds autoregressive models in medical VQA with 3.5-4.4x faster decoding and bidirectional infill capabilities.
Cached at: 07/03/26, 03:52 AM
Source: https://huggingface.co/papers/2607.01436
Abstract
Diffusion language models match or exceed autoregressive models in medical visual question answering while offering faster decoding and bidirectional text editing capabilities.
Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on all of them, and the finetuned model (3.8B active) is competitive with frontier vision-language models; its decoding is also 3.5-4.4x faster. Beyond this parity, the diffusion model offers a drafting capability AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can fix report fragments and have the model fill the text between them, an operation inherent to diffusion but not to autoregression, which is subpar at it. This suits real reports, which are often terse or inconsistent across clinicians and institutions.
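The any-order infill described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's decoder: `MASK`, `TEMPLATE`, `infill`, and `toy_propose` are hypothetical names introduced here, and the proposer reads from a fixed template instead of a trained model. Committing the most confident slot first is one common strategy for masked diffusion decoding; the abstract does not say which schedule DiffusionGemma uses.

```python
MASK = "<mask>"

# Hypothetical "ground truth" the toy proposer reads from; a real model
# would predict these tokens from the image and the surrounding text.
TEMPLATE = "no focal consolidation . heart size is normal .".split()


def toy_propose(canvas, i):
    """Stand-in for a diffusion LM: return (token, confidence) for slot i.

    Confidence counts how many immediate neighbours (left AND right) are
    already filled, which mimics conditioning on bidirectional context.
    """
    neighbours = [canvas[j] for j in (i - 1, i + 1) if 0 <= j < len(canvas)]
    confidence = sum(tok != MASK for tok in neighbours)
    return TEMPLATE[i], confidence


def infill(canvas, propose, tokens_per_step=1):
    """Fill every MASK slot, committing the most confident proposals first.

    Tokens that are not MASK (the fragments the user fixed) are never changed.
    """
    canvas = list(canvas)
    while MASK in canvas:
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        # Score each open slot against the current canvas.
        scored = [(propose(canvas, i), i) for i in masked]
        # Highest confidence first; ties broken by position for determinism.
        scored.sort(key=lambda item: (-item[0][1], item[1]))
        for (token, _conf), i in scored[:tokens_per_step]:
            canvas[i] = token
    return canvas


# The user fixes some fragments and leaves gaps between them.
draft = ["no", MASK, MASK, ".", "heart", MASK, MASK, "normal", "."]
result = infill(draft, toy_propose)
print(" ".join(result))
```

The fixed fragments act as anchors on both sides of each gap, which is the property a left-to-right decoder lacks: it can condition on "no" but not on the later "normal" when filling the slots in between.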
Similar Articles
google/diffusiongemma-26B-A4B-it
Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
AnchorDiff: Topology-Aware Masked Diffusion with Confidence-based Rewriting for Radiology Report Generation
AnchorDiff proposes a topology-aware masked diffusion framework for radiology report generation, integrating RadGraph-derived clinical anchors and confidence-based rewriting to achieve state-of-the-art results on MIMIC-CXR and MIMIC-RG4 benchmarks.
Diffusion Language Models: An Experimental Analysis
A systematic experimental analysis evaluating eight state-of-the-art Diffusion Language Models across multiple benchmarks, analyzing trade-offs between generation quality and computational efficiency.
@vllm_project: Congrats to @GoogleDeepMind on DiffusionGemma A 26B diffusion language model on the Gemma4 backbone, and the first dLLM…
vLLM announces native support for Google DeepMind's DiffusionGemma, a 26B discrete diffusion language model that generates 256-token blocks in parallel, enabling low-latency inference at 1200+ tok/s on a single H200.