@PavloMolchanov: We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive…
Summary
NVIDIA releases Nemotron-Labs-Diffusion, the first tri-mode language model family (3B/8B/14B) that switches between autoregressive, diffusion, and self-speculation decoding by changing the attention pattern, achieving up to 4× higher real throughput.
Cached at: 05/20/26, 02:25 AM
We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive, Diffusion, and Self-Speculation decoding by simply changing the attention pattern/mask.
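The sketch below shows what "changing the attention mask" means in practice. It builds two boolean masks over the same sequence. The first is a standard causal mask (AR mode). The second is a block-causal mask, where tokens attend bidirectionally within a block and causally across blocks. The block-causal layout and `block_size` are assumptions modeled on block-diffusion LMs, not confirmed details of Nemotron-Labs-Diffusion. The function names are hypothetical.

```python
def causal_mask(n):
    """AR mode: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_causal_mask(n, block_size):
    """Assumed diffusion-style mask: full attention inside a block,
    causal attention across blocks (i sees j if j's block <= i's block)."""
    return [[(j // block_size) <= (i // block_size) for j in range(n)]
            for i in range(n)]


def show(mask):
    """Print a mask row by row: '1' = may attend, '.' = blocked."""
    for row in mask:
        print("".join("1" if allowed else "." for allowed in row))


show(causal_mask(4))           # 1... / 11.. / 111. / 1111
show(block_causal_mask(4, 2))  # 11.. / 11.. / 1111 / 1111
```

With `block_size=1`, the block-causal mask reduces to the causal one. This shows how a single set of weights can run in different decoding regimes purely by switching the mask.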
One model. Three decoding modes. No extra draft models. No architecture changes. Just significantly better efficiency across different concurrency levels.
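Self-speculation drops the separate draft model: the same network proposes several tokens at once and then checks them. The post does not spell out the acceptance rule, so the sketch below assumes the common greedy scheme from speculative decoding. The model drafts `k` tokens in diffusion mode and runs one causal (AR-mode) pass over them. It then keeps the longest prefix the AR pass agrees with, plus one token from the verifier. `accept_draft` is a hypothetical helper.

```python
from typing import List, Sequence


def accept_draft(draft: Sequence[int], verify: Sequence[int]) -> List[int]:
    """Greedy speculative acceptance (hypothetical helper).

    draft:  k tokens proposed in parallel (assumed: a diffusion-mode pass).
    verify: k + 1 tokens; verify[i] is the causal (AR-mode) argmax for
            position i given the prefix plus draft[:i].
    Returns the tokens committed this step.
    """
    if len(verify) != len(draft) + 1:
        raise ValueError("verify must hold len(draft) + 1 predictions")
    accepted: List[int] = []
    for proposed, checked in zip(draft, verify):
        if proposed != checked:
            break  # first disagreement: discard the rest of the draft
        accepted.append(proposed)
    # The verifier's token at the first mismatch (or after a fully accepted
    # draft) is exactly what greedy AR decoding would emit, so commit it too.
    accepted.append(verify[len(accepted)])
    return accepted


print(accept_draft([5, 7, 9, 2], [5, 7, 4, 1, 8]))  # [5, 7, 4]
print(accept_draft([1, 2], [1, 2, 3]))              # [1, 2, 3]
```

Each step commits between 1 and `k + 1` tokens. Under greedy decoding, the committed text is identical to what plain AR decoding would produce. Any speedup depends on how many draft tokens are accepted per verification pass.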
Up to 4× higher real throughput for a single user.
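Parallel decoding helps a single user because it can commit more than one token per forward pass. The sketch below shows one common diffusion-mode sampler, confidence-thresholded parallel unmasking. The `threshold`, the `Predictor` interface, and `toy_predict` (a stand-in for the real model) are all hypothetical and not taken from the release.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # marks a position that has not been decoded yet

# Hypothetical model interface: given the current block (with MASK entries),
# return {position: (predicted_token, confidence)} for every masked position.
Predictor = Callable[[List[Optional[int]]], Dict[int, Tuple[int, float]]]


def decode_block(block_len: int, predict: Predictor,
                 threshold: float = 0.9) -> Tuple[List[int], int]:
    """Fill a fully masked block by confidence-thresholded parallel unmasking.

    Returns the decoded tokens and the number of forward passes used.
    """
    block: List[Optional[int]] = [MASK] * block_len
    passes = 0
    while any(tok is MASK for tok in block):
        preds = predict(block)  # one forward pass over the whole block
        passes += 1
        # Commit every masked position whose confidence clears the threshold...
        ready = [pos for pos, (_, conf) in preds.items() if conf >= threshold]
        # ...but always commit at least the most confident one so the loop
        # is guaranteed to make progress.
        if not ready:
            ready = [max(preds, key=lambda pos: preds[pos][1])]
        for pos in ready:
            block[pos] = preds[pos][0]
    # Every position is filled at this point; the filter only narrows the type.
    return [tok for tok in block if tok is not None], passes


def toy_predict(block: List[Optional[int]]) -> Dict[int, Tuple[int, float]]:
    """Stand-in for a real model: always predicts 10, 11, 12, ... and is
    confident only about the two leftmost masked positions."""
    masked = [pos for pos, tok in enumerate(block) if tok is MASK]
    return {pos: (10 + pos, 0.95 if rank < 2 else 0.5)
            for rank, pos in enumerate(masked)}


print(decode_block(8, toy_predict))                  # ([10, 11, 12, 13, 14, 15, 16, 17], 4)
print(decode_block(8, toy_predict, threshold=0.99))  # ([10, 11, 12, 13, 14, 15, 16, 17], 8)
```

With the toy predictor, the 8-token block finishes in 4 forward passes instead of the 8 that a one-token-per-step AR loop would need. Raising `threshold` to 0.99 falls back to one token per pass. Real gains depend on model confidence and hardware, so this toy count does not model the reported 4× figure.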
HF Collection (open license): https://huggingface.co/collections/nvidia/nemotron-labs-diffusion…
Project page: https://research.nvidia.com/publication/2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive…
Tech report: http://bit.ly/Nemotron-Labs-Diffusion-Report…
Nemotron-Labs-Diffusion - an NVIDIA Collection
Source: https://huggingface.co/collections/nvidia/nemotron-labs-diffusion (updated about 8 hours ago)
A set of internal diffusion models.
#### nvidia/Nemotron-Labs-Diffusion-8B • Text Generation • 8B • Updated about 8 hours ago • 12.5k • 5
#### nvidia/Nemotron-Labs-Diffusion-VLM-8B • Image-Text-to-Text • 9B • Updated about 8 hours ago • 1 • 4
#### nvidia/Nemotron-Labs-Diffusion-14B • Text Generation • 14B • Updated about 8 hours ago • 11 • 14
#### nvidia/Nemotron-Labs-Diffusion-3B • Text Generation • 4B • Updated about 8 hours ago • 10.6k • 6
#### nvidia/Nemotron-Labs-Diffusion-14B-Base • Text Generation • 14B • Updated about 8 hours ago • 160 • 1
#### nvidia/Nemotron-Labs-Diffusion-8B-Base • Text Generation • 8B • Updated about 8 hours ago • 170k
#### nvidia/Nemotron-Labs-Diffusion-3B-Base • Text Generation • 4B • Updated about 8 hours ago • 12.5k • 2
Similar Articles
nvidia/Nemotron-Labs-Diffusion-14B
NVIDIA releases Nemotron-Labs-Diffusion, a family of tri-mode language models (3B, 8B, 14B) supporting AR, diffusion, and self-speculation decoding, achieving 2.7x-4x speed-ups over standard AR decoding.
Nemotron-Labs-Diffusion from NVIDIA
NVIDIA released the Nemotron-Labs-Diffusion model family (3B to 14B) that supports both AR and diffusion decoding with novel self-speculation, achieving significant speedups (up to 4x) over standard AR and Eagle3 methods across hardware platforms.
@NVIDIAAI: Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion…
NVIDIA released Nemotron-Labs-Diffusion, a family of diffusion language models that generate multiple tokens in parallel, enabling faster inference and better GPU utilization, with sizes from 3B to 14B including vision-language variants.
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
NVIDIA introduces Nemotron-Labs Diffusion, a family of diffusion language models that generate text in parallel and iteratively refine it, offering faster generation and the ability to revise previous tokens.
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents
NVIDIA announces Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio, and language processing to enable faster and more efficient AI agents, achieving up to 9x higher throughput compared to other open omni models.