@PavloMolchanov: We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive…

X AI KOLs Following 05/19/26, 06:10 PM Models

tri-mode diffusion language-model nvidia open-source autoregressive efficiency

Summary

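NVIDIA has released Nemotron-Labs-Diffusion, which it describes as the first tri-mode language model family (3B/8B/14B). One model switches between autoregressive, diffusion, and self-speculation decoding by changing only the attention pattern (mask), with no extra draft model and no architecture changes. The announcement claims up to 4× higher real throughput for a single user.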
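To make "switching modes by changing the attention mask" concrete, here is a minimal NumPy sketch. It is an illustration only, not NVIDIA's implementation: the function names, the block-wise layout of the diffusion mask, and the prefix-acceptance rule for self-speculation are all assumptions chosen for clarity, and the tech report should be consulted for the actual design.

```python
import numpy as np

def attention_mask(mode: str, seq_len: int, block_size: int = 4) -> np.ndarray:
    """Return a boolean (seq_len, seq_len) matrix.

    Entry [i, j] is True when query position i may attend to key position j.
    Hypothetical helper: the mode names and block layout are assumptions.
    """
    idx = np.arange(seq_len)
    if mode == "autoregressive":
        # Standard causal mask: each token sees itself and earlier tokens.
        return idx[:, None] >= idx[None, :]
    if mode == "diffusion":
        # Assumed block-wise layout: tokens attend bidirectionally inside
        # their own block and causally to all earlier blocks.
        block = idx // block_size
        return block[:, None] >= block[None, :]
    raise ValueError(f"unknown mode: {mode}")

def accepted_prefix_len(draft: list[int], verified: list[int]) -> int:
    """Length of the longest common prefix of drafted and verified tokens.

    Assumed self-speculation rule: draft a block with the diffusion mask,
    re-score it with the causal mask, and keep tokens until the first mismatch.
    """
    n = 0
    for d, v in zip(draft, verified):
        if d != v:
            break
        n += 1
    return n
```

In this sketch the weights never change: a decoder would pass `attention_mask("autoregressive", n)` or `attention_mask("diffusion", n)` to the same network, and self-speculation would combine the two passes through `accepted_prefix_len`.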

We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive, Diffusion, and Self-Speculation decoding by simply changing the attention pattern/mask.

One model. Three decoding modes. No extra draft models. No architecture changes. Just significantly better efficiency across different concurrency levels.

Up to 4× higher real throughput for a single user.

HF Collection: https://huggingface.co/collections/nvidia/nemotron-labs-diffusion…, open license
Project page: https://research.nvidia.com/publication/2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive…
Tech report: http://bit.ly/Nemotron-Labs-Diffusion-Report…

Details below

Original Article


Cached at: 05/20/26, 02:25 AM


Nemotron-Labs-Diffusion - a nvidia Collection

Source: https://huggingface.co/collections/nvidia/nemotron-labs-diffusion (updated about 8 hours ago)

Set of models of internal diffusion models

Similar Articles

nvidia/Nemotron-Labs-Diffusion-14B

Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding

Nemotron-Labs-Diffusion from NVIDIA

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone.

@NVIDIAAI: Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion…
