@PavloMolchanov: We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive…

Summary

NVIDIA releases Nemotron-Labs-Diffusion, the first tri-mode language model family (3B/8B/14B) that switches between autoregressive, diffusion, and self-speculation decoding by changing the attention pattern, achieving up to 4× higher real throughput.

Cached at: 05/20/26, 02:25 AM

We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive, Diffusion, and Self-Speculation decoding by simply changing the attention pattern/mask.
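
To make "changing the attention pattern/mask" concrete, here is a minimal sketch of how one set of weights could be run under different masks. The announcement does not spell out the exact mask layouts, so everything here is an assumption for illustration: the function name `build_mask`, the `block_size` parameter, and in particular the block-wise pattern used for the diffusion mode (full attention inside a block, causal attention across blocks) are hypothetical and may differ from what the tech report describes.

```python
def build_mask(mode, seq_len, block_size=4):
    """Return a seq_len x seq_len boolean matrix.

    mask[q][k] is True when query position q may attend to key position k.
    Hypothetical helper for illustration; not taken from the NVIDIA release.
    """
    if mode == "autoregressive":
        # Causal mask: each position sees itself and earlier positions only.
        return [[k <= q for k in range(seq_len)] for q in range(seq_len)]
    if mode == "diffusion":
        # ASSUMED pattern: positions attend bidirectionally within their own
        # block and causally to all earlier blocks.
        return [
            [k // block_size <= q // block_size for k in range(seq_len)]
            for q in range(seq_len)
        ]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("autoregressive", "diffusion"):
        print(mode)
        for row in build_mask(mode, seq_len=4, block_size=2):
            print("".join("x" if allowed else "." for allowed in row))
```

The point of the sketch is that the model parameters never appear in `build_mask`: only the mask passed to attention changes between modes, which is consistent with the "no architecture changes" claim.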

One model. Three decoding modes. No extra draft models. No architecture changes. Just significantly better efficiency across different concurrency levels.
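
The announcement does not describe how self-speculation accepts or rejects drafted tokens. As a rough intuition for why a draft-then-verify loop can raise throughput without a separate draft model, the sketch below shows the generic greedy acceptance rule from speculative decoding, under the assumption that the same model produces the draft in one mode and the verification in another. The function name `accept_draft` and the toy token IDs are hypothetical.

```python
def accept_draft(draft_tokens, verifier_tokens):
    """Generic greedy speculative-decoding acceptance (illustrative only).

    Keeps the longest prefix of the draft that matches the verifier's own
    choices, then appends the verifier's token at the first mismatch.
    ASSUMPTION: this is not confirmed to be the rule Nemotron-Labs-Diffusion
    uses; it is the standard greedy variant of the technique.
    """
    accepted = []
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            # First disagreement: take the verifier's token and stop.
            accepted.append(verified)
            break
        accepted.append(drafted)
    return accepted


if __name__ == "__main__":
    # Toy example: the draft agrees with the verifier on the first two tokens.
    print(accept_draft([5, 7, 9, 2], [5, 7, 3, 2]))
```

In this toy case one verification pass yields three tokens instead of one, which is where the speedup in draft-then-verify schemes comes from; the actual gain depends on how often the draft is accepted.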

Up to 4× higher real throughput for a single user.

HF Collection: https://huggingface.co/collections/nvidia/nemotron-labs-diffusion…, open license

Project page: https://research.nvidia.com/publication/2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive…

Tech report: http://bit.ly/Nemotron-Labs-Diffusion-Report…

Details below


Nemotron-Labs-Diffusion - an NVIDIA Collection

Source: https://huggingface.co/collections/nvidia/nemotron-labs-diffusion (updated about 8 hours ago)

A set of internal diffusion models.

Similar Articles

nvidia/Nemotron-Labs-Diffusion-14B

Hugging Face Models Trending

NVIDIA releases Nemotron-Labs-Diffusion, a family of tri-mode language models (3B, 8B, 14B) supporting AR, diffusion, and self-speculation decoding, achieving 2.7x-4x speed-ups over standard AR decoding.

Nemotron-Labs-Diffusion from NVIDIA

Reddit r/LocalLLaMA

NVIDIA released the Nemotron-Labs-Diffusion model family (3B to 14B) that supports both AR and diffusion decoding with novel self-speculation, achieving significant speedups (up to 4x) over standard AR and Eagle3 methods across hardware platforms.