Tag
NVIDIA releases Nemotron-Labs-Diffusion, the first tri-mode language model family (3B/8B/14B) that switches between autoregressive, diffusion, and self-speculation decoding by changing the attention pattern, achieving up to 4× higher real throughput.