decoding-efficiency

#decoding-efficiency

Nemotron-Labs-Diffusion from NVIDIA

Reddit r/LocalLLaMA · 2026-05-19

NVIDIA released the Nemotron-Labs-Diffusion model family (3B to 14B parameters), which supports both autoregressive (AR) and diffusion decoding with a novel self-speculation scheme. The release reports speedups of up to 4x over standard AR decoding and over Eagle3 across hardware platforms.
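The summary does not explain how Nemotron-Labs-Diffusion's self-speculation works. As a general illustration only, here is a minimal sketch of the standard greedy draft-and-verify acceptance rule used in speculative decoding, the family of methods Eagle3 belongs to. In self-speculation, one model acts as both drafter and verifier. This is not NVIDIA's implementation. The function name `greedy_verify` and the `target_next` callback are hypothetical. For readability, the sketch calls the verifier once per token, whereas real systems score all drafted tokens in a single batched forward pass.

```python
from typing import Callable, List


def greedy_verify(
    draft: List[int],
    target_next: Callable[[List[int]], int],
    prefix: List[int],
) -> List[int]:
    """Hypothetical sketch of greedy speculative verification.

    Accepts the longest prefix of `draft` that matches the verifier's
    greedy next-token choices. At the first mismatch, the verifier's own
    token replaces the rejected draft token. If every drafted token is
    accepted, one extra "bonus" token from the verifier is appended.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)  # verifier's greedy choice at this position
        if expected != tok:
            accepted.append(expected)  # correction; the rest of the draft is discarded
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token after full acceptance
    return accepted
```

With a toy verifier that always predicts "previous token + 1", the call `greedy_verify([1, 2, 5], lambda ctx: ctx[-1] + 1, [0])` accepts 1 and 2, rejects 5, and returns `[1, 2, 3]`. The speedup in such schemes comes from committing several tokens for each verification step whenever the drafts are mostly correct.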

#decoding-efficiency

R²-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

arXiv cs.CL · 2026-04-22

R²-dLLM introduces spatio-temporal redundancy reduction techniques that cut diffusion LLM decoding steps by up to 75% while preserving generation quality, addressing a key deployment bottleneck.
