TIDE is a lossless inference system for diffusion large language models that exploits the temporal stability of expert activations to reduce I/O overhead and computation, achieving throughput improvements of up to 1.4-1.5× on single GPU-CPU systems.
This paper introduces Parallel Speculative Decoding (PSD), a training-free framework that accelerates diffusion LLM inference by jointly improving spatial and temporal efficiency, achieving up to 5.5× tokens per forward pass with comparable quality to greedy decoding.
dLLM is an open-source library that converts any autoregressive LLM into a diffusion LLM, enabling parallel decoding and faster text generation.
This article asks why major LLM providers are not investing in diffusion LLMs despite recent advances such as Mercury 2. It explores fundamental issues and hardware bottlenecks that may be hindering broader adoption.
R²-dLLM introduces spatio-temporal redundancy reduction techniques that cut diffusion LLM decoding steps by up to 75% while preserving generation quality, addressing a key deployment bottleneck.