Tag
WaveFilter proposes a training-free, wavelet-guided KV cache filtering framework for diffusion large language models that precisely identifies key tokens and constructs sparse caches, improving performance on complex long-context tasks.
This paper introduces Temporal-Spatial Parallel Decoding (TSPD) and Confidence Extrapolation (CE) to accelerate inference in diffusion-based large language models by dynamically deciding when tokens have converged and forecasting logit trends, reducing unnecessary denoising steps while preserving output quality.
This paper introduces WINO and WINO+, methods that enable revokable parallel decoding in diffusion LLMs and distill efficient denoising trajectories, significantly improving the quality-speed trade-off.
This paper introduces DARE, a method that improves the inference efficiency of diffusion large language models by reusing cached key-value and output activations, reducing redundant computation with negligible quality loss.