Tag
This paper proposes FAIR-Calib, a two-stage post-training quantization framework for diffusion large language models that addresses the instability of token commitments during iterative refinement. It achieves state-of-the-art results on LLaDA and Dream models under low-bit quantization.
This paper proposes dMoE, a block-level mixture-of-experts framework for diffusion large language models that aggregates token-level expert distributions into block-level routing, reducing the number of activated experts and memory usage while maintaining performance.
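To make the idea of aggregating token-level expert distributions into a block-level routing decision concrete, here is a minimal sketch. It is not the paper's implementation: the function name `block_level_routing`, the use of a plain mean as the aggregation rule, and the top-k selection are all assumptions chosen for illustration, since the summary does not specify them.

```python
from typing import List


def block_level_routing(
    token_probs: List[List[float]], block_size: int, top_k: int
) -> List[List[int]]:
    """Return, for each block of tokens, one shared set of top_k expert indices.

    token_probs[t][e] is the router probability of expert e for token t.
    Hypothetical helper: mean aggregation and top-k selection are assumptions.
    """
    routes = []
    for start in range(0, len(token_probs), block_size):
        block = token_probs[start:start + block_size]
        num_experts = len(block[0])
        # Aggregate: average each expert's probability over the tokens in the block.
        mean_probs = [
            sum(tok[e] for tok in block) / len(block) for e in range(num_experts)
        ]
        # Rank experts by aggregated probability; ties go to the lower index.
        ranked = sorted(range(num_experts), key=lambda e: (-mean_probs[e], e))
        routes.append(sorted(ranked[:top_k]))
    return routes


# Four tokens, four experts, grouped into blocks of two tokens.
token_probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.1, 0.1, 0.3, 0.5],
]
routes = block_level_routing(token_probs, block_size=2, top_k=2)
print(routes)
```

In this sketch every token in a block shares the same expert set, so only `top_k` experts per block would need to be active at once. Whether dMoE uses this particular aggregation or selection rule is not stated in the summary.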