This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.
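The summary does not say how BitLM turns discrete tokens into something a continuous diffusion process can act on. The sketch below assumes one common approach, in the style of "analog bits": each token id is written in binary, and each bit becomes a real value in {-1, +1}. A whole block of positions is then noised and decoded together, which is where the parallelism would come from. All function names are hypothetical. The learned denoiser and any cross-block causal conditioning are deliberately omitted. This is an illustration under stated assumptions, not the paper's implementation.

```python
import math
import random

def num_bits_for_vocab(vocab_size: int) -> int:
    """Bits needed to write every id in [0, vocab_size) in binary."""
    return max(1, (vocab_size - 1).bit_length())

def token_to_bits(token_id: int, num_bits: int) -> list[float]:
    """Map a token id to num_bits analog bits in {-1.0, +1.0}, most significant bit first."""
    if not 0 <= token_id < 2 ** num_bits:
        raise ValueError("token_id does not fit in num_bits")
    return [1.0 if (token_id >> (num_bits - 1 - i)) & 1 else -1.0 for i in range(num_bits)]

def bits_to_token(analog_bits: list[float]) -> int:
    """Threshold each continuous value at zero and reassemble the integer id."""
    token_id = 0
    for x in analog_bits:
        token_id = (token_id << 1) | (1 if x > 0 else 0)
    return token_id

def add_noise(block: list[list[float]], alpha_bar: float, rng: random.Random) -> list[list[float]]:
    """Variance-preserving forward step x_t = sqrt(a)*x_0 + sqrt(1-a)*eps, applied to every position of a block."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[signal * x + noise * rng.gauss(0.0, 1.0) for x in bits] for bits in block]

def decode_block(block: list[list[float]]) -> list[int]:
    """Decode all positions of the block at once (no left-to-right loop over positions)."""
    return [bits_to_token(bits) for bits in block]

if __name__ == "__main__":
    vocab_bits = num_bits_for_vocab(50257)  # 16 bits cover a 50,257-token vocabulary
    block = [17, 42, 50256]
    x0 = [token_to_bits(t, vocab_bits) for t in block]
    # A trained denoiser (hypothetical, not shown) would map a noisy x_t back toward x0.
    # With alpha_bar = 1.0 there is no noise, so this only checks the encode/decode round trip.
    print(decode_block(add_noise(x0, 1.0, random.Random(0))))
```

Under this assumed encoding, each position carries only `ceil(log2(V))` continuous values (16 for a 50,257-token vocabulary) instead of a V-way distribution. That is one reason a bitwise representation can make joint, multi-token continuous denoising cheap per position.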