This paper argues that the low generative perplexity (Gen-PPL) reported by continuous diffusion language models such as ELF is misleading, because Gen-PPL rewards repetition. The authors trace the repetition to a one-dimensional attractor in the self-conditioning loop and propose ACE, a simple fix that subtracts this direction to reduce repetition without sacrificing quality.
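One way to picture "subtracting a direction" is an orthogonal projection that removes a vector's component along a fixed axis. The sketch below is only an illustration under that assumption: the summary does not say how ACE estimates the attractor direction or where in the self-conditioning loop the subtraction is applied, and the function name `remove_direction` is hypothetical.

```python
import math


def remove_direction(vec, direction):
    """Return `vec` with its component along `direction` projected out.

    Assumption: the attractor is modeled as a single fixed direction, and
    "subtracting" it means removing the projection onto that direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of vec onto the unit direction.
    coeff = sum(v * u for v, u in zip(vec, unit))
    # Subtract the projected component, leaving the orthogonal remainder.
    return [v - coeff * u for v, u in zip(vec, unit)]


# Toy example: remove the x-axis component from a 3-d vector.
cleaned = remove_direction([3.0, 4.0, 0.0], [1.0, 0.0, 0.0])
print(cleaned)  # [0.0, 4.0, 0.0]
```

After the projection, the result has zero component along the removed direction, so repeated application changes nothing further.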
This paper studies the trade-off between scarce target data and abundant generic data in mixture pretraining, finding that repetition is a key driver of performance and that mixture training tolerates 15-20 repetitions of target data. It introduces a repetition-aware scaling law to optimize mixture configurations under data constraints.
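The repetition count in a mixture can be estimated with simple arithmetic: the number of target tokens consumed during training divided by the number of unique target tokens available. The sketch below shows only that bookkeeping. It does not reproduce the paper's scaling law, whose functional form is not given in this summary, and the function name, parameter names, and token counts are all hypothetical.

```python
def target_repetitions(total_tokens, target_fraction, unique_target_tokens):
    """Estimate how many times the target dataset is repeated.

    total_tokens: total training tokens across the whole mixture.
    target_fraction: share of the mixture drawn from the target data (0 to 1).
    unique_target_tokens: size of the target dataset in unique tokens.
    """
    if unique_target_tokens <= 0:
        raise ValueError("unique_target_tokens must be positive")
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return total_tokens * target_fraction / unique_target_tokens


# Hypothetical budget: 100B training tokens, 25% target data, 2B unique target tokens.
reps = target_repetitions(100_000_000_000, 0.25, 2_000_000_000)
print(reps)  # 12.5
```

In this made-up configuration the target data is seen 12.5 times, which sits below the 15 to 20 repetitions that the summary reports mixture training can tolerate.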