Tag
Introduces TUBE, a variational upper bound on the log-likelihood of discrete diffusion language models. The bound enables better evaluation and reveals that masked diffusion models still underperform autoregressive models.