Tag
GEAR jointly trains a vector-quantized tokenizer and an autoregressive generator end-to-end via representation alignment, converging up to 10x faster on ImageNet gFID than strong baselines.
This paper proposes Nemotron-Labs-Diffusion-Image, a masked discrete diffusion model for high-resolution text-to-image synthesis, introducing a token-editing mechanism and a grouped cross-entropy objective to improve token refinement and training efficiency.
Introduces Colored Noise Sampling (CNS), a training-free stochastic solver for diffusion models that dynamically allocates energy based on frequency-dependent schedules, significantly improving image-quality metrics such as FID on ImageNet-256.
This paper proposes Sphere Latent Encoder, an efficient few-step image generation framework that performs denoising entirely in a spherical latent space, producing high-quality 256×256 images at significantly reduced computational cost and with improved FID scores on ImageNet-1K.