This paper introduces Repr-Align, a method that adapts autoregressive language models into diffusion language models via representation alignment, achieving up to a 4x training speedup without retraining representations from scratch.
This paper proposes AnisoAlign, a framework that addresses the modality gap in multimodal models by applying anisotropic geometric correction to enable effective unpaired modality alignment.
This paper introduces TextLDM, a method that adapts visual latent diffusion transformers for language modeling by mapping discrete tokens to continuous latents. It demonstrates that this approach, enhanced by representation alignment, matches GPT-2 performance and unifies visual and text generation architectures.
This paper introduces UniSD, a unified self-distillation framework for adapting large language models that integrates mechanisms for supervision reliability, representation alignment, and training stability. Experimental results show that UniSD improves performance over base models and existing baselines across multiple benchmarks.
This paper introduces MMCORE, a unified framework for multimodal image generation and editing that aligns VLM semantic embeddings with diffusion conditioning, achieving state-of-the-art fidelity without costly fusion modules or from-scratch training.