I built my 'first' flow matching image generator, here's what I learned [P]
Summary
The author describes building a small flow matching image generation model trained on Apple emoji images, covering an initial approach that failed and the pivot that worked: using RGB channels, residual blocks, and attention.
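For readers new to the term, flow matching can be illustrated with a minimal sketch. This is not the author's code: it assumes the common straight-line path between a noise sample and a data sample, and it replaces the neural network (the residual blocks and attention mentioned in the summary) with a hypothetical closed-form `oracle_velocity` that is only valid when the dataset is a single point. The names `DATA`, `interpolate`, `target_velocity`, `flow_matching_loss`, and `euler_sample` are illustrative.

```python
import random

# Hypothetical "dataset": one RGB pixel with values in [0, 1].
DATA = [0.2, 0.5, 0.9]

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the straight-line path: d/dt x_t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between the predicted and the target velocity."""
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)

def oracle_velocity(x, t):
    """Stand-in for a trained network (assumption): the exact velocity
    field when every training example equals DATA. Requires t < 1."""
    return [(d - xi) / (1.0 - t) for d, xi in zip(DATA, x)]

def euler_sample(model, x0, steps=50):
    """Integrate dx/dt = model(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # the last step uses t = 1 - dt, so t never reaches 1
        v = model(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in DATA]
loss = flow_matching_loss(oracle_velocity, noise, DATA, t=0.3)
sample = euler_sample(oracle_velocity, noise)
```

In a real model, `oracle_velocity` would be a network trained by minimizing `flow_matching_loss` over random noise samples, data samples, and times `t`; sampling then starts from fresh noise and integrates the learned velocity field in the same way.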
Similar Articles
@jiqizhixin: What if you could generate high-quality images in one step instead of hundreds? Stanford and ByteDance introduce W-Flow…
Stanford and ByteDance introduce W-Flow, a single-step generative model that uses Wasserstein gradient flows to achieve state-of-the-art one-step ImageNet 256x256 generation (1.29 FID) with 100x faster sampling than multi-step diffusion models.
Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching
Bootstrap Your Generator (ByG) is a framework for unpaired training of flow matching editing models, leveraging base model knowledge and gradient routing to achieve state-of-the-art results in data-scarce image and video editing tasks.
MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation
MIMFlow integrates Masked Image Modeling with Normalizing Flows for end-to-end image generation, achieving an FID of 2.50 on ImageNet 256x256 with 50% fewer tokens than standard models.
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
This paper investigates how semantic information is distributed across textual tokens in text-to-image models, finding that information concentration and cross-item interactions significantly affect image generation alignment. The authors use patching techniques to demonstrate that simple encoding-stage interventions can improve alignment quality.
Qwen-Image-Flash (26 minute read)
This paper from Alibaba revisits few-step distillation for visual generative models, focusing on training recipe factors such as data composition, teacher guidance, and task mixture, using Qwen-Image-2.0 as a case study to develop Qwen-Image-Flash.