Tag
Introduces CLVR (Closed-Loop Visual Reasoning), a framework that recasts text-to-image generation as a closed-loop, multi-step visual reasoning process rather than a single-step one, using a VLM controller together with diffusion models, and reports improved performance on compositional prompts.