Tag
Qwen-Image-Agent proposes a unified agentic framework that addresses the context gap in text-to-image generation by integrating planning, reasoning, searching, and memory mechanisms. It introduces IA-Bench for evaluation and achieves state-of-the-art performance.
Channel-wise Vector Quantization (CVQ) tokenizes images with channel-wise tokens instead of patch-wise tokens. This enables a next-channel prediction framework (CAR) that generates images by progressively refining visual details, and it achieves strong reconstruction and text-to-image generation performance.
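
To make the patch-wise versus channel-wise distinction concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the CVQ authors' implementation: the feature map layout (`feat[c][p]`, with C channels and P flattened spatial positions), the nearest-neighbour codebook lookup, and the names `nearest_code`, `patch_wise_tokens`, and `channel_wise_tokens` are all hypothetical. The summary does not specify how the codebooks are learned or how CAR orders the channels.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def patch_wise_tokens(feat: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """One token per spatial position: quantize the C-dim vector at each position."""
    num_channels, num_positions = len(feat), len(feat[0])
    return [
        nearest_code([feat[c][p] for c in range(num_channels)], codebook)
        for p in range(num_positions)
    ]


def channel_wise_tokens(feat: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """One token per channel: quantize the whole P-dim spatial map of each channel."""
    return [nearest_code(channel, codebook) for channel in feat]


# Toy feature map (assumed values): C = 2 channels, P = 3 spatial positions.
feat = [
    [0.0, 0.1, 0.9],  # channel 0
    [1.0, 0.9, 0.1],  # channel 1
]
patch_codebook = [[0.0, 1.0], [1.0, 0.0]]              # entries have dimension C
channel_codebook = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # entries have dimension P

patch_tokens = patch_wise_tokens(feat, patch_codebook)        # length P
channel_tokens = channel_wise_tokens(feat, channel_codebook)  # length C
print(patch_tokens, channel_tokens)
```

In this sketch the patch-wise sequence has one token per spatial position, while the channel-wise sequence has one token per channel, each covering the full spatial extent. An autoregressive model over the channel-wise sequence would then predict one whole-image channel at a time, which is one way to read the "progressively refining" description above.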