@AdinaYakup: Paper:
Summary
A new creator-centric benchmark for text-to-image generation, Qwen-Image-Bench, evaluates models on real-world fidelity and creative generation using a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model.
Cached at: 05/29/26, 01:46 PM
@Alibaba_Qwen Paper: https://t.co/CvVB247nCy
Paper page - Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation
Source: https://huggingface.co/papers/2605.28091
Abstract
A new creator-centric benchmark for text-to-image generation evaluates models based on real-world fidelity and creative generation through a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model trained on professional annotations.
Text-to-Image generation has evolved from basic image synthesis into a frequently used core capability in professional creative workflows, where simple text-image alignment can no longer satisfy users’ pressing demands for faithful real-world reconstruction and genuine creative expression. Existing benchmarks, however, remain anchored in these foundational criteria and do not yet capture the nuanced capabilities that matter in authentic artistic practice, making it difficult to reliably distinguish state-of-the-art T2I models. To address the gap, we introduce Qwen-Image-Bench, a creator-centric benchmark co-designed with professional artists and grounded in real-world creation scenarios. Qwen-Image-Bench enriches conventional evaluation with two application-driven dimensions: Real-world Fidelity and Creative Generation. Drawing on the staged reasoning inherent in professional artistic workflows, we organize these five pillars into a top-down hierarchical taxonomy that further decomposes into 23 second-level sub-capabilities and 56 third-level verifiable rubrics. To ensure broad coverage, we curate 1000 stratified prompts with each prompt jointly exercising more than four fine-grained facets across multiple pillars. We train a unified judge model Q-Judger based on Qwen3.6-27B, supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols, that scores every image across all 56 verifiable facets, producing fine-grained, rubric-grounded, and fully attributable diagnostics rather than a single opaque score. Empirically, Qwen-Image-Bench reliably distinguishes leading T2I models, achieving the greatest separation on the two application-driven dimensions of Real-world Fidelity and Creative Generation where existing benchmarks provide little insight, while also providing a trustworthy optimization signal for production-level T2I development.
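To make the idea of rubric-grounded, attributable scores concrete, here is a minimal Python sketch of how per-facet judge scores could be rolled up a three-level taxonomy (facet, then sub-capability, then pillar). It is an illustration only: the abstract does not say how scores are aggregated, so the unweighted mean is an assumption, and the sub-capability names, facet names, and the `roll_up` helper are hypothetical. Only the two pillar names come from the abstract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical slice of a three-level taxonomy: (pillar, sub-capability, facet).
# Only the two pillar names appear in the abstract; the rest is invented.
TAXONOMY = [
    ("Real-world Fidelity", "example_subcap_a", "facet_1"),
    ("Real-world Fidelity", "example_subcap_a", "facet_2"),
    ("Real-world Fidelity", "example_subcap_b", "facet_3"),
    ("Creative Generation", "example_subcap_c", "facet_4"),
]


def roll_up(facet_scores):
    """Average facet scores into sub-capability scores, then into pillar scores.

    Facets absent from facet_scores are skipped, because a single prompt
    exercises only some of the facets. Unweighted means are an assumption.
    """
    by_subcap = defaultdict(list)
    for pillar, subcap, facet in TAXONOMY:
        if facet in facet_scores:
            by_subcap[(pillar, subcap)].append(facet_scores[facet])
    subcap_scores = {key: mean(vals) for key, vals in by_subcap.items()}

    by_pillar = defaultdict(list)
    for (pillar, _subcap), score in subcap_scores.items():
        by_pillar[pillar].append(score)
    pillar_scores = {pillar: mean(vals) for pillar, vals in by_pillar.items()}
    return subcap_scores, pillar_scores


# Made-up facet scores for one generated image.
scores = {"facet_1": 1.0, "facet_2": 0.5, "facet_3": 0.25, "facet_4": 1.0}
subcaps, pillars = roll_up(scores)
print(subcaps)
print(pillars)
```

Because every pillar score is computed from named facets, a low score can be traced back to the specific rubrics that caused it, which is the property the abstract contrasts with a single opaque score.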
Get this paper in your agent:
hf papers read 2605.28091
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
#### Qwen/Qwen-Image-Bench Image-Text-to-Text • 27B • Updated 1 day ago • 202 • 27
Datasets citing this paper: 1
#### Qwen/Qwen-Image-Bench Viewer • Updated 1 day ago • 1k • 5.78k • 7
Similar Articles
@AdinaYakup: Qwen @Alibaba_Qwen just dropped a new Text to Image benchmark + a judge model https://huggingface.co/collections/Qwen/q…
Qwen released a new Text-to-Image benchmark with 56 fine-grained evaluation facets, measuring creativity beyond prompt alignment, and includes a human-aligned judge model.
@HuggingPapers: Alibaba released Qwen-Image-Flash Few-step distillation goes beyond objectives. Data composition, teacher guidance, and…
Alibaba released Qwen-Image-Flash, a few-step distilled model for fast, high-quality text-to-image generation and instruction-guided editing, leveraging data composition, teacher guidance, and task mixture.
Qwen-Image-Flash (26 minute read)
This paper from Alibaba revisits few-step distillation for visual generative models, focusing on training recipe factors such as data composition, teacher guidance, and task mixture, using Qwen-Image-2.0 as a case study to develop Qwen-Image-Flash.
Qwen-Image-2.0 Technical Report
Qwen-Image-2.0 is a new image generation foundation model that unifies high-fidelity synthesis and precise editing using Qwen3-VL and a Multimodal Diffusion Transformer. It excels in text-rich content, multilingual typography, and photorealistic generation.
Qwen-Image-2.0 Technical Report (57 minute read)
This technical report presents Qwen-Image-2.0, a new image generation model from Alibaba's Qwen team, detailing its architecture and capabilities.