@AdinaYakup: Paper:

X AI KOLs Following 05/28/26, 02:39 PM Papers

text-to-image benchmark evaluation qwen generative-ai hierarchical-taxonomy

Summary

A new creator-centric benchmark for text-to-image generation, Qwen-Image-Bench, evaluates models on real-world fidelity and creative generation using a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model.

@Alibaba_Qwen Paper: https://t.co/CvVB247nCy

Original Article

Cached at: 05/29/26, 01:46 PM

Paper page - Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

Source: https://huggingface.co/papers/2605.28091 Authors:

Abstract

A new creator-centric benchmark for text-to-image generation evaluates models based on real-world fidelity and creative generation through a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model trained on professional annotations.

Text-to-Image generation has evolved from basic image synthesis into a frequently used core capability in professional creative workflows, where simple text-image alignment can no longer satisfy users’ pressing demands for faithful real-world reconstruction and genuine creative expression. Existing benchmarks, however, remain anchored in these foundational criteria and do not yet capture the nuanced capabilities that matter in authentic artistic practice, making it difficult to reliably distinguish state-of-the-art T2I models. To address the gap, we introduce Qwen-Image-Bench, a creator-centric benchmark co-designed with professional artists and grounded in real-world creation scenarios. Qwen-Image-Bench enriches conventional evaluation with two application-driven dimensions: Real-world Fidelity and Creative Generation. Drawing on the staged reasoning inherent in professional artistic workflows, we organize these five pillars into a top-down hierarchical taxonomy that further decomposes into 23 second-level sub-capabilities and 56 third-level verifiable rubrics. To ensure broad coverage, we curate 1000 stratified prompts with each prompt jointly exercising more than four fine-grained facets across multiple pillars. We train a unified judge model, Q-Judger, based on Qwen3.6-27B, supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols, that scores every image across all 56 verifiable facets, producing fine-grained, rubric-grounded, and fully attributable diagnostics rather than a single opaque score. Empirically, Qwen-Image-Bench reliably distinguishes leading T2I models, achieving the greatest separation on the two application-driven dimensions of Real-world Fidelity and Creative Generation where existing benchmarks provide little insight, while also providing a trustworthy optimization signal for production-level T2I development.
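To make the pillar / sub-capability / facet hierarchy concrete, here is a minimal Python sketch of how per-facet judge scores might be rolled up into attributable per-pillar diagnostics, and how a prompt's facet coverage could be checked. It is an illustration only: the sub-capability and facet names are placeholders, the toy taxonomy is far smaller than the real one (5 pillars, 23 sub-capabilities, 56 facets), and the unweighted averaging, the [0, 1] score range, and the function names `roll_up` and `covers_enough` are all assumptions rather than anything stated in the abstract.

```python
from statistics import mean

# Hypothetical miniature taxonomy: pillar -> sub-capability -> facets.
# Only the two pillar names come from the abstract; every sub-capability
# and facet name is a placeholder invented for this sketch.
TAXONOMY = {
    "Real-world Fidelity": {
        "sub_a": ["facet_1", "facet_2"],
        "sub_b": ["facet_3"],
    },
    "Creative Generation": {
        "sub_c": ["facet_4", "facet_5"],
    },
}


def facet_index(taxonomy):
    """Map each facet name to its (pillar, sub_capability) path."""
    index = {}
    for pillar, subs in taxonomy.items():
        for sub, facets in subs.items():
            for facet in facets:
                index[facet] = (pillar, sub)
    return index


def roll_up(facet_scores, taxonomy):
    """Average facet scores per sub-capability, then per pillar.

    Unweighted means are an assumption of this sketch, not the paper's
    aggregation rule. Facets missing from facet_scores are skipped.
    """
    report = {}
    for pillar, subs in taxonomy.items():
        sub_scores = {}
        for sub, facets in subs.items():
            present = [facet_scores[f] for f in facets if f in facet_scores]
            if present:
                sub_scores[sub] = mean(present)
        if sub_scores:
            report[pillar] = {
                "score": mean(list(sub_scores.values())),
                "subs": sub_scores,
            }
    return report


def covers_enough(prompt_facets, taxonomy, min_facets=5, min_pillars=2):
    """Check that a prompt exercises more than four distinct facets
    spread over at least two pillars (thresholds are sketch defaults)."""
    index = facet_index(taxonomy)
    distinct = set(prompt_facets)
    pillars = {index[f][0] for f in distinct}
    return len(distinct) >= min_facets and len(pillars) >= min_pillars


# Example: judge scores for one generated image (values are made up).
scores = {"facet_1": 1.0, "facet_2": 0.0, "facet_3": 1.0,
          "facet_4": 0.5, "facet_5": 0.5}
report = roll_up(scores, TAXONOMY)
for pillar, entry in report.items():
    print(pillar, entry["score"], entry["subs"])
```

Keeping the per-sub-capability scores alongside each pillar score preserves the attribution the abstract emphasizes: a low pillar score can be traced back to the specific sub-capabilities and facets that caused it.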

Get this paper in your agent:

hf papers read 2605.28091

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### Qwen/Qwen-Image-Bench Image-Text-to-Text • 27B • Updated 1 day ago • 202 • 27

Datasets citing this paper: 1

#### Qwen/Qwen-Image-Bench Viewer • Updated 1 day ago • 1k • 5.78k • 7

Spaces citing this paper: 0

Collections including this paper: 0

Similar Articles

@AdinaYakup: Qwen @Alibaba_Qwen just dropped a new Text to Image benchmark + a judge model https://huggingface.co/collections/Qwen/q…

@HuggingPapers: Alibaba released Qwen-Image-Flash Few-step distillation goes beyond objectives. Data composition, teacher guidance, and…

Qwen-Image-Flash (26 minute read)

Qwen-Image-2.0 Technical Report

Qwen-Image-2.0 Technical Report (57 minute read)
