@AdinaYakup: Paper:


Summary

A new creator-centric benchmark for text-to-image generation, Qwen-Image-Bench, evaluates models on real-world fidelity and creative generation using a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model.

@Alibaba_Qwen Paper: https://t.co/CvVB247nCy

Cached at: 05/29/26, 01:46 PM


Paper page - Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

Source: https://huggingface.co/papers/2605.28091

Abstract

A new creator-centric benchmark for text-to-image generation evaluates models based on real-world fidelity and creative generation through a hierarchical taxonomy of 56 verifiable facets scored by a unified judge model trained on professional annotations.

Text-to-Image generation has evolved from basic image synthesis into a frequently used core capability in professional creative workflows, where simple text-image alignment can no longer satisfy users’ pressing demands for faithful real-world reconstruction and genuine creative expression. Existing benchmarks, however, remain anchored in these foundational criteria and do not yet capture the nuanced capabilities that matter in authentic artistic practice, making it difficult to reliably distinguish state-of-the-art T2I models. To address the gap, we introduce Qwen-Image-Bench, a creator-centric benchmark co-designed with professional artists and grounded in real-world creation scenarios. Qwen-Image-Bench enriches conventional evaluation with two application-driven dimensions: Real-world Fidelity and Creative Generation. Drawing on the staged reasoning inherent in professional artistic workflows, we organize these five pillars into a top-down hierarchical taxonomy that further decomposes into 23 second-level sub-capabilities and 56 third-level verifiable rubrics. To ensure broad coverage, we curate 1000 stratified prompts with each prompt jointly exercising more than four fine-grained facets across multiple pillars. We train a unified judge model, Q-Judger, based on Qwen3.6-27B, supervised by 80 professional annotators from global art academies under blind labeling and triple-review protocols, that scores every image across all 56 verifiable facets, producing fine-grained, rubric-grounded, and fully attributable diagnostics rather than a single opaque score. Empirically, Qwen-Image-Bench reliably distinguishes leading T2I models, achieving the greatest separation on the two application-driven dimensions of Real-world Fidelity and Creative Generation, where existing benchmarks provide little insight, while also providing a trustworthy optimization signal for production-level T2I development.
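To make the idea of attributable, facet-level diagnostics concrete, the sketch below rolls per-facet scores up through a three-level hierarchy (pillar, sub-capability, facet). It is only an illustration: the abstract does not say how Q-Judger's facet scores are aggregated, so the unweighted averaging, the 0 to 1 score range, the `roll_up` function, and every sub-capability and facet name here are assumptions. Only the two pillar names come from the abstract.

```python
from collections import defaultdict
from statistics import mean

# A facet is addressed by (pillar, sub-capability, facet).
# All sub-capability and facet names used below are hypothetical.
FacetPath = tuple[str, str, str]


def roll_up(facet_scores: dict[FacetPath, float]) -> dict[str, dict]:
    """Average facet scores up to sub-capability level, then to pillar level.

    Unweighted averaging is an assumption made for illustration; it is not
    taken from the paper.
    """
    # Group facet scores under their (pillar, sub-capability) parent.
    by_sub: dict[tuple[str, str], list[float]] = defaultdict(list)
    for (pillar, sub, _facet), score in facet_scores.items():
        by_sub[(pillar, sub)].append(score)
    sub_scores = {key: mean(vals) for key, vals in by_sub.items()}

    # Group sub-capability scores under their pillar.
    by_pillar: dict[str, list[float]] = defaultdict(list)
    for (pillar, _sub), score in sub_scores.items():
        by_pillar[pillar].append(score)

    return {
        "sub_capability": sub_scores,
        "pillar": {p: mean(v) for p, v in by_pillar.items()},
    }


# Hypothetical judge output for one image, with scores in [0, 1].
example = {
    ("Real-world Fidelity", "anatomy", "hand_structure"): 0.5,
    ("Real-world Fidelity", "anatomy", "facial_proportion"): 1.0,
    ("Real-world Fidelity", "materials", "surface_texture"): 0.25,
    ("Creative Generation", "composition", "visual_balance"): 1.0,
}
report = roll_up(example)
print(report["pillar"])
```

Because the facet-level scores are kept alongside the aggregates, a low pillar score can be traced back to the specific facets that caused it, which is the contrast with a single opaque score that the abstract draws.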


Get this paper in your agent:

hf papers read 2605.28091

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### Qwen/Qwen-Image-Bench Image-Text-to-Text • 27B • Updated 1 day ago • 202 • 27

Datasets citing this paper: 1

#### Qwen/Qwen-Image-Bench Viewer • Updated 1 day ago • 1k • 5.78k • 7

