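Qwen-Image-2.0 Technical Report (about 57 min read)
Summary
This technical report introduces Qwen-Image-2.0, a new image generation model from Alibaba's Qwen team, and describes its architecture and capabilities.
The Qwen team has released Qwen-Image-2.0, its latest multimodal image generation model, which shows improved typography, instruction following, photorealism, and long-text rendering in both generation and editing tasks.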
# Qwen-Image-2.0 Technical Report

Source: https://arxiv.org/abs/2605.10730

Authors: Bing Zhao, Chenfei Wu, Deqing Li, Hao Meng, Jiahao Li, Jie Zhang, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kuan Cao, Kun Yan, Liang Peng, Lihan Jiang, Niantong Li, Ningyuan Tang, Shengming Yin, Tianhe Wu, Xiao Xu, Xiaoyue Chen, Xihua Wang, Yan Shu, Yanran Zhang, Yi Wang, Yilei Chen, Ying Ba, Yixian Xu, Yujia Wu, Yuxiang Chen, Zecheng Tang, Zekai Zhang, Zhendong Wang, Zihao Liu, Zikai Zhou, An Yang, Chen Cheng, Chenxu Lv, Dayiheng Liu, Fan Zhou, Hantian Xiong, Hongzhu Shi, Hu Wei, Huihong Zhao, Ivy Liu, Jianwei Zhang, Jiawei Zhang, Kai Chen, Kang He, Levon Xue, Lin Qu, Linhan Tang, Luwen Feng, Minggang Wu, Minmin Sun, Na Ni, Rui Men, Shuai Bai, Sishou Zheng, Tao Lan, Tianqi Zhang, Tingkun Wen, Wei Wang, Weixu Qiao, Weiyi Lu, Wenmeng Zhou, Xiaodong Deng, Xiaoxiao Xu, Xinlei Fang, Xionghui Chen, Yanan Wang, Yang Fan, Yichang Zhang, Yixuan Xu, Yu Wu, Zhiyuan Ma, Zhizhi Cai

PDF: https://arxiv.org/pdf/2605.10730

> **Abstract:** We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenes. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a multimodal diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a tailored multi-stage training pipeline. This gives the model strong multimodal understanding while preserving flexible generation and editing. The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typographic quality. It also enhances photorealistic generation with richer detail, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms the previous Qwen-Image model in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models.

## Submission history

From: Shengming Yin
**\[v1\]** Mon, 11 May 2026 15:34:56 UTC (45,347 KB)
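Similar articles
Qwen-Image-2.0 Technical Report
Qwen-Image-2.0 is a new image generation foundation model built on Qwen3-VL and a multimodal diffusion Transformer that unifies high-fidelity synthesis with precise editing. It performs strongly on text-rich content, multilingual typography, and photorealistic generation.
Qwen-Image-Flash (26 min read)
This paper from Alibaba revisits few-step distillation for visual generation models, focusing on training-recipe factors such as data composition, teacher guidance, and task mixture. Using Qwen-Image-2.0 as a case study, it develops Qwen-Image-Flash.
Qwen3.7 Preview Lands on Arena (1 min read)
Alibaba's Qwen announces two major model releases: Qwen3-Omni, the first natively end-to-end omni-modal AI, which handles text, images, audio, and video in a unified way; and Qwen3-Next-80B-A3B, an ultra-efficient MoE model that activates 3 billion parameters per token, achieves SOTA performance, and runs inference 10x faster than Qwen3-32B.
@AdinaYakup: Qwen @Alibaba_Qwen just released a new text-to-image benchmark and a judge model https://huggingface.co/collections/Qwen/q…
Qwen has released a new text-to-image benchmark with 56 fine-grained evaluation dimensions that measures creativity beyond prompt alignment, along with a human-aligned judge model.
Qwen-Image-VAE-2.0 Technical Report
Qwen-Image-VAE-2.0 is a suite of high-compression variational autoencoders that improves reconstruction fidelity and diffusability through an enhanced architecture, large-scale training, and semantic alignment strategies.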