HiDream-ai/HiDream-O1-Image

Hugging Face Models Trending 2026/05/08 13:06 模型

image-generation open-source text-to-image image-editing multimodal foundation-model high-resolution

摘要

HiDream-ai 已开源 HiDream-O1-Image（8B），这是一款基于像素级统一 Transformer（UiT）构建的统一图像生成基础模型，原生支持文本生成图像、图像编辑以及主体驱动的个性化生成，分辨率最高可达 2048×2048，无需外部 VAE 或独立文本编码器。该模型在 Artificial Analysis 文生图竞技场中首次亮相即位列第 8，是目前领先的开放权重文生图模型之一。

任务：图文生成图像标签：transformers、safetensors、qwen3_vl、图文生成文本、图文生成图像、许可证：mit、endpoints_compatible、region:us

查看原文

查看缓存全文

缓存时间: 2026/05/09 12:31

HiDream-ai/HiDream-O1-Image · Hugging Face

来源：https://huggingface.co/HiDream-ai/HiDream-O1-Image

HiDream-O1-Image 是一个原生统一图像生成基础模型，基于像素级统一 Transformer（UiT）构建，无需外部 VAE 或独立文本编码器。它在单一共享 token 空间中原生编码原始像素、文本和任务特定条件，支持文本生成图像、图像编辑和主体驱动个性化，分辨率最高可达 2,048 × 2,048。

HiDream-O1-Image（代号：Peanut）在 Artificial Analysis 文本生成图像竞技场中首发排名第 8，有望成为新的领先开源文本生成图像模型（2026-5-5）。

Artificial Analysis 文本生成图像竞技场 Artificial Analysis 文本生成图像竞技场 分辨率最高可达 2,048 × 2,048。

通用文本生成图像 通用文本生成图像 分辨率最高可达 2,048 × 2,048。

长文本渲染与排版 长文本渲染与排版控制 — 精准、多区域、多语言文字渲染。

主体驱动个性化 主体驱动个性化 — 在新场景中保留身份/IP。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#project-updates项目动态

🚀2026 年 5 月 8 日： 我们已开源 HiDream-O1-Image（8B），包括未蒸馏版和蒸馏 Dev 变体，以及推理驱动提示词智能体。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#key-features核心特性

🧬像素级统一 Transformer — 端到端处理原始像素的单一模型，无 VAE，无独立文本编码器。
🎨一个模型，多种任务 — 单一架构支持文本生成图像、长文本渲染、指令编辑、主体驱动个性化和分镜生成。
🧠推理驱动提示词智能体 — 内置“思考“智能体，在生成前解析隐含知识、排版和文字渲染需求。
🖼️原生高分辨率 — 直接合成最高 2,048 × 2,048 的图像，细节清晰精细。
⚡8B 规模下的卓越效率与通用性 — 仅凭 80 亿参数，性能与更大的开源 DiT 模型及领先的闭源模型持平甚至超越。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#models模型

https://huggingface.co/HiDream-ai/HiDream-O1-Image#evaluation评估

我们在五个广泛使用的评测套件上对 HiDream-O1-Image 与最先进的开源和闭源模型进行了对比测试，涵盖组合生成、密集提示对齐、人类偏好、复杂视觉文字生成和长文本渲染。每张表格中，最佳结果以粗体标注，次优结果加下划线标注。点击任意基准测试可展开或收起。

GenEval — 组合生成

模型	参数量	单一对象	双对象	计数	颜色	位置	属性	总分
Nano Banana 2.0	–	1.00	0.96	0.71	0.84	0.86	0.65	0.83
Seedream-4.0	–	1.00	0.92	0.71	0.93	0.78	0.68	0.84
GPT Image 1 [High]	–	0.99	0.92	0.85	0.92	0.75	0.61	0.84
GPT Image 2	–	0.99	0.98	0.85	0.93	0.85	0.77	0.89
PixArt	4.3B + 0.6B	0.98	0.50	0.44	0.80	0.08	0.07	0.48
Show-o	1.3B	0.95	0.52	0.49	0.82	0.11	0.28	0.53
Emu3-Gen	8B	0.98	0.71	0.34	0.81	0.17	0.21	0.54
SD3-Medium	5.5B + 2B	0.98	0.74	0.63	0.67	0.34	0.36	0.62
JanusFlow	1.3B	0.97	0.59	0.45	0.83	0.53	0.42	0.63
FLUX.1 [Dev]	4.8B + 12B	0.98	0.81	0.74	0.79	0.22	0.45	0.66
SD3.5 Large	5.5B + 8.1B	0.98	0.89	0.73	0.83	0.34	0.47	0.71
Janus-Pro-7B	7B	0.99	0.89	0.59	0.90	0.79	0.66	0.80
Z-Image-Turbo	4B + 6B	1.00	0.95	0.77	0.89	0.65	0.68	0.82
FLUX.2 [Dev]	24B + 32B	1.00	0.99	0.79	0.93	0.73	0.78	0.87
Qwen-Image	7B + 20B	0.99	0.92	0.89	0.88	0.76	0.77	0.87
HiDream-O1-Image	8B	1.00	0.99	0.79	0.89	0.93	0.78	0.90
HiDream-O1-Image-Pro	200B+	1.00	0.99	0.85	0.94	0.94	0.79	0.92

DPG-Bench — 密集提示对齐

模型	参数量	全局	实体	属性	关系	其他	总分
GPT Image 1 [High]	–	88.89	88.94	89.84	92.63	90.96	85.15
GPT Image 2	–	87.27	91.91	90.85	91.59	91.58	85.98
Nano Banana 2.0	–	85.17	92.55	91.16	90.45	91.08	86.90
Seedream-4.0	–	87.17	92.41	92.29	93.33	95.48	88.63
SD v1.5	0.12B + 0.86B	74.63	74.23	75.39	73.49	67.81	63.18
PixArt	4.3B + 0.6B	74.97	79.32	78.60	82.57	76.96	71.11
Lumina-Next	2B + 2B	82.82	88.65	86.44	80.53	81.82	74.63
SDXL	0.81B + 2.6B	83.27	82.43	80.91	86.76	80.41	74.65
Hunyuan-DiT	4.8B + 1.5B	84.59	80.59	88.01	74.36	86.41	78.87
Emu3-Gen	8B	85.21	86.68	86.84	90.22	83.15	80.60
DALL-E 3	–	90.97	89.61	88.39	90.58	89.83	83.50
FLUX.1 [Dev]	4.8B + 12B	74.35	90.00	88.96	90.87	88.33	83.84
SD3 Medium	5.5B + 2B	87.90	91.01	88.83	80.70	88.68	84.08
Janus-Pro-7B	7B	86.90	88.90	89.40	89.32	89.48	84.19
Z-Image-Turbo	4B + 6B	91.29	89.59	90.14	92.16	88.68	84.86
HiDream-I1-Full	13.5B + 17B	76.44	90.22	89.48	93.74	91.83	85.89
FLUX.2 [Dev]	24B + 32B	92.20	91.36	93.28	93.52	89.72	87.57
Qwen-Image	7B + 20B	91.32	91.56	92.02	94.31	92.73	88.32
HiDream-O1-Image	8B	95.15	92.32	93.74	92.88	90.25	89.83
HiDream-O1-Image-Pro	200B+	94.97	95.42	92.59	90.82	89.50	90.30

HPSv3 — 12 类别人类偏好

模型	参数量	全部	角色	艺术	设计	建筑	动物	自然风景	交通工具	产品	植物	食物	科学	其他
Seedream-4.0	–	9.32	9.83	9.20	8.83	9.95	8.99	9.40	9.58	9.12	9.26	9.75	9.11	9.51
Nano Banana 2.0	–	10.01	10.18	9.18	9.58	10.96	9.71	10.04	10.38	10.36	10.14	10.61	9.14	9.89
GPT Image 2	–	10.21	10.75	9.91	10.15	10.59	10.05	10.29	10.17	10.26	10.07	10.75	10.05	10.00
Z-Image-Turbo	4B + 6B	8.35	8.98	8.29	7.65	9.26	8.51	8.33	8.81	7.83	8.46	8.64	7.93	8.57
FLUX.2 [Dev]	24B + 32B	9.28	10.23	9.56	8.80	9.73	9.43	9.21	9.44	8.93	9.23	9.82	8.67	9.11
Qwen-Image	7B + 20B	9.94	10.91	10.47	9.56	10.22	10.61	9.87	10.10	9.15	9.99	10.08	9.19	9.83
HiDream-O1-Image	8B	10.37	10.59	10.44	10.29	11.02	10.34	10.37	10.54	10.50	10.38	10.85	9.68	10.09
HiDream-O1-Image-Pro	200B+	10.47	10.63	10.51	10.33	11.11	10.08	10.45	10.37	10.75	10.29	11.13	10.09	10.39

CVTG-2K — 复杂视觉文字生成（点击展开）

模型	参数量	2 区域	3 区域	4 区域	5 区域	平均	NED	CLIP 分数
Nano Banana 2.0	–	0.7465	0.7720	0.8067	0.7980	0.7875	0.8945	0.7212
GPT Image 1 [High]	–	0.8779	0.8659	0.8731	0.8218	0.8569	0.9478	0.7982
Seedream-4.0	–	0.8980	0.8949	0.9044	0.9015	0.9003	0.9511	0.8033
GPT Image 2	–	0.8904	0.8887	0.9101	0.9044	0.9003	0.9515	0.7798
TextDiffuser-2	0.12B + 0.9B	0.5322	0.3255	0.1787	0.0809	0.2326	0.4353	0.6765
RAG-Diffusion	4.8B + 12B	0.4388	0.3316	0.2116	0.1910	0.2648	0.4498	0.7797
AnyText	0.123B + 1.2B	0.0513	0.1739	0.1948	0.2249	0.1804	0.4675	0.7432
3DIS	0.81B + 2.6B	0.4495	0.3959	0.3880	0.3303	0.3813	0.6505	0.7767
FLUX.1 [Dev]	4.8B + 12B	0.6089	0.5531	0.4661	0.4316	0.4965	0.6879	0.7401
SD3.5 Large	5.5B + 8.1B	0.7293	0.6825	0.6574	0.5940	0.6548	0.8470	0.7797
TextCrafter	7B + 20B	0.7628	0.7628	0.7406	0.6977	0.7370	0.8679	0.7868
Qwen-Image	7B + 20B	0.8370	0.8364	0.8313	0.8158	0.8288	0.9116	0.8017
Z-Image-Turbo	4B + 6B	0.8872	0.8662	0.8628	0.8347	0.8585	0.9281	0.8048
FLUX.2 [Dev]	24B + 32B	0.9261	0.8897	0.8995	0.8732	0.8926	0.9475	0.8104
HiDream-O1-Image	8B	0.9085	0.9159	0.9216	0.9015	0.9128	0.9561	0.8076
HiDream-O1-Image-Pro	200B+	0.9133	0.9221	0.9365	0.9175	0.9222	0.9628	0.8349

LongText-Bench — 长文本渲染，英文 & 中文（点击展开）

模型	参数量	LongText-Bench-EN	LongText-Bench-ZH
Seedream-4.0	–	0.936	0.946
GPT Image 1 [High]	–	0.956	0.619
GPT Image 2	–	0.960	0.961
Nano Banana 2.0	–	0.980	0.965
Janus-Pro-7B	7B	0.019	0.006
BLIP3-o	7B + 1.4B	0.021	0.018
Kolors 2.0	–	0.258	0.329
BAGEL	7B + 7B	0.373	0.310
OmniGen2	3B + 4B	0.561	0.059
X-Omni	7B	0.900	0.814
HiDream-I1-Full	13.5B + 17B	0.543	0.024
FLUX.1 [Dev]	4.8B + 12B	0.607	0.005
Z-Image-Turbo	4B + 6B	0.917	0.926
FLUX.2 [Dev]	24B + 32B	0.963	0.757
Qwen-Image	7B + 20B	0.943	0.946
HiDream-O1-Image	8B	0.979	0.978
HiDream-O1-Image-Pro	200B+	0.982	0.980

https://huggingface.co/HiDream-ai/HiDream-O1-Image#installation安装

克隆本仓库：

git clone https://github.com/HiDream-ai/HiDream-O1-Image.git cd HiDream-O1-Image

安装所需依赖：

pip install -r requirements.txt

关于 flash\-attn 的说明。 我们强烈推荐安装 flash\-attn（https://github.com/Dao-AILab/flash-attention）以优化注意力计算。如果您未安装（或无法安装）flash\-attn，必须编辑 models/pipeline\.py 第 291 行，将 "use\_flash\_attn": True 改为 "use\_flash\_attn": False — 否则推理时将因无法导入内核而失败。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#reasoning-driven-prompt-agent推理驱动提示词智能体

HiDream-O1-Image 内置了推理驱动提示词智能体（prompt\_agent\.py），能够显式推理排版、主体属性、物理逻辑和文字渲染细节，然后将原始用户指令改写为自洽的英文提示词。它支持两种后端 — 通过 \-\-backend 参数选择。

该智能体会输出一个包含三个字段的 JSON 对象：prompt（改写后的英文提示词）、reasoning（推理过程）和 resolved\_knowledge（已解析的知识）。将 prompt 字段输入 inference\.py，可在复杂的重推理请求上获得最佳效果。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#option-a–local-backend-gemma-4-31b-it方案 A — 本地后端（Gemma-4-31B-it）

下载 Gemma 权重（需在 HuggingFace 上接受 Gemma 许可协议）：

huggingface-cli download google/gemma-4-31B-it --local-dir /path/to/gemma-4-31B-it

在本地运行改写器：

python prompt_agent.py \ --backend local \ --model_id /path/to/gemma-4-31B-it \ --prompt "李白的静夜思写在古墙上"

https://huggingface.co/HiDream-ai/HiDream-O1-Image#option-b–external-openai-compatible-api方案 B — 外部 OpenAI 兼容 API

通过提供 \-\-base\_url、\-\-api\_key 和 \-\-model\_name，可使用任意 OpenAI 兼容的端点（OpenAI、Azure、vLLM、SGLang、DeepSeek 等）：

python prompt_agent.py \ --backend api \ --base_url https://api.openai.com/v1 \ --api_key $OPENAI_API_KEY \ --model_name deepseek-v4-pro \ --prompt "李白的静夜思写在古墙上"

https://huggingface.co/HiDream-ai/HiDream-O1-Image#usage使用方法

推理需要支持 CUDA 的 GPU。以下示例使用未蒸馏模型（\-\-model\_type full）；最后一小节介绍如何使用蒸馏模型（\-\-model\_type dev）执行相同任务。

https://huggingface.co/HiDream-ai/HiDream-O1-Image#1-text-to-image-generation1. 文本生成图像

根据文本提示词生成图像：

python inference.py \ --model_path /path/to/HiDream-O1-Image \ --prompt "medium shot, eye-level, front view. A woman is seated in an ornate bedroom, illuminated by candlelight, with a calm and composed expression. The subject is a young woman with fair skin, light brown hair styled in an updo with loose tendrils framing her face, and blue eyes. She wears a cream-colored satin robe with delicate floral embroidery and lace trim along the neckline. Her ears are adorned with pearl drop earrings. She is seated on a bed with a dark, intricately carved wooden headboard. To her left, a wooden nightstand holds three lit white candles and a candelabra with multiple lit candles in the background. The bed is covered with patterned pillows and a dark, textured blanket. The walls are paneled with dark wood and feature a large, ornate tapestry with muted earth tones. The lighting creates soft highlights on her face and robe, with warm shadows cast across the room." \ --output_image results/t2i.png \ --height 2048 \ --width 2048

https://huggingface.co/HiDream-ai/HiDream-O1-Image#2-instruction-based-image-editing2. 基于指令的图像编辑

提供单张参考图像和一条编辑指令：

python inference.py \ --model_path /path/to/HiDream-O1-Image \ --prompt "remove the earphones" \ --ref_images assets/edit/test.jpg \ --output_image results/edit.png \ --keep_original_aspect

https://huggingface.co/HiDream-ai/HiDream-O1-Image#3-multi-reference-subject-driven-personalization3. 多参考图像主体驱动个性化

提供两张或多张定义主体的参考图像，以及将其置于新场景的提示词：

python inference.py \ --model_path /path/to/HiDream-O1-Image \ --prompt "A young boy with blonde hair stands on steps wearing light blue jeans, a white t-shirt with logo, and blue and white sneakers. He wears a brown cord necklace with beads, a black wristwatch with digital display, and carries a yellow fanny pack with white zipper. In his hand is a red boxing glove with white top, a teal plastic toy car, and a plastic toy figure of Captain America. He wears a straw hat with cream band. Natural light illuminates the scene." \ --ref_images assets/IP/1.jpg assets/IP/2.jpg assets/IP/3.jpg assets/IP/4.jpg assets/IP/5.jpg assets/IP/6.jpg assets/IP/7.jpg assets/IP/8.jpg assets/IP/9.jpg assets/IP/10.jpg \ --output_image results/subject.png

https://huggingface.co/HiDream-ai/HiDream-O1-Image#4-running-with-the-dev-model4. 使用 Dev 模型运行

以上三种任务均可使用 Dev 模型运行，只需将 \-\-model\_path 切换为 Dev 检查点并设置 \-\-model\_type dev。例如：

python inference.py \ --model_path /path/to/HiDream-O1-Image-Dev \ --prompt "A dog holds a sign that says \"HiDream-O1-Image release.\"" \ --output_image results/t2i_dev.png \ --model_type dev

https://huggingface.co/HiDream-ai/HiDream-O1-Image#command-line-arguments命令行参数

\-\-model\_path：完整 HuggingFace 模型目录的路径（未蒸馏或蒸馏版本）。
\-\-prompt：生成或编辑任务的文本提示词。
\-\-ref\_images：一张或多张参考图像的路径（可选；以空格分

HiDream-ai/HiDream-O1-Image

HiDream-ai/HiDream-O1-Image · Hugging Face

https://huggingface.co/HiDream-ai/HiDream-O1-Image#project-updates项目动态

https://huggingface.co/HiDream-ai/HiDream-O1-Image#key-features核心特性

https://huggingface.co/HiDream-ai/HiDream-O1-Image#models模型

https://huggingface.co/HiDream-ai/HiDream-O1-Image#evaluation评估

https://huggingface.co/HiDream-ai/HiDream-O1-Image#installation安装

https://huggingface.co/HiDream-ai/HiDream-O1-Image#reasoning-driven-prompt-agent推理驱动提示词智能体

https://huggingface.co/HiDream-ai/HiDream-O1-Image#option-a–local-backend-gemma-4-31b-it方案 A — 本地后端（Gemma-4-31B-it）

https://huggingface.co/HiDream-ai/HiDream-O1-Image#option-b–external-openai-compatible-api方案 B — 外部 OpenAI 兼容 API

https://huggingface.co/HiDream-ai/HiDream-O1-Image#usage使用方法

https://huggingface.co/HiDream-ai/HiDream-O1-Image#1-text-to-image-generation1. 文本生成图像

https://huggingface.co/HiDream-ai/HiDream-O1-Image#2-instruction-based-image-editing2. 基于指令的图像编辑

https://huggingface.co/HiDream-ai/HiDream-O1-Image#3-multi-reference-subject-driven-personalization3. 多参考图像主体驱动个性化

https://huggingface.co/HiDream-ai/HiDream-O1-Image#4-running-with-the-dev-model4. 使用 Dev 模型运行

https://huggingface.co/HiDream-ai/HiDream-O1-Image#command-line-arguments命令行参数

相似文章

HiDream-ai/HiDream-O1-Image-Dev

baidu/ERNIE-Image

prunaai/z-image-turbo

推出 4o 图像生成功能

i1：一个简单且完全开放的强文本到图像模型配方

提交意见反馈