unsloth/ERNIE-Image-Turbo-GGUF
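Summary
unsloth has released GGUF-quantized versions of Baidu's ERNIE-Image-Turbo model. They use the Unsloth Dynamic 2.0 method, and the model performs text-to-image generation in 8 inference steps on a consumer GPU with 24 GB of VRAM.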
Cached: 2026/04/20 14:45
unsloth/ERNIE-Image-Turbo-GGUF · Hugging Face
Source: https://huggingface.co/unsloth/ERNIE-Image-Turbo-GGUF
This is a GGUF-quantized version of ERNIE-Image-Turbo (https://huggingface.co/baidu/ERNIE-Image-Turbo). unsloth/ERNIE-Image-Turbo-GGUF uses the Unsloth Dynamic 2.0 (https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs) method for state-of-the-art performance.
- Important layers are upcast to higher precision.
- Uses tooling from city96's ComfyUI-GGUF (https://github.com/city96/ComfyUI-GGUF).
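To see which quantized files the repository offers before downloading one, a minimal sketch along these lines may help. It assumes the `huggingface_hub` package is installed and that each quantization level shows up as a tag in the file name (for example `Q4_K_M`). The tag and the helper names `pick_gguf` and `download_gguf` are illustrative; the model card does not specify them.

```python
from typing import List, Optional


def pick_gguf(files: List[str], quant_tag: str) -> Optional[str]:
    """Return the first .gguf file whose name contains quant_tag (case-insensitive)."""
    tag = quant_tag.lower()
    for name in sorted(files):
        lowered = name.lower()
        if lowered.endswith(".gguf") and tag in lowered:
            return name
    return None


def download_gguf(repo_id: str, quant_tag: str) -> str:
    """List the repo, pick a matching GGUF file, and download it to the local cache."""
    # Imported lazily so pick_gguf stays usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    chosen = pick_gguf(list_repo_files(repo_id), quant_tag)
    if chosen is None:
        raise FileNotFoundError(f"no .gguf file matching {quant_tag!r} in {repo_id}")
    return hf_hub_download(repo_id=repo_id, filename=chosen)
```

Calling `download_gguf("unsloth/ERNIE-Image-Turbo-GGUF", "Q4_K_M")` would return a local file path, provided a file with that (assumed) tag exists in the repository.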
🤗 ERNIE-Image (https://huggingface.co/Baidu/ERNIE-Image) | 🤗 ERNIE-Image-Turbo (https://huggingface.co/Baidu/ERNIE-Image-Turbo) | 🖥️ Huggingface Demo (https://huggingface.co/spaces/baidu/ERNIE-Image) | 🖥️ AI Studio Demo (https://aistudio.baidu.com/ernieimage) | 📖 Blog (https://yiyan.baidu.com/blog/posts/ernie-image) | 🖼️ Art Gallery (https://ernieimageprompt.com/) | 💬 WeChat(微信)(https://github.com/baidu/ERNIE-Image/blob/main/assets/contacts/WeChat_small.jpg) | 🫨 Discord (https://discord.gg/ByUTbjfG5k) | 🏷️ X (https://x.com/ErnieforDevs)
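ERNIE-Image-Turbo is an open-source text-to-image generation model developed by Baidu's ERNIE-Image team. It is a distilled version of ERNIE-Image, built on the same single-stream Diffusion Transformer (DiT) family, and is designed for fast, high-fidelity generation in only 8 inference steps. The model remains highly controllable in practical generation scenarios where accurate content realization matters as much as aesthetics. In particular, ERNIE-Image-Turbo performs well on complex instruction following, text rendering, and structured image generation, which makes it well suited to posters, comics, multi-panel layouts, and other content creation tasks that demand both visual quality and efficiency. It also supports a broad range of visual styles, including realistic photography, design-oriented imagery, and stylized aesthetic outputs.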
[Image: ERNIE-Image mosaic]
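Highlights:
- Fast and efficient: As a distilled checkpoint of ERNIE-Image, ERNIE-Image-Turbo delivers strong generation quality in only 8 inference steps, which suits latency-sensitive applications.
- Text rendering: ERNIE-Image-Turbo handles dense, long-form, and layout-sensitive text well, making it a good choice for text-heavy visuals such as posters, infographics, and UI-like images.
- Instruction following: The model reliably follows complex prompts involving multiple objects, intricate relationships, and knowledge-intensive descriptions.
- Structured generation: ERNIE-Image-Turbo is effective on structured visual tasks such as posters, comics, storyboards, and multi-panel compositions, where layout and organization are critical.
- Style coverage: Beyond clean, readable design-oriented outputs, the model also supports realistic photography and distinctive stylized aesthetics, including softer, more cinematic visual tones.
- Practical deployment: Thanks to its compact size, ERNIE-Image-Turbo can run on consumer GPUs with 24 GB of VRAM, lowering the barrier for research, downstream use, and model adaptation.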
Released Versions
ERNIE-Image (https://huggingface.co/Baidu/ERNIE-Image): our SFT model, which delivers stronger general-purpose capability and instruction fidelity, typically with 50 inference steps.
ERNIE-Image-Turbo (https://huggingface.co/Baidu/ERNIE-Image-Turbo): our Turbo model, optimized with DMD and RL, which delivers faster speed and stronger aesthetics in only 8 inference steps.
Benchmark
GENEval
| Model | Single Object | Two Objects | Counting | Colors | Position | Attribute Binding | Overall |
|---|---|---|---|---|---|---|---|
| ERNIE-Image (w/o PE) | 1.0000 | 0.9596 | 0.7781 | 0.9282 | 0.8550 | 0.7925 | 0.8856 |
| ERNIE-Image (w/ PE) | 0.9906 | 0.9596 | 0.8187 | 0.8830 | 0.8625 | 0.7225 | 0.8728 |
| Qwen-Image | 0.9900 | 0.9200 | 0.8900 | 0.8800 | 0.7600 | 0.7700 | 0.8683 |
| ERNIE-Image-Turbo (w/o PE) | 1.0000 | 0.9621 | 0.7906 | 0.9202 | 0.7975 | 0.7300 | 0.8667 |
| ERNIE-Image-Turbo (w/ PE) | 0.9938 | 0.9419 | 0.8375 | 0.8351 | 0.7950 | 0.7025 | 0.8510 |
| FLUX.2-klein-9B | 0.9313 | 0.9571 | 0.8281 | 0.9149 | 0.7175 | 0.7400 | 0.8481 |
| Z-Image | 1.0000 | 0.9400 | 0.7800 | 0.9300 | 0.6200 | 0.7700 | 0.8400 |
| Z-Image-Turbo | 1.0000 | 0.9500 | 0.7700 | 0.8900 | 0.6500 | 0.6800 | 0.8233 |
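For the rows checked, the Overall column in the GENEval table appears consistent with the unweighted mean of the six sub-scores. This is an observation from the numbers, not something the source states. A short sketch of the check, using two rows copied from the table:

```python
from statistics import mean

# Sub-scores copied from the GENEval table: single object, two objects,
# counting, colors, position, attribute binding; then the reported overall.
rows = {
    "ERNIE-Image (w/o PE)": ([1.0000, 0.9596, 0.7781, 0.9282, 0.8550, 0.7925], 0.8856),
    "ERNIE-Image-Turbo (w/o PE)": ([1.0000, 0.9621, 0.7906, 0.9202, 0.7975, 0.7300], 0.8667),
}


def overall_gap(sub_scores, reported):
    """Absolute difference between the unweighted mean and the reported overall."""
    return abs(mean(sub_scores) - reported)


for name, (subs, reported) in rows.items():
    print(f"{name}: mean={mean(subs):.4f} reported={reported:.4f} gap={overall_gap(subs, reported):.5f}")
```

A gap below 0.00005 means the reported value matches the mean to the four decimal places shown in the table.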
OneIG-EN
| Model | Alignment | Text | Reasoning | Style | Diversity | Overall |
|---|---|---|---|---|---|---|
| Nano Banana 2.0 | 0.8880 | 0.9440 | 0.3340 | 0.4810 | 0.2450 | 0.5780 |
| Seedream 4.5 | 0.8910 | 0.9980 | 0.3500 | 0.4340 | 0.2070 | 0.5760 |
| ERNIE-Image (w/ PE) | 0.8678 | 0.9788 | 0.3566 | 0.4309 | 0.2411 | 0.5750 |
| Seedream 4.0 | 0.8920 | 0.9830 | 0.3470 | 0.4530 | 0.1910 | 0.5730 |
| ERNIE-Image-Turbo (w/ PE) | 0.8676 | 0.9666 | 0.3537 | 0.4191 | 0.2212 | 0.5656 |
| ERNIE-Image (w/o PE) | 0.8909 | 0.9668 | 0.2950 | 0.4471 | 0.1687 | 0.5537 |
| Z-Image | 0.8810 | 0.9870 | 0.2800 | 0.3870 | 0.1940 | 0.5460 |
| Qwen-Image | 0.8820 | 0.8910 | 0.3060 | 0.4180 | 0.1970 | 0.5390 |
| ERNIE-Image-Turbo (w/o PE) | 0.8795 | 0.9488 | 0.2913 | 0.4277 | 0.1232 | 0.5341 |
| FLUX.2-klein-9B | 0.8871 | 0.8657 | 0.3117 | 0.4417 | 0.1560 | 0.5324 |
| Qwen-Image-2512 | 0.8760 | 0.9900 | 0.2920 | 0.3380 | 0.1510 | 0.5300 |
| GLM-Image | 0.8050 | 0.9690 | 0.2980 | 0.3530 | 0.2130 | 0.5280 |
| Z-Image-Turbo | 0.8400 | 0.9940 | 0.2980 | 0.3680 | 0.1390 | 0.5280 |
OneIG-ZH
| Model | Alignment | Text | Reasoning | Style | Diversity | Overall |
|---|---|---|---|---|---|---|
| Nano Banana 2.0 | 0.8430 | 0.9830 | 0.3110 | 0.4610 | 0.2360 | 0.5670 |
| ERNIE-Image (w/ PE) | 0.8299 | 0.9539 | 0.3056 | 0.4342 | 0.2478 | 0.5543 |
| Seedream 4.0 | 0.8360 | 0.9860 | 0.3040 | 0.4430 | 0.2000 | 0.5540 |
| Seedream 4.5 | 0.8320 | 0.9860 | 0.3000 | 0.4260 | 0.2130 | 0.5510 |
| Qwen-Image | 0.8250 | 0.9630 | 0.2670 | 0.4050 | 0.2790 | 0.5480 |
| ERNIE-Image-Turbo (w/ PE) | 0.8258 | 0.9386 | 0.3043 | 0.4208 | 0.2281 | 0.5435 |
| Z-Image | 0.7930 | 0.9880 | 0.2660 | 0.3860 | 0.2430 | 0.5350 |
| ERNIE-Image (w/o PE) | 0.8421 | 0.8979 | 0.2656 | 0.4212 | 0.1772 | 0.5208 |
| Qwen-Image-2512 | 0.8230 | 0.9830 | 0.2720 | 0.3420 | 0.1570 | 0.5150 |
| GLM-Image | 0.7380 | 0.9760 | 0.2840 | 0.3350 | 0.2210 | 0.5110 |
| Z-Image-Turbo | 0.7820 | 0.9820 | 0.2760 | 0.3610 | 0.1340 | 0.5070 |
| ERNIE-Image-Turbo (w/o PE) | 0.8326 | 0.9086 | 0.2580 | 0.4002 | 0.1316 | 0.5062 |
| FLUX.2-klein-9B | 0.8201 | 0.4920 | 0.2599 | 0.4166 | 0.1625 | 0.4302 |
LongTextBench
| Model | LongText-Bench-EN | LongText-Bench-ZH | Average |
|---|---|---|---|
| Seedream 4.5 | 0.9890 | 0.9873 | 0.9882 |
| ERNIE-Image (w/ PE) | 0.9804 | 0.9661 | 0.9733 |
| GLM-Image | 0.9524 | 0.9788 | 0.9656 |
| ERNIE-Image-Turbo (w/ PE) | 0.9675 | 0.9636 | 0.9655 |
| Nano Banana 2.0 | 0.9808 | 0.9491 | 0.9650 |
| ERNIE-Image-Turbo (w/o PE) | 0.9602 | 0.9675 | 0.9639 |
| ERNIE-Image (w/o PE) | 0.9679 | 0.9594 | 0.9636 |
| Qwen-Image-2512 | 0.9561 | 0.9647 | 0.9604 |
| Qwen-Image | 0.9430 | 0.9460 | 0.9445 |
| Z-Image | 0.9350 | 0.9360 | 0.9355 |
| Seedream 4.0 | 0.9214 | 0.9261 | 0.9238 |
| Z-Image-Turbo | 0.9170 | 0.9260 | 0.9215 |
| FLUX.2-klein-9B | 0.8642 | 0.2183 | 0.5413 |
Quick Start
Recommended Parameters
- Resolution:
  - 1024x1024
  - 848x1264
  - 1264x848
  - 768x1376
  - 896x1200
  - 1376x768
  - 1200x896
- Guidance scale: 1.0
- Inference steps: 8
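When a target aspect ratio is known but not the exact size, one option is to snap to the nearest recommended resolution. The sketch below assumes the list entries read as width x height (the source does not say which comes first, though every non-square size appears in both orientations). The helper name `closest_resolution` is hypothetical.

```python
import math
from typing import Tuple

# Recommended resolutions from the list above, read as (width, height).
RECOMMENDED = [
    (1024, 1024), (848, 1264), (1264, 848), (768, 1376),
    (896, 1200), (1376, 768), (1200, 896),
]


def closest_resolution(aspect_ratio: float) -> Tuple[int, int]:
    """Pick the recommended (width, height) whose ratio is nearest on a log scale."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    target = math.log(aspect_ratio)
    return min(RECOMMENDED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
```

Comparing ratios on a log scale treats portrait and landscape symmetrically, so a 9:16 request is as close to 768x1376 as a 16:9 request is to 1376x768.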
Diffusers
Install the latest version of diffusers:
pip install git+https://github.com/huggingface/diffusers
```python
import torch
from diffusers import ErnieImagePipeline

# Load the Turbo checkpoint in bfloat16 and move it to the GPU.
pipe = ErnieImagePipeline.from_pretrained(
    "Baidu/ERNIE-Image-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
    height=1264,
    width=848,
    num_inference_steps=8,
    guidance_scale=1.0,
    use_pe=True,  # use the prompt enhancer
).images[0]

image.save("output.png")
```
SGLang
Install the latest version of sglang:
git clone https://github.com/sgl-project/sglang.git
Start the server:
sglang serve --model-path baidu/ERNIE-Image-Turbo
Send a generation request:
curl -X POST http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning.",
    "height": 1264,
    "width": 848,
    "num_inference_steps": 8,
    "guidance_scale": 1.0,
    "use_pe": true
  }' \
  --output output.png
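The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call: the endpoint, port, and JSON fields are taken from it, while the helper names are hypothetical. It assumes the response body is the image itself, as `--output output.png` implies, which the source does not confirm.

```python
import json
import urllib.request


def build_generate_request(prompt: str, base_url: str = "http://localhost:30000",
                           height: int = 1264, width: int = 848) -> urllib.request.Request:
    """Build the POST request that mirrors the curl call, without sending it."""
    payload = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": 8,
        "guidance_scale": 1.0,
        "use_pe": True,
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(prompt: str, path: str = "output.png") -> None:
    """Send the request and write the raw response body to path."""
    # Assumption: the body is the image bytes, as `--output output.png` suggests.
    with urllib.request.urlopen(build_generate_request(prompt)) as response:
        with open(path, "wb") as out:
            out.write(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is running.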