Ideogram 4 is open source! (top ranked on DesignArena)
Summary
Ideogram 4, an open-source text-to-image model, is released with state-of-the-art performance, structured JSON prompting, and multilingual text rendering.
Cached at: 06/03/26, 05:45 PM
ideogram-ai/ideogram-4-fp8 · Hugging Face
Source: https://huggingface.co/ideogram-ai/ideogram-4-fp8
Ideogram 4: Open image model at the forefront of design

Ideogram 4 is **Ideogram’s first open-source text-to-image model**. It is a state-of-the-art foundation model trained from scratch, not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2K resolution images. The easiest way to try the model is online at **ideogram.ai**.
We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.
News
- **[2026-06-03]** **Ideogram 4 released!** Inference code and weights are now public, and our technical blog post is live. See the Quick Start section to generate your first image, or try the model online at ideogram.ai.
Model Zoo
We plan to support more quantizations in the future.
Performance
We evaluate Ideogram 4 across third-party arenas and benchmarks, standard open-source benchmarks, and our own internal human-preference benchmark. Across all of them, Ideogram 4 is the best open-weight image model by far, and sits at the frontier of design.
Design Arena
Design Arena is a third-party image Elo leaderboard focused specifically on design-oriented generation. On the overall board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary GPT and Gemini models:

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, well ahead of the next-best open model:

ContraLabs
ContraLabs ran a blind typography evaluation judged by ten professional designers from Contra’s top-earning talent. Ideogram 4 leads on first-place win rate, picked as the best of four models 47.9% of the time overall, well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, FLUX.2 [max] at 15.5%, and Grok Imagine 1.0 at 15.0%:

It also wins on practical usability: asked “Would you use this in real client work?”, the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):

LMArena
On LMArena, a third-party text-to-image leaderboard that measures general-purpose text-to-image use cases, Ideogram is the top-ranked open-weight lab and a top-5 image generation lab overall, beaten only by giant companies with vastly larger budgets and resources:

Ideogram internal eval
For our internal human-preference benchmark, focused on graphic design and photography, we had graphic designers deeply familiar with professional design work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall — behind only GPT Image 2 medium — and the top open-weight model:

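For readers unfamiliar with Bradley-Terry scores, the sketch below shows one common way to fit them from pairwise preference counts, using the standard minorization-maximization update. It is a generic illustration, not Ideogram's evaluation code; the function name and the toy win counts are invented for the example.

```python
def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is how many times item i was preferred over item j.
    Assumes at least one comparison was recorded. Returns strengths
    normalized to sum to 1; a higher value means a stronger item.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [x / norm for x in new_p]
    return p


if __name__ == "__main__":
    # Toy data (invented): three models, ten blind comparisons per pair.
    toy_wins = [
        [0, 8, 9],
        [2, 0, 6],
        [1, 4, 0],
    ]
    for index, strength in enumerate(bradley_terry(toy_wins)):
        print(f"model {index}: {strength:.3f}")
```

Under this model, the probability that item i is preferred over item j is p_i / (p_i + p_j), so a ranking by fitted strength summarizes all pairwise votes at once.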
Open-source benchmarks
On standard open-source benchmarks measuring core capabilities — layout control (7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering (X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the leading closed-source models across every axis. On layout control (7Bench), it is significantly better than all closed-source models:

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight release we benchmarked — ahead of much larger models like Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):

Quick Start
Install
The inference code lives in the ideogram4 GitHub repo. Clone it, then from the repo root:
pip install .
If you plan to modify the code, install in editable mode instead so changes under src/ideogram4/ take effect without reinstalling:
pip install -e .
CLI
A “magic prompt” LLM rewrites the plain --prompt into the structured JSON caption the model expects. By default this uses Ideogram’s hosted magic-prompt API, which is free and does the expansion server-side (no local model or system prompt needed). It reads IDEOGRAM_API_KEY; get a key at developer.ideogram.ai:
python run_inference.py \
--prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
--output out.png \
--quantization "nf4" \
--magic-prompt-key "$IDEOGRAM_API_KEY"
You can also run the expansion through your own LLM provider: one of our magic-prompt system prompts is open source. See the Prompting Guide for details.
For the highest-quality images, set --height 2048 --width 2048 and --sampler-preset V4_QUALITY_48.
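If you script generation from Python, the command line can be assembled programmatically. The sketch below is a minimal example, assuming run_inference.py sits in the current directory and accepts the flags quoted in this section; build_command is a hypothetical helper and not part of the ideogram4 package.

```python
import os
import shlex
import sys


def build_command(prompt, api_key, output="out.png",
                  quantization="nf4", high_quality=False):
    """Assemble an argv list for run_inference.py (hypothetical helper).

    Only flags quoted in this README are used.
    """
    if not api_key:
        raise ValueError("a magic-prompt API key is required")
    cmd = [
        sys.executable, "run_inference.py",
        "--prompt", prompt,
        "--output", output,
        "--quantization", quantization,
        "--magic-prompt-key", api_key,
    ]
    if high_quality:
        # Settings the README recommends for the highest-quality images.
        cmd += ["--height", "2048", "--width", "2048",
                "--sampler-preset", "V4_QUALITY_48"]
    return cmd


if __name__ == "__main__":
    # Falls back to a placeholder so the example prints without a real key.
    key = os.environ.get("IDEOGRAM_API_KEY") or "YOUR_API_KEY"
    cmd = build_command(
        "a ginger cat wearing a tiny wizard hat reading a spellbook",
        api_key=key,
        high_quality=True,
    )
    print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the run; building a list rather than a shell string avoids quoting problems in prompts that contain spaces or quotes.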
Safety screening with Hive
Prompt and output safety screening is performed via Hive. Sign up and create a Text Moderation key and a Visual Content Moderation key, then export them as HIVE_TEXT_MODERATION_KEY and HIVE_VISUAL_MODERATION_KEY (or pass them via --hive-text-key / --hive-visual-key).
python run_inference.py \
--prompt "an isometric illustration of a tiny city floating in the clouds" \
--output out.png \
--quantization "nf4" \
--magic-prompt-key "$IDEOGRAM_API_KEY" \
--hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
--hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
For sampler presets, parameter reference, and optimization tips, see docs/inference.md.
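Before launching a long run, it can help to confirm that both Hive keys are exported. The following is a small pre-flight sketch using the environment variable names given in this section; missing_hive_keys is a hypothetical helper, not something shipped with the repo.

```python
import os

# Environment variable names quoted in the Hive section of this README.
HIVE_ENV_VARS = ("HIVE_TEXT_MODERATION_KEY", "HIVE_VISUAL_MODERATION_KEY")


def missing_hive_keys(env=None):
    """Return the names of Hive key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in HIVE_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hive_keys()
    if missing:
        print("Set these before running inference: " + ", ".join(missing))
    else:
        print("Both Hive moderation keys are set.")
```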
Model Summary
Ideogram 4 is a foundation model trained entirely from scratch, not a fine-tune or distillation of any existing checkpoint. It is a flow-matching text-to-image model built on a fully single-stream Diffusion Transformer (DiT) architecture.
Architecture:
- **Fully single-stream DiT.** Text and image tokens are concatenated into one unified sequence and processed through the same 34-layer transformer, with no separate text or image branches. This enables deep cross-modal interaction at every layer.
- **Vision-language model as text encoder.** Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses Qwen3-VL-8B-Instruct, a full vision-language model that provides far richer understanding of visual concepts. Hidden states are extracted from 13 intermediate layers and concatenated, giving the model multi-scale semantic features ranging from surface-level token information to deep compositional understanding.
- **Dual-branch classifier-free guidance.** The conditional (positive) and unconditional (negative) branches can be independently refined, enabling separate control over prompt adherence and image quality.
- **Flexible resolution.** Native support for any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. A single model handles everything from square thumbnails to ultrawide banners, with the noise schedule auto-adjusting per resolution.
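The resolution limits above are easy to check before starting a run. The sketch below encodes them as stated, assuming the 256 and 2048 bounds and the 6:1 ratio are all inclusive (the README does not say); validate_resolution is a hypothetical helper, not part of the released code.

```python
def validate_resolution(width, height, lo=256, hi=2048, step=16, max_aspect=6.0):
    """Check a width/height pair against the limits stated in this README.

    Returns a list of problems; an empty list means the size looks valid.
    Inclusive bounds are an assumption.
    """
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not lo <= value <= hi:
            problems.append(f"{name}={value} is outside {lo}..{hi}")
        if value % step != 0:
            problems.append(f"{name}={value} is not a multiple of {step}")
    shorter = min(width, height)
    if shorter > 0:
        aspect = max(width, height) / shorter
        if aspect > max_aspect:
            problems.append(f"aspect ratio {aspect:.2f}:1 exceeds {max_aspect:g}:1")
    return problems


if __name__ == "__main__":
    for size in [(2048, 2048), (1536, 256), (2048, 256), (1000, 512)]:
        issues = validate_resolution(*size)
        print(size, "ok" if not issues else "; ".join(issues))
```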
Key Capabilities:
- **Extreme controllability.** Ideogram 4 is trained on structured JSON captions, giving users unprecedented control over composition, style, lighting, color palette, typography, and spatial layout, all from a single prompt.
- **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class in-image text generation (signage, logos, captions, watermarks, multi-line text) with high fidelity directly from the prompt.
- **Spatial layout control.** Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.
- **Color palette conditioning.** Specify hex colors in the prompt to steer the image’s dominant color scheme.
For full architecture details, see docs/model_architecture.md. For a walkthrough of how the pipeline components fit together, see docs/pipeline.md.
Prompting Guide
Ideogram 4 is trained exclusively on structured JSON captions. While plain-text prompts work, you will get the best results by providing a JSON object that follows our caption schema.
Key points:
- **Use JSON prompts** for maximum controllability: the model was trained on them and understands the structure natively.
- **Color palette conditioning**: specify a colour_palette array of hex colors in the style description to steer the image’s color scheme.
- **Aspect ratio flexibility**: Ideogram 4 supports a wide range of aspect ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This is a key advantage for practical use: portraits, landscapes, banners, phone wallpapers, social media formats, etc.
- **Bounding-box layout**: specify bbox coordinates in the prompt to explicitly place subjects, text elements, and background regions.
- **Compositional control**: use compositional_deconstruction with bounding boxes and per-element descriptions for precise spatial layout.
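To make the key points above concrete, here is a rough sketch of assembling a structured prompt in Python. Only the names colour_palette, bbox, and compositional_deconstruction come from this README. Every other key, the nesting, and the bounding-box coordinate convention (normalized [x0, y0, x1, y1]) are assumptions made for illustration; the actual schema is documented in docs/prompting.md and should be checked before relying on any of this.

```python
import json
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def make_prompt(summary, palette, elements):
    """Assemble a JSON caption sketch (hypothetical helper).

    palette: list of "#RRGGBB" strings.
    elements: list of (description, bbox) pairs; the bbox layout is assumed.
    """
    for color in palette:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a #RRGGBB hex color: {color!r}")
    return {
        "summary": summary,  # placeholder key, not from the README
        "style": {"colour_palette": list(palette)},  # "style" is a placeholder key
        "compositional_deconstruction": [
            {"description": desc, "bbox": list(bbox)}  # "description" is a placeholder key
            for desc, bbox in elements
        ],
    }


if __name__ == "__main__":
    prompt = make_prompt(
        "minimalist concert poster",
        ["#1B1F3B", "#F2A65A"],
        [
            ("headline text reading 'LIVE TONIGHT'", (0.1, 0.05, 0.9, 0.25)),
            ("silhouette of a guitarist", (0.2, 0.3, 0.8, 0.95)),
        ],
    )
    print(json.dumps(prompt, indent=2))
```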
**Why JSON-only training?** We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image to maximize training efficiency. The more text-to-image relationships each caption pins down, the more grounded supervision the model extracts from a single training pair, rather than having to infer those relationships across many sparsely-captioned samples.
**Why JSON at inference time?** Because the model was trained on captions that name every object explicitly, the most reliable way to get every requested object rendered is to mirror that pattern. Plain-text prompts still work, but won’t perform as well since the model was only trained on structured JSON captions.
**Don’t want to write JSON by hand?** That’s what magic prompt is for: it uses an LLM to expand a plain-text prompt into a full structured caption before generation, so you get JSON-quality results from a casual prompt. It runs by default in run_inference.py (see the CLI section).
See docs/prompting.md for a full guide.
Documentation

| Document | Description |
| --- | --- |
| docs/prompting.md | How to write JSON prompts, color palette conditioning, aspect ratios |
| docs/inference.md | Sampler presets, parameter reference, resolutions, optimization tips |
| docs/model_architecture.md | Architecture diagram, DiT spec, component details |
| docs/pipeline.md | Conceptual pipeline walkthrough: how all components fit together |
| docs/development.md | Dev setup, pre-commit hooks, contributing |
| docs/safety.md | Pre-training, post-training, and inference-time safety mitigations; how to report violations |
Citation
If you find the provided code or models useful for your research, consider citing them as:
@misc{ideogram-4-2026,
author={Ideogram AI},
title={{Ideogram 4}},
year={2026},
howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}
We’re Hiring!
We’re looking for Research Scientists and Research Engineers to work on next-generation generative models and the products built on top of them. Interested candidates, please apply at https://jobs.ashbyhq.com/ideogram