Ideogram 4 is open source! (top ranked on DesignArena)

Reddit r/LocalLLaMA 06/03/26, 04:18 PM Models

Summary

Ideogram 4, an open-source text-to-image model, is released with state-of-the-art performance, structured JSON prompting, and multilingual text rendering.

Original Article

Cached at: 06/03/26, 05:45 PM

ideogram-ai/ideogram-4-fp8 · Hugging Face

Source: https://huggingface.co/ideogram-ai/ideogram-4-fp8

Ideogram 4: Open image model at the forefront of design

A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design

Ideogram 4 is **Ideogram’s first open-source text-to-image model**. It is a state-of-the-art foundation model trained from scratch, not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at **ideogram.ai**.

We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.

News

**[2026-06-03]** **Ideogram 4 released!** Inference code and weights are now public, and our technical blog post is live. See the Quick Start section to generate your first image, or try the model online at ideogram.ai.

Model Zoo

We plan to support more quantizations in the future.

Performance

We evaluate Ideogram 4 across third-party arenas and benchmarks, standard open-source benchmarks, and our own internal human-preference benchmark. Across all of them, Ideogram 4 is the best open-weight image model by far, and sits at the frontier of design.

Design Arena

Design Arena is a third-party image Elo leaderboard focused specifically on design-oriented generation. On the overall board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary GPT and Gemini models:

Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, well ahead of the next-best open model:

Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models

ContraLabs

ContraLabs ran a blind typography evaluation judged by ten professional designers from Contra’s top-earning talent. Ideogram 4 leads on first-place win rate, picked as the best of four models 47.9% of the time overall, well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%):

ContraLabs typography first-place win rate, with Ideogram v4 leading

It also wins on practical usability: asked “Would you use this in real client work?”, the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):

ContraLabs ‘would you use this in real client work?’ rating, with Ideogram v4 leading

LMArena

On LMArena, a third-party text-to-image leaderboard that measures general-purpose text-to-image use cases, Ideogram is the top-ranked open-weight lab and a top-5 image generation lab overall, beaten only by giant companies with vastly larger budgets and resources:

LMArena text-to-image lab leaderboard with Ideogram

Ideogram internal eval

For our internal human-preference benchmark, focused on graphic design and photography, we had graphic designers deeply familiar with professional design work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall — behind only GPT Image 2 medium — and the top open-weight model:

Ideogram internal design leaderboard with Ideogram 4.0
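For readers unfamiliar with Bradley-Terry scoring, the sketch below shows how a pair of scores maps to a predicted head-to-head preference rate. It assumes the scores are on a natural-log strength scale, which the model card does not specify. The function name and the example scores are hypothetical and are not taken from the leaderboard.

```python
import math


def bt_win_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that item A is preferred over item B.

    Assumes scores are log-strengths, so the probability is the logistic
    function of the score difference.
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(bt_win_probability(1.2, 0.7), 3))
```

Only the difference between two scores matters under this model, which is why such leaderboards can be shifted by a constant without changing the ranking.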

Open-source benchmarks

On standard open-source benchmarks measuring core capabilities — layout control (7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering (X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the leading closed-source models across every axis. On layout control (7Bench), it is significantly better than all closed-source models:

Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight release we benchmarked — ahead of much larger models like Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):

Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering
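As a rough guide to what 9.3B parameters means for local use, the sketch below estimates the size of the weights alone in a few storage formats. It assumes the 9.3B figure refers to the weights being quantized, and it ignores quantization metadata, the Qwen3-VL-8B-Instruct text encoder, and activation memory, so real usage will be higher. The `bf16` row is included only as a familiar baseline and is not mentioned in the model card. The helper name is hypothetical.

```python
# Approximate bytes per parameter (ignores per-block scales and similar overhead).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_footprint_gb(num_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_footprint_gb(9.3e9, fmt):.1f} GB")
```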

Quick Start

Install

The inference code lives in the ideogram4 GitHub repo. Clone it, then from the repo root:

pip install .

If you plan to modify the code, install in editable mode instead so changes under `src/ideogram4/` take effect without reinstalling:

pip install -e .

CLI

The plain `--prompt` is rewritten into the structured JSON caption the model expects by a “magic prompt” LLM. By default this uses Ideogram’s hosted magic-prompt API, which is free and does the expansion server-side (no local model or system prompt needed). It reads `IDEOGRAM_API_KEY`; get a key at developer.ideogram.ai:

python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"

You can also run the expansion through your own LLM provider: one of our magic-prompt system prompts is open source. See the Prompting Guide for details.

For the highest-quality images, set `--height 2048 --width 2048` and `--sampler-preset V4_QUALITY_48`.
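If you launch generation from another Python script, the flags above can be assembled into an argument list. This is a minimal sketch using only the flags shown in this section. The helper name is hypothetical, and it assumes the quality flags can be combined with `--quantization nf4` as in the example command.

```python
import shlex


def build_quality_command(prompt: str, output: str, magic_prompt_key: str) -> list[str]:
    """Assemble the run_inference.py argument list with the quality settings."""
    return [
        "python", "run_inference.py",
        "--prompt", prompt,
        "--output", output,
        "--quantization", "nf4",
        "--magic-prompt-key", magic_prompt_key,
        "--height", "2048",
        "--width", "2048",
        "--sampler-preset", "V4_QUALITY_48",
    ]


if __name__ == "__main__":
    cmd = build_quality_command(
        "a ginger cat wearing a tiny wizard hat reading a spellbook",
        "out.png",
        "YOUR_KEY",  # placeholder; read the real key from IDEOGRAM_API_KEY
    )
    # Print a shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` rather than a single string avoids shell-quoting problems when the prompt contains quotes or other special characters.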

Safety screening with Hive

Prompt and output safety screening is performed via Hive. Sign up and create a Text Moderation key and a Visual Content Moderation key, then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY` (or pass them via `--hive-text-key` / `--hive-visual-key`).

python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
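Before launching a run, it can help to confirm that both Hive variables are exported. The sketch below checks only the two variable names given above; the helper name is hypothetical and this check is not part of the inference code.

```python
import os
from typing import Mapping

HIVE_ENV_VARS = ("HIVE_TEXT_MODERATION_KEY", "HIVE_VISUAL_MODERATION_KEY")


def missing_hive_keys(env: Mapping[str, str]) -> list[str]:
    """Return the Hive variable names that are unset or empty in `env`."""
    return [name for name in HIVE_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hive_keys(os.environ)
    if missing:
        print("Missing Hive keys:", ", ".join(missing))
    else:
        print("Both Hive keys are set.")
```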

For sampler presets, parameter reference, and optimization tips, see `docs/inference.md`.

Model Summary

Ideogram 4 is a foundation model trained entirely from scratch, not a fine-tune or distillation of any existing checkpoint. It is a flow-matching text-to-image model built on a fully single-stream Diffusion Transformer (DiT) architecture.

Architecture:

**Fully single-stream DiT.** Text and image tokens are concatenated into one unified sequence and processed through the same 34-layer transformer, with no separate text or image branches. This enables deep cross-modal interaction at every layer.
**Vision-language model as text encoder.** Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses Qwen3-VL-8B-Instruct, a full vision-language model that provides far richer understanding of visual concepts. Hidden states are extracted from 13 intermediate layers and concatenated, giving the model multi-scale semantic features ranging from surface-level token information to deep compositional understanding.
**Dual-branch classifier-free guidance.** The conditional (positive) and unconditional (negative) branches can be independently refined, enabling separate control over prompt adherence and image quality.
**Flexible resolution.** Native support for any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. A single model handles everything from square thumbnails to ultrawide banners, with the noise schedule auto-adjusting per resolution.
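The resolution constraints listed above can be checked before a run. This is a minimal sketch that assumes the 256 and 2048 bounds are inclusive and that the 6:1 limit applies in both orientations; the function name is hypothetical and not part of the inference code.

```python
def is_supported_resolution(width: int, height: int) -> bool:
    """Check a resolution against the stated constraints.

    Each side must be a multiple of 16 between 256 and 2048 (assumed
    inclusive), and the long side may be at most 6x the short side.
    """
    for side in (width, height):
        if side < 256 or side > 2048 or side % 16 != 0:
            return False
    long_side, short_side = max(width, height), min(width, height)
    # Integer comparison avoids floating-point rounding at exactly 6:1.
    return long_side <= 6 * short_side


if __name__ == "__main__":
    for size in [(1024, 1024), (1536, 256), (2048, 256), (1000, 1000)]:
        print(size, is_supported_resolution(*size))
```

Note that the two rules interact: 2048 x 256 satisfies the per-side bounds but is 8:1, so it falls outside the stated aspect-ratio limit.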

Key Capabilities:

**Extreme controllability.** Ideogram 4 is trained on structured JSON captions, giving users unprecedented control over composition, style, lighting, color palette, typography, and spatial layout, all from a single prompt.
**State-of-the-art text rendering.** Ideogram 4 delivers best-in-class in-image text generation (signage, logos, captions, watermarks, multi-line text) with high fidelity directly from the prompt.
**Spatial layout control.** Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.
**Color palette conditioning.** Specify hex colors in the prompt to steer the image’s dominant color scheme.

For full architecture details, see `docs/model_architecture.md`. For a walkthrough of how the pipeline components fit together, see `docs/pipeline.md`.

Prompting Guide

Ideogram 4 is trained exclusively on structured JSON captions. While plain-text prompts work, you will get the best results by providing a JSON object that follows our caption schema.

Key points:

**Use JSON prompts** for maximum controllability: the model was trained on them and understands the structure natively.
**Color palette conditioning:** specify a `colour_palette` array of hex colors in the style description to steer the image’s color scheme.
**Aspect ratio flexibility:** Ideogram 4 supports a wide range of aspect ratios (any multiple-of-16 resolution from 256 to 2048 on each side, with aspect ratios up to 6:1). This is a key advantage for practical use: portraits, landscapes, banners, phone wallpapers, social media formats, etc.
**Bounding-box layout:** specify `bbox` coordinates in the prompt to explicitly place subjects, text elements, and background regions.
**Compositional control:** use `compositional_deconstruction` with bounding boxes and per-element descriptions for precise spatial layout.
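To make the key points above concrete, here is a sketch of assembling a structured caption in Python. Only the field names `colour_palette`, `bbox`, and `compositional_deconstruction` come from this guide. Everything else is an assumption made for illustration: the `description` and `style` keys, the nesting, and the bounding-box format (taken here as normalized `[x_min, y_min, x_max, y_max]`). Consult `docs/prompting.md` for the actual caption schema.

```python
import json
import re

HEX_COLOUR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def build_caption(description, palette, elements):
    """Build an illustrative caption dict; the layout is hypothetical."""
    for colour in palette:
        if not HEX_COLOUR.match(colour):
            raise ValueError(f"not a 6-digit hex colour: {colour!r}")
    return {
        "description": description,  # hypothetical key
        "style": {"colour_palette": list(palette)},  # "style" is a hypothetical key
        "compositional_deconstruction": [
            # Per-element shape is assumed: a description plus a bbox.
            {"description": text, "bbox": list(bbox)}
            for text, bbox in elements
        ],
    }


if __name__ == "__main__":
    caption = build_caption(
        "a minimalist jazz festival poster",
        ["#1B2A49", "#F2C14E"],
        [("headline text reading 'JAZZ NIGHT'", (0.1, 0.05, 0.9, 0.25)),
         ("a saxophone silhouette", (0.25, 0.3, 0.75, 0.95))],
    )
    print(json.dumps(caption, indent=2))
```

Validating hex colors up front catches malformed palettes before any generation time is spent.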

**Why JSON-only training?** We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image to maximize training efficiency. The more text-to-image relationships each caption pins down, the more grounded supervision the model extracts from a single training pair, rather than having to infer those relationships across many sparsely-captioned samples.

**Why JSON at inference time?** Because the model was trained on captions that name every object explicitly, the most reliable way to get every requested object rendered is to mirror that pattern. Plain-text prompts still work, but won’t perform as well since the model was only trained on structured JSON captions.

**Don’t want to write JSON by hand?** That’s what magic prompt is for: it uses an LLM to expand a plain-text prompt into a full structured caption before generation, so you get JSON-quality results from a casual prompt. It runs by default in `run_inference.py` (see the CLI section).
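A wrapper script that accepts both prompt styles might first check whether the input is already a JSON object and only request expansion otherwise. The sketch below illustrates that check; the function name is hypothetical, and how `run_inference.py` itself makes this decision is not described here.

```python
import json


def is_structured_caption(prompt: str) -> bool:
    """Return True if `prompt` parses as a JSON object, False for plain text."""
    try:
        return isinstance(json.loads(prompt), dict)
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    for prompt in ('{"description": "a red bicycle"}', "a red bicycle"):
        kind = "structured caption" if is_structured_caption(prompt) else "plain text"
        print(f"{prompt!r}: {kind}")
```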

See `docs/prompting.md` for a full guide.

Documentation

| Document | Description |
| --- | --- |
| `docs/prompting.md` | How to write JSON prompts, color palette conditioning, aspect ratios |
| `docs/inference.md` | Sampler presets, parameter reference, resolutions, optimization tips |
| `docs/model_architecture.md` | Architecture diagram, DiT spec, component details |
| `docs/pipeline.md` | Conceptual pipeline walkthrough: how all components fit together |
| `docs/development.md` | Dev setup, pre-commit hooks, contributing |
| `docs/safety.md` | Pre-training, post-training, and inference-time safety mitigations; how to report violations |

Citation

If you find the provided code or models useful for your research, consider citing them as:

@misc{ideogram-4-2026,
    author={Ideogram AI},
    title={{Ideogram 4}},
    year={2026},
    howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}

We’re Hiring!

We’re looking for Research Scientists and Research Engineers to work on next-generation generative models and the products built on top of them. Interested candidates, please apply at https://jobs.ashbyhq.com/ideogram