ideogram-ai/ideogram-4-nf4


Summary

Ideogram has released Ideogram 4, their first open-weight text-to-image model trained from scratch, featuring state-of-the-art multilingual text rendering, JSON-structured prompting, bounding-box layout controls, and native 2K resolution output. The NF4-quantized version is available on Hugging Face, with the model claimed to be the best open-weight image model and competitive with proprietary frontier models.

Task: text-to-image
Tags: diffusers, safetensors, text-to-image, image-generation, diffusion, flow-matching, dit, ideogram, license:other, diffusers:Ideogram4Pipeline, region:us

Cached at: 06/05/26, 02:19 AM

ideogram-ai/ideogram-4-nf4 · Hugging Face

Source: https://huggingface.co/ideogram-ai/ideogram-4-nf4

Ideogram 4: Open image model at the forefront of design

Blog Post | Code | Model | API | Official Site

A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design

Ideogram 4 is **Ideogram’s first open-weight text-to-image model**. It is a state-of-the-art foundation model trained from scratch, not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2K resolution images. The easiest way to try the model is online at **ideogram.ai**.

We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.

Table of Contents

  1. News
  2. Model Zoo
  3. Performance
  4. Quick Start
  5. Model Summary
  6. Prompting Guide
  7. Documentation
  8. Citation

News

  • **[2026-06-03]** **Ideogram 4 released!** Inference code and weights are now public, and our technical blog post is live. See the Quick Start section to generate your first image, or try the model online at ideogram.ai.

Model Zoo

We plan to support more quantizations in the future.

Performance

We evaluate Ideogram 4 across third-party arenas and benchmarks, standard open-source benchmarks, and our own internal human-preference benchmark. Across all of them, Ideogram 4 is the best open-weight image model by far, and sits at the frontier of design.

Design Arena

Design Arena is a third-party image Elo leaderboard focused specifically on design-oriented generation. On the overall board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary GPT and Gemini models:

Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, well ahead of the next-best open model:

Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models

ContraLabs

ContraLabs ran a blind typography evaluation judged by ten professional designers from Contra’s top-earning talent. Ideogram 4 leads on first-place win rate, picked as the best of four models 47.9% of the time overall, well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, FLUX.2 [max] at 15.5%, and Grok Imagine 1.0 at 15.0%:

ContraLabs typography first-place win rate, with Ideogram v4 leading

It also wins on practical usability: asked “Would you use this in real client work?”, the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):

ContraLabs ‘would you use this in real client work?’ rating, with Ideogram v4 leading

LMArena

On LMArena, a third-party text-to-image leaderboard that measures general-purpose text-to-image use cases, Ideogram is the top-ranked open-weight lab and a top-5 image generation lab overall, beaten only by giant companies with vastly larger budgets and resources:

LMArena text-to-image lab leaderboard with Ideogram

Ideogram internal eval

For our internal human-preference benchmark, focused on graphic design and photography, we had graphic designers deeply familiar with professional design work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall — behind only GPT Image 2 medium — and the top open-weight model:

Ideogram internal design leaderboard with Ideogram 4.0

Open-source benchmarks

On standard open-source benchmarks measuring core capabilities — layout control (7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering (X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the leading closed-source models across every axis. On layout control (7Bench), it is significantly better than all closed-source models:

Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight release we benchmarked — ahead of much larger models like Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):

Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering

Quick Start

Install

The inference code lives in the ideogram4 GitHub repo. Clone it, then from the repo root:

pip install .

If you plan to modify the code, install in editable mode instead so changes under src/ideogram4/ take effect without reinstalling:

pip install -e .

Model access

The model weights are gated on Hugging Face, so you must accept the gate and authenticate before the code can download them. Otherwise the download fails with a 404 / GatedRepoError.

  1. Open the model page, ideogram-ai/ideogram-4-nf4 (or ideogram-ai/ideogram-4-fp8), and click "Agree and access repository" to accept the license gate.
  2. Create a Hugging Face access token at huggingface.co/settings/tokens and log in so the download is authenticated: hf auth login. Alternatively, export the token directly: export HF_TOKEN="hf_...".
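
Before starting a long download, it can help to confirm that a token is actually available. The snippet below is a minimal sketch and is not part of the ideogram4 repo: `find_hf_token` is a hypothetical helper that only inspects the HF_TOKEN environment variable mentioned above, so it will not see credentials stored by `hf auth login`.

```python
import os
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise with a hint.

    Hypothetical helper: it only looks at the environment and does not
    detect a token saved by `hf auth login`.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Accept the license gate on the model page, "
            "then run `hf auth login` or export HF_TOKEN."
        )
    return token
```

Calling `find_hf_token()` at the top of a script turns a late gated-repo failure into an immediate, readable error.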

CLI

The plain --prompt is rewritten into the structured JSON caption the model expects by a “magic prompt” LLM. By default this uses Ideogram’s hosted magic-prompt API, which is free and does the expansion server-side (no local model or system prompt needed). It reads IDEOGRAM_API_KEY; get a key at developer.ideogram.ai:

python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"

You can also run the expansion through your own LLM provider: one of our magic-prompt system prompts is open source. See the Prompting Guide for details.

For the highest-quality images, set --height 2048 --width 2048 and --sampler-preset V4_QUALITY_48.

Safety screening with Hive

Prompt and output safety screening is performed via Hive. Sign up and create a Text Moderation key and a Visual Content Moderation key, then export them as HIVE_TEXT_MODERATION_KEY and HIVE_VISUAL_MODERATION_KEY (or pass them via --hive-text-key / --hive-visual-key).

python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"

For sampler presets, parameter reference, and optimization tips, see docs/inference.md.
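
When the CLI is driven from a script, it can be convenient to assemble the argument list in one place. The following is a minimal sketch: `build_inference_command` is a hypothetical helper, not part of the repo, and it uses only the flags shown in this section. It builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    prompt: str,
    output: str = "out.png",
    quantization: str = "nf4",
    magic_prompt_key: Optional[str] = None,
    hive_text_key: Optional[str] = None,
    hive_visual_key: Optional[str] = None,
    high_quality: bool = False,
) -> List[str]:
    """Assemble an argv list for run_inference.py (hypothetical helper)."""
    cmd = [
        "python", "run_inference.py",
        "--prompt", prompt,
        "--output", output,
        "--quantization", quantization,
    ]
    # Optional keys are only added when a value is supplied.
    optional = [
        ("--magic-prompt-key", magic_prompt_key),
        ("--hive-text-key", hive_text_key),
        ("--hive-visual-key", hive_visual_key),
    ]
    for flag, value in optional:
        if value:
            cmd += [flag, value]
    if high_quality:
        # Settings recommended above for the highest-quality images.
        cmd += ["--height", "2048", "--width", "2048",
                "--sampler-preset", "V4_QUALITY_48"]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(shlex.join(build_inference_command("a ginger cat", high_quality=True)))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with prompts that contain quotes or spaces.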

diffusers

This model is integrated with the 🧨 diffusers library.

Install diffusers from main:

pip install git+https://github.com/huggingface/diffusers.git

diffusers - remote prompt upsampling

For the best possible results, use Ideogram's hosted prompt upsampling:

import json, os, torch, requests
from diffusers import Ideogram4Pipeline

pipe = Ideogram4Pipeline.from_pretrained(
    "ideogram-ai/ideogram-4-nf4-diffusers",
    torch_dtype=torch.bfloat16,
    token=os.environ["HF_TOKEN"],  # or token="hf_xxxxxxxxx"; a token is needed because the repo is gated
).to("cuda")

# Expand the prompt into a structured JSON caption with Ideogram's free hosted magic-prompt API.
# Get a key at https://developer.ideogram.ai/ and export it as IDEOGRAM_API_KEY.
resp = requests.post(
    "https://api.ideogram.ai/v1/ideogram-v4/magic-prompt",
    headers={"Api-Key": os.environ["IDEOGRAM_API_KEY"]},
    json={"text_prompt": "a ginger cat wearing a tiny wizard hat reading a spellbook", "aspect_ratio": "1x1"},
)
resp.raise_for_status()  # fail early on an invalid key or a server error
caption = json.dumps(resp.json()["json_prompt"])  # the pipeline takes the caption as a JSON string

# Pass the caption straight to the pipeline (no prompt_upsampling: it is already upsampled).
image = pipe(
    caption,
    height=1024,  # model supports up to 2048
    width=1024,  # model supports up to 2048
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("ideogram4.png")

diffusers - local prompt upsampling

For a fully local experience, diffusers ships a prompt_upsampling option that uses the same Qwen3-VL-8B model as the text encoder for the upsampling. Expect lower quality than with remote prompt upsampling.

pip install outlines  # used to force the JSON structure

import os, torch
from diffusers import Ideogram4Pipeline, Ideogram4PromptEnhancerHead

# The LM head that makes the (head-less) text encoder generative, loaded as a small component.
prompt_enhancer_head = Ideogram4PromptEnhancerHead.from_pretrained(
    "diffusers/qwen3-vl-8b-instruct-lm-head",
    torch_dtype=torch.bfloat16,
)

pipe = Ideogram4Pipeline.from_pretrained(
    "ideogram-ai/ideogram-4-nf4-diffusers",
    prompt_enhancer_head=prompt_enhancer_head,
    torch_dtype=torch.bfloat16,
    token=os.environ["HF_TOKEN"],  # or token="hf_xxxxxxxxx"; a token is needed because the repo is gated
).to("cuda")

# prompt_upsampling=True rewrites the prompt into Ideogram's structured JSON caption locally, on-device.
image = pipe(
    "a ginger cat wearing a tiny wizard hat reading a spellbook",
    height=1024,  # model supports up to 2048
    width=1024,  # model supports up to 2048
    prompt_upsampling=True,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("ideogram4.png")

Model Summary

Ideogram 4 is a foundation model trained entirely from scratch, not a fine-tune or distillation of any existing checkpoint. It is a flow-matching text-to-image model built on a fully single-stream Diffusion Transformer (DiT) architecture.
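
For readers new to flow matching, the toy below illustrates the general idea of sampling by integrating a velocity field from noise (t=0) to data (t=1). It is a generic one-dimensional sketch under simplifying assumptions, not Ideogram 4's sampler: the names `euler_sample` and `toy_velocity` are hypothetical, and the closed-form velocity stands in for what would be a trained network.

```python
from typing import Callable


def euler_sample(velocity: Callable[[float, float], float], x0: float, steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the toy velocity never divides by zero
        x = x + dt * velocity(x, t)
    return x


def toy_velocity(target: float) -> Callable[[float, float], float]:
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * target.

    A real model predicts the velocity with a network; here it is known
    in closed form because the target is a single fixed point.
    """
    return lambda x, t: (target - x) / (1.0 - t)


if __name__ == "__main__":
    # Start from a "noise" value and follow the flow to the target.
    print(euler_sample(toy_velocity(3.0), x0=-1.0, steps=8))
```

Because the toy path is a straight line, Euler integration reaches the target for any step count. With a learned velocity field, the number of steps trades speed against quality.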

Architecture:

  • **Fully single-stream DiT.** Text and image tokens are concatenated into one unified sequence and processed through the same 34-layer transformer, with no separate text or image branches. This enables deep cross-modal interaction at every layer.
  • **Vision-language model as text encoder.** Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses Qwen3-VL-8B-Instruct, a full vision-language model that provides far richer understanding of visual concepts. Hidden states are extracted from 13 intermediate layers and concatenated, giving the model multi-scale semantic features ranging from surface-level token information to deep compositional understanding.
  • **Dual-branch classifier-free guidance.** The conditional (positive) and unconditional (negative) branches can be independently refined, enabling separate control over prompt adherence and image quality.
  • **Flexible resolution.** Native support for any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. A single model handles everything from square thumbnails to ultrawide banners, with the noise schedule auto-adjusting per resolution.
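
The resolution constraints in the last bullet can be checked before a generation is launched. The following is a minimal sketch with hypothetical helper names (`validate_resolution`, `snap_side`); it assumes that "up to 6:1" means the long side may be at most six times the short side, inclusive. The pipeline itself may enforce these limits differently.

```python
MIN_SIDE, MAX_SIDE, STEP, MAX_RATIO = 256, 2048, 16, 6


def validate_resolution(width: int, height: int) -> None:
    """Raise ValueError if (width, height) violates the documented limits."""
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if side % STEP != 0:
            raise ValueError(f"{name}={side} is not a multiple of {STEP}")
    # Assumption: the 6:1 limit is inclusive and applies in both orientations.
    if max(width, height) > MAX_RATIO * min(width, height):
        raise ValueError(f"aspect ratio of {width}x{height} exceeds {MAX_RATIO}:1")


def snap_side(value: float) -> int:
    """Round to the nearest multiple of 16, then clamp to [256, 2048]."""
    snapped = int(round(value / STEP)) * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))
```

For example, 1536x256 passes (exactly 6:1), while 2048x256 is rejected because each side is in range but the ratio is 8:1.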

Key Capabilities:

  • **Extreme controllability.** Ideogram 4 is trained on structured JSON captions, giving users unprecedented control over composition, style, lighting, color palette, typography, and spatial layout, all from a single prompt.
  • **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class in-image text generation (signage, logos, captions, watermarks, multi-line text) with high fidelity directly from the prompt.
  • **Spatial layout control.** Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.
  • **Color palette conditioning.** Specify hex colors in the prompt to steer the image’s dominant color scheme.

For full architecture details, see docs/model_architecture.md. For a walkthrough of how the pipeline components fit together, see docs/pipeline.md.

Prompting Guide

Ideogram 4 is trained exclusively on structured JSON captions. While plain-text prompts work, you will get the best results by providing a JSON object that follows our caption schema.

Key points:

  • Use JSON prompts for maximum controllability: the model was trained on them and understands the structure natively.
  • Color palette conditioning: specify a colour_palette array of hex colors in the style description to steer the image’s color scheme.
  • Aspect ratio flexibility: Ideogram 4 supports a wide range of aspect ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This is a key advantage for practical use: portraits, landscapes, banners, phone wallpapers, social media formats, etc.
  • Bounding-box layout: specify bbox coordinates in the prompt to explicitly place subjects, text elements, and background regions.
  • Compositional control: use compositional_deconstruction with bounding boxes and per-element descriptions for precise spatial layout.
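
To make the points above concrete, here is an illustrative sketch of assembling a JSON caption in Python. Only the names `colour_palette`, `bbox`, and `compositional_deconstruction` come from this guide. Everything else is an assumption made for illustration: the `description` and `style` keys, the nesting, and the normalized `[x_min, y_min, x_max, y_max]` box format. The authoritative schema is in docs/prompting.md.

```python
import json


def build_example_caption() -> str:
    """Return an illustrative JSON caption string (schema details are assumed)."""
    caption = {
        # Hypothetical key: an overall description of the image.
        "description": "A minimalist concert poster with a bold headline",
        # Hypothetical nesting: the guide only says the palette lives in the style description.
        "style": {
            "description": "flat vector illustration",
            "colour_palette": ["#0B132B", "#F4D35E", "#EE964B"],
        },
        # Per-element descriptions with bounding boxes.
        # Assumed box format: normalized [x_min, y_min, x_max, y_max].
        "compositional_deconstruction": [
            {"description": "headline text 'JAZZ NIGHT'", "bbox": [0.1, 0.05, 0.9, 0.25]},
            {"description": "saxophone player silhouette", "bbox": [0.2, 0.3, 0.8, 0.95]},
        ],
    }
    return json.dumps(caption)


if __name__ == "__main__":
    print(build_example_caption())
```

Building the caption as a dictionary and serializing it with `json.dumps` avoids hand-written JSON syntax errors such as trailing commas or unescaped quotes.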

**Why JSON-only training?** We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image to maximize training efficiency. The more text-to-image relationships each caption pins down, the more grounded supervision the model extracts from a single training pair, rather than having to infer those relationships across many sparsely-captioned samples.

**Why JSON at inference time?** Because the model was trained on captions that name every object explicitly, the most reliable way to get every requested object rendered is to mirror that pattern. Plain-text prompts still work, but won’t perform as well since the model was only trained on structured JSON captions.

**Don’t want to write JSON by hand?** That’s what magic prompt is for: it uses an LLM to expand a plain-text prompt into a full structured caption before generation, so you get JSON-quality results from a casual prompt. It runs by default in run_inference.py (see the CLI section).
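
In a script that accepts both plain-text prompts and ready-made JSON captions, a small check can decide whether expansion is needed at all. This is a minimal sketch: `is_json_caption` is a hypothetical helper and is not how run_inference.py decides. It only tests that the prompt parses as a JSON object and does not validate it against the caption schema.

```python
import json


def is_json_caption(prompt: str) -> bool:
    """Return True if the prompt parses as a JSON object (a dict)."""
    try:
        return isinstance(json.loads(prompt), dict)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string): treat it as a plain-text prompt.
        return False
```

A caller could send the prompt through magic prompt only when `is_json_caption(prompt)` is false, and pass JSON captions to the model unchanged.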

See docs/prompting.md for a full guide.

Documentation

| Document | Description |
| --- | --- |
| docs/prompting.md | How to write JSON prompts, color palette conditioning, aspect ratios |
| docs/inference.md | Sampler presets, parameter reference, resolutions, optimization tips |
| docs/model_architecture.md | Architecture diagram, DiT spec, component details |
| docs/pipeline.md | Conceptual pipeline walkthrough: how all components fit together |
| docs/development.md | Dev setup, pre-commit hooks, contributing |
| docs/safety.md | Pre-training, post-training, and inference-time safety mitigations; how to report violations |

Citation

If you find the provided code or models useful for your research, consider citing them as:

@misc{ideogram-4-2026,
    author={Ideogram AI},
    title={{Ideogram 4}},
    year={2026},
    howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}

We’re Hiring!

We’re looking for Research Scientists and Research Engineers to work on next-generation generative models and the products built on top of them. Interested candidates, please apply at https://jobs.ashbyhq.com/ideogram
