**TL;DR:** OpenAI researcher Kenji Hata and product lead Adele Li detail ImageGen 2.0 in a podcast: a comprehensive leap in text rendering, multilingual support, and photorealism, plus how users are creating everything from panoramic walkthroughs to viral "MS Paint" style content.
## From DALL-E to ImageGen 2.0: A Renaissance Leap
If DALL-E was the Stone Age, ImageGen 2.0 is the Renaissance. It is not just artistically superior: it fuses science, art, architecture, and more into a single image. After internal review, the team confirmed it is significantly better than ImageGen 1.0. Two weeks after launch, usage had grown by more than 50%, with more than 1.5 billion images generated weekly on ChatGPT.
## Product & Research Background
### Adele Li: From Investment to Product
Adele joined OpenAI over two years ago, having previously worked in private equity and at Redpoint Ventures for three years, investing in AI and software companies. She started on data and compute infrastructure before moving to product, focusing on ImageGen for the past six months. She sees product management as doing what needs to be done—and ImageGen let her collaborate with researchers to identify market gaps and opportunities. The market today looks completely different from when ImageGen 1.0 launched a year ago: multiple image generation tools exist, and ChatGPT itself has evolved.
### Kenji Hata: From Audio to Image
Kenji joined OpenAI about two years ago, initially working on audio projects. He gradually contributed to ImageGen 1.0 pre-launch work and eventually moved full-time into image generation. He notes that in internal evaluations, samples from early checkpoints already showed an enormous leap in photorealism over ImageGen 1.0, shifting from a glossy, idealized magazine-cover style to images that truly look like great photographs.
## Step-Change Improvements in Model Capabilities
### Text Rendering & Multilingual Support
ImageGen 2.0 improves across multiple dimensions:
- **Text Rendering**: Fidelity of on-screen text is dramatically better; words are meaningful and correctly spelled.
- **Multilingual Support**: Dedicated effort to support many languages, with strong reception from Asian and European users.
- **Photorealism**: Users had reported that previous models’ images didn’t look real enough or altered faces and bodies, so the goal was to make images of users feel more like the users themselves.
These capabilities come from the model absorbing world knowledge and being able to reflect it back visually to users.
### Variable Binding & Object Counting
The number of random objects a model can render correctly in a single grid has climbed with each generation: roughly 5-8 for DALL-E 3, about 16 for GPT Image 1, a consistent 25-36 for Image 1.5, and easily more than 100 for ImageGen 2.0. An internal standard test is to ask GPT to list 100 random objects and pass the list to the image generator, which gets nearly all of them right.
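The grid test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`grid_shape`, `build_grid_prompt`, `score_grid`) and the prompt wording are hypothetical, no image API is called, and the observed labels are assumed to come from a person or a separate vision model reading the generated grid cell by cell.

```python
import math


def grid_shape(n_objects: int) -> tuple:
    """Return (rows, cols) of the smallest near-square grid that fits n_objects."""
    if n_objects < 1:
        raise ValueError("need at least one object")
    cols = math.isqrt(n_objects)
    if cols * cols < n_objects:
        cols += 1
    rows = -(-n_objects // cols)  # ceiling division
    return rows, cols


def build_grid_prompt(objects: list) -> str:
    """Build a hypothetical text prompt asking for one object per grid cell."""
    rows, cols = grid_shape(len(objects))
    numbered = "; ".join(f"{i}. {name}" for i, name in enumerate(objects, start=1))
    return (
        f"Draw a {rows}x{cols} grid with one object per cell, "
        f"in reading order: {numbered}"
    )


def score_grid(expected: list, observed: list) -> float:
    """Fraction of cells whose observed label matches the expected one.

    Comparison is position by position and ignores case and outer whitespace.
    Missing observations count as misses.
    """
    if not expected:
        return 0.0
    hits = sum(
        1
        for want, got in zip(expected, observed)
        if want.strip().lower() == got.strip().lower()
    )
    return hits / len(expected)
```

With 100 objects, `grid_shape` yields a 10x10 layout, and `score_grid` turns "nearly all of them right" into a number that can be compared across model versions.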
### Emergent Capabilities: 360° Panoramas
The model can render images at any aspect ratio, so people have created extremely long, stunning panoramic views as well as tall, slender bookmarks. With 360°-style rendering, users can explore a panorama as an immersive 360° scene. This feature is integrated into the web and mobile versions of ChatGPT.
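To make the aspect-ratio idea concrete, here is a minimal sketch of turning a requested ratio into pixel dimensions. Everything specific in it is an assumption rather than a documented ImageGen limit: the function name `size_for_aspect`, the default budget of 1,048,576 pixels, and the rounding to multiples of 16 are all hypothetical. The 2:1 example reflects the equirectangular layout commonly used by 360° viewers (360° wide by 180° tall); the source does not say which projection ChatGPT uses.

```python
import math


def size_for_aspect(aspect_w: float, aspect_h: float,
                    pixel_budget: int = 1_048_576,
                    multiple: int = 16) -> tuple:
    """Pick a (width, height) close to aspect_w:aspect_h under a pixel budget.

    pixel_budget and multiple are illustrative assumptions, not model limits.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect components must be positive")
    ratio = aspect_w / aspect_h
    # Solve width * height = budget with width = ratio * height.
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio
    # Round each side down to a multiple, but keep at least one multiple.
    w = max(multiple, int(width) // multiple * multiple)
    h = max(multiple, int(height) // multiple * multiple)
    return w, h


panorama = size_for_aspect(2, 1)   # wide, equirectangular-style canvas
bookmark = size_for_aspect(1, 4)   # tall, slender canvas
```

Rounding down keeps ordinary ratios within the budget; very extreme ratios hit the one-multiple floor, so a real system would likely also cap the longest side.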
## User Use Cases & Viral Trends
### Productivity & Creativity Side by Side
Image generation was once seen as pure entertainment, but real productivity gains are now visible: infographics and greatly improved text quality make the model useful for work. People use it to make fun memes, images for five-year-olds, and professional consulting presentations, and to turn popular photos into rough MS Paint versions. Creating imperfect things actually requires high intelligence, and users value authenticity, imperfection, and nostalgia.
### New Forms of Self-Expression
Self-expression through AI is an area the team is very excited about. The model’s understanding of aesthetic beauty shows across its outputs and greatly expands what users can create; many use cases exceeded the team’s expectations.
## Model Efficiency & Post-Training
### Speed & Token Efficiency
From the DALL-E era (“tell us what you want and check back in an hour”) to real-time generation in ChatGPT, the team has learned with each release how to produce great images with fewer tokens. The post-training process considers not only world knowledge, scientific concepts, and math, but also what kind of taste resonates with users and how to make outputs beautiful and realistic.
### Kenji’s Personal Benchmark
Kenji often uses the “grid test”: generate a grid of 100 random objects and check how many come out right; ImageGen 2.0 gets almost all of them. He also recalls asking early language models (Ada, Babbage, Curie) to list 100 sci-fi books; some started repeating themselves at book 22, which helped measure the limits of their capability.
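The "list 100 books" check reduces to finding where a generated list first repeats itself. A minimal sketch, assuming the model output has already been split into one title per entry; the function name `first_repeat_position` and the normalization rule (case and whitespace only) are illustrative choices, not anything described in the podcast.

```python
from typing import Optional


def first_repeat_position(items: list) -> Optional[int]:
    """Return the 1-based position of the first repeated entry, or None.

    Entries are compared after lowercasing and collapsing whitespace, so
    "Dune" and " dune " count as the same title.
    """
    seen = set()
    for position, item in enumerate(items, start=1):
        key = " ".join(item.lower().split())
        if key in seen:
            return position
        seen.add(key)
    return None
```

A model that starts looping at book 22 would return `22` here, while a list of 100 distinct titles would return `None`. Near-duplicates such as alternate subtitles would need fuzzier matching than this sketch provides.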
### Adele’s Personal Evaluation
Adele has her own “me-me-me” evaluation: she takes 100 photos of herself, friends, and family and places each person in a funny pose, making cards or birthday images for nearly everyone. She finds this a great test because these are the faces she knows best, and it also checks whether ChatGPT understands context: does it remember the user’s siblings, parents, and preferences, and personalize the image accordingly?
---
**Source:** https://www.youtube.com/watch?v=bH2nP-aCFjk