HiDream-ai has open-sourced HiDream-O1-Image (8B), a unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) that natively handles text-to-image, image editing, and subject-driven personalization at up to 2048×2048 resolution without external VAEs or disjoint text encoders. It debuted at #8 in the Artificial Analysis Text to Image Arena and is positioned as a leading open-weights text-to-image model.
This paper introduces Continuous-Time Distribution Matching (CDM), a method for few-step diffusion distillation that moves the matching objective from discrete timesteps to continuous time, improving visual fidelity and preserving fine details.
This paper introduces D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy self-distillation during supervised fine-tuning. It allows models to learn new concepts or styles without compromising their efficient few-step inference capabilities.
The paper introduces JoyAI-Image, a unified multimodal foundation model that integrates a spatially enhanced MLLM with MMDiT to achieve state-of-the-art performance in visual understanding, text-to-image generation, and instruction-guided editing.
This paper introduces FD-loss, a method to optimize Fréchet Distance as a training objective for visual generation by decoupling population and batch sizes. It demonstrates that this approach improves generator quality and suggests FID may not always accurately reflect visual quality.
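The metric behind FD-loss is the Fréchet distance between the feature distributions of real and generated images; under a Gaussian assumption it has a closed form. A minimal sketch of that formula for the diagonal-covariance case (this is illustrative only, not the paper's FD-loss implementation, and `frechet_gaussian` is a hypothetical helper name):

```python
import numpy as np

def frechet_gaussian(mu1, var1, mu2, var2):
    """Squared Frechet distance between two Gaussians with diagonal
    covariances: ||mu1 - mu2||^2 + sum_i (sqrt(var1_i) - sqrt(var2_i))^2.

    In the full FID formula the variance term is
    Tr(S1 + S2 - 2 (S1 S2)^{1/2}); with diagonal covariances the
    matrix square root reduces to elementwise square roots.
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return mean_term + cov_term

# Identical covariances, means one unit apart -> distance is purely the mean gap.
d = frechet_gaussian([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
```

The paper's contribution is making a statistic of this kind usable as a differentiable training objective by decoupling the population size (used to estimate the statistics) from the minibatch size (used for gradients); the closed form above is just the quantity being optimized.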
Tuna-2 is a unified multimodal model that achieves state-of-the-art performance by processing visual understanding and generation directly from pixel embeddings, eliminating the need for pretrained vision encoders.
Within 24 hours of OpenAI's launch of ChatGPT Images 2.0, users unleashed a flood of creative, viral image demos.
GPT Image 2 impresses users with its ability to blend abstract concepts from GTA 6 and Cyberpunk 2077 into a cohesive screenshot.
A user reports Gemini 3.1 Pro unexpectedly streaming an image as line-by-line Base64 instead of returning a normal file.
GPT-Image-2 can generate highly detailed interiors of WW2 submarines rendered in the distinctive low-poly GoldSrc style of Half-Life 1.
Article discusses evaluating GPT-Image-2's capabilities through a 'President Test' scenario.
Researchers introduce GSI-Bench, the first benchmark to quantify generative spatial intelligence in multimodal models by evaluating 3D spatial constraint compliance during image generation. Fine-tuning on their synthetic dataset boosts both spatial editing fidelity and downstream spatial understanding, showing generative training can strengthen spatial reasoning.
GPT-Image-2 shows a major leap in image generation quality, enabling Agent-S to auto-create polished slide decks and apps.
A developer created an immersive "time machine" tool using OpenAI’s new image model that generates explorable panoramic scenes from text prompts.
ChatGPT Images 2.0 now supports configurable aspect ratios and resolution, as demonstrated by user @dibyayB.
OpenAI researchers explain the advances that make ChatGPT Images 2.0 a state-of-the-art image generation model, highlighting its thinking and intelligence capabilities.
OpenAI released ChatGPT Images 2.0, claiming a GPT-3-to-GPT-5 leap; Simon Willison benchmarks it with a "Where's Waldo"-style raccoon-and-ham-radio prompt against gpt-image-1, Google Nano Banana 2 and Pro, showing mixed hide-and-seek success.
OpenAI released an upgraded image model that keeps character appearance perfectly consistent across frames and renders crisp, stable text.
YouTube talk by @sedielem offering a concise state-of-the-art overview of scaling generative image and video models, covering modeling, architecture, distillation and control.
Users are discovering strong meme-generation capabilities in GPT Image 2, particularly for game-specific humor.