Tag
Proposes an information-theoretic framework for optimizing classifier-free guidance schedules in diffusion models, achieving improved trade-offs between condition consistency and sample diversity on ImageNet and COCO benchmarks.
Introduces DiffusionBench, a unified benchmark for holistic evaluation of generative diffusion transformers, supporting multiple generation tasks and providing standardized training and evaluation.
Krea 2 is a series of foundation models for creative image generation, built with a large-scale data infrastructure and multi-stage training pipeline. It introduces a prompt expander and style-reference system to improve steerability and enable creative exploration.
Krea 2 is a 12-billion-parameter text-to-image diffusion model released with open weights on Hugging Face, with Raw (base) and Turbo (post-trained) checkpoints available.
Asks for recommendations on affordable AI models for content writing, image generation, and vibe coding.
Boogu has released a series of open-source unified image generation and editing models, including Base, Turbo, and Edit variants.
Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers, and propose DiffusionBench, a holistic benchmark combining ImageNet class-conditional and text-to-image generation to better assess progress in generative modeling.
Semantic Browsing introduces a method for controlled diversity in text-to-image generation by using a Vision Language Model with an agentic workflow to generate structured, interpretable variations based on semantic decisions.
A user presents a comprehensive comparison of local text-to-image models using 192 prompts, evaluating capabilities like text rendering, faces, anatomy, and spatial composition, with results and prompts publicly available at imagebench.ai.
The author details the process of pretraining and post-training a 500M-parameter language model and a 330M-parameter image generator entirely from scratch.
Thumbmagic is an AI thumbnail generator trained on top-performing thumbnails.
Describes a creative use of an embedded browser to achieve infinite-canvas image generation with Codex Image 2.
A user demonstrates giving a local LLM agent MCP tools for local image and video generation, enabling fully offline and free generation on demand.
Shows three years of AI progress: ModelScope on the left, Grok Imagine 1.5 on the right.
Midjourney, known for AI image generation, has developed a new technology described as the sequel to the MRI, which suggests a move into medical imaging.
FreeStyle proposes a scalable dual-reference generation framework using community LoRA mining to construct large-scale style-content triplets, with disentanglement mechanisms to prevent content leakage, and introduces a comprehensive benchmark for evaluation.
This paper analyzes the variance of FID scores across different training and sampling seeds, revealing significant reproducibility issues in image generation evaluation. It proposes a new evaluation protocol with error bars and per-cell optimal guidance tuning.
A LoRA that adapts Ideogram 4 to generate high-quality images in as few as 2 steps without CFG, using a novel continuous turbo training method.
Comfy-Org has repackaged Boogu-Image model files for ComfyUI, including base, edit, and turbo variants with different quantization formats, plus a LoRA and text encoder.
Google released a new image generation model.