GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Hugging Face Daily Papers

Summary

GenEvolve is a self-evolving image generation framework that uses tool-orchestrated trajectories and visual experience distillation to iteratively improve generative capabilities, achieving state-of-the-art performance.

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Our website is as follows: https://ephemeral182.github.io/GenEvolve/
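The best-worst comparison described above can be illustrated with a small sketch. This is not the authors' implementation: the `Trajectory` fields, the `score` value, and the `best_worst_experience` helper are hypothetical names chosen for illustration, and a plain set difference stands in for whatever abstraction step the paper actually uses to build structured visual experience.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One generation attempt for a request (all fields are illustrative assumptions)."""
    tool_calls: list[str]   # names of tools the agent invoked, in order
    references: list[str]   # reference images the agent selected
    prompt: str             # final prompt composed for the generator
    score: float            # hypothetical quality score assigned by some judge


def best_worst_experience(trajectories: list[Trajectory]) -> dict:
    """Contrast the best- and worst-scored trajectories for the same request.

    Assumption: a simple set difference is used here as a stand-in for the
    paper's abstraction of best-worst differences.
    """
    best = max(trajectories, key=lambda t: t.score)
    worst = min(trajectories, key=lambda t: t.score)
    return {
        "tools_only_in_best": sorted(set(best.tool_calls) - set(worst.tool_calls)),
        "tools_only_in_worst": sorted(set(worst.tool_calls) - set(best.tool_calls)),
        "references_only_in_best": sorted(set(best.references) - set(worst.references)),
        "best_prompt": best.prompt,
        "score_gap": best.score - worst.score,
    }
```

In this sketch the returned dictionary plays the role of the structured experience. In the framework as described, such a summary would be shown only to the privileged teacher branch, never to the student.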

Cached at: 05/22/26, 10:19 AM

Source: https://huggingface.co/papers/2605.21605

Abstract

A self-evolving image generation framework uses tool-orchestrated trajectories and visual experience distillation to improve generative capabilities through iterative learning and reference-based prompting.

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model’s internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Our website is as follows: https://ephemeral182.github.io/GenEvolve/
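To make "dense token-level supervision" concrete, here is a minimal sketch of a per-token distillation signal. It assumes the teacher and student produce logits over the same vocabulary at each position of a student-sampled sequence, and that the teacher differs only in seeing the privileged experience. The KL direction (teacher relative to student) and the function names are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_level_kl(student_logits, teacher_logits):
    """Return one KL(teacher || student) value per token position.

    Both arguments are lists of per-position logit lists with matching shapes.
    Assumption: the teacher logits come from a branch conditioned on extra
    privileged context, while the student sees only the original request.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        p_teacher = softmax(t_row)
        p_student = softmax(s_row)
        kl = sum(
            pt * (math.log(pt) - math.log(ps))
            for pt, ps in zip(p_teacher, p_student)
            if pt > 0.0
        )
        losses.append(kl)
    return losses
```

Unlike a single image-level scalar reward, this yields one training signal per generated token, which is the contrast the abstract draws with prior agentic generation methods.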

Get this paper in your agent:

hf papers read 2605.21605

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### MeiGen-AI/GenEvolve Image-Text-to-Text • 9B • Updated about 9 hours ago • 22 • 4

Datasets citing this paper: 1

#### MeiGen-AI/GenEvolve-Data-Bench Viewer • Updated about 9 hours ago • 12.8k • 116 • 1

