Optimizing Visual Generative Models via Distribution-wise Rewards

Hugging Face Daily Papers 07/02/26, 12:00 AM Papers

Summary

This paper presents a reinforcement learning framework for visual generative models that uses distribution-wise rewards, with a subset-replace strategy for efficiency, improving image diversity and quality while addressing mode collapse and reward hacking.

Conventional reinforcement learning strategies for visual generation typically employ sample-wise reward functions, yet this practice frequently results in reward hacking that degrades image diversity and introduces visual anomalies. To address these limitations, we present a novel framework that finetunes generative models using distribution-wise rewards, ensuring better alignment with real-world data distributions. Unlike rewards that evaluate samples individually, distribution-wise reward accounts for the data distribution of the samples, mitigating the mode collapse problem that occurs when all samples optimize towards the same direction independently. To overcome the prohibitive computational cost of estimating these rewards, we introduce a subset-replace strategy that efficiently provides reward signals by updating only a small subset of a generated reference set. Additionally, we apply RL to optimize post-hoc model merging coefficients, potentially mitigating the train-inference inconsistency caused by introducing stochastic differential equation (SDE) in regular RL practices. Extensive experiments show our approach significantly improves FID-50K across various base models, from 8.30 to 5.77 for SiT and from 3.74 to 3.52 for EDM2. Qualitative evaluation also confirms that our method enhances perceptual quality while preserving sample diversity.

Original Article


Cached at: 07/03/26, 03:52 AM


Source: https://huggingface.co/papers/2607.02291

Abstract

A novel reinforcement learning framework for visual generation uses distribution-wise rewards to improve image diversity and quality while addressing mode collapse and computational efficiency issues.

Conventional reinforcement learning strategies for visual generation typically employ sample-wise reward functions, yet this practice frequently results in reward hacking that degrades image diversity and introduces visual anomalies. To address these limitations, we present a novel framework that finetunes generative models using distribution-wise rewards, ensuring better alignment with real-world data distributions. Unlike rewards that evaluate samples individually, distribution-wise reward accounts for the data distribution of the samples, mitigating the mode collapse problem that occurs when all samples optimize towards the same direction independently. To overcome the prohibitive computational cost of estimating these rewards, we introduce a subset-replace strategy that efficiently provides reward signals by updating only a small subset of a generated reference set. Additionally, we apply RL to optimize post-hoc model merging coefficients, potentially mitigating the train-inference inconsistency caused by introducing stochastic differential equation (SDE) in regular RL practices. Extensive experiments show our approach significantly improves FID-50K across various base models, from 8.30 to 5.77 for SiT and from 3.74 to 3.52 for EDM2. Qualitative evaluation also confirms that our method enhances perceptual quality while preserving sample diversity.
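The abstract does not give the exact form of the distribution-wise reward or of the subset-replace update, so the following is only a rough sketch of one possible reading. It assumes the reward is the drop in a Fréchet-style distance (Gaussian fits of feature vectors) after a small batch of new samples replaces part of a fixed generated reference set. The function names `frechet_distance` and `subset_replace_reward`, the random toy features, and the choice of distance are all illustrative assumptions, not the paper's method.

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussian fits of two feature sets (rows are samples)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def subset_replace_reward(reference, candidates, real, rng):
    """Hypothetical reward: how much the distance to `real` drops when
    `candidates` replace a random subset of the generated reference set."""
    idx = rng.choice(len(reference), size=len(candidates), replace=False)
    updated = reference.copy()
    updated[idx] = candidates  # only a small subset of the set changes
    return frechet_distance(reference, real) - frechet_distance(updated, real)


# Toy data standing in for image features (assumption: 4-dim random vectors).
rng = np.random.default_rng(0)
dim = 4
real = rng.normal(0.0, 1.0, size=(2000, dim))       # target distribution
reference = rng.normal(1.0, 1.0, size=(500, dim))   # current model samples, shifted

diverse_batch = rng.normal(0.0, 1.0, size=(50, dim))  # matches the target
collapsed_batch = np.full((50, dim), 3.0)             # identical, off-target samples

good_reward = subset_replace_reward(reference, diverse_batch, real, rng)
collapsed_reward = subset_replace_reward(reference, collapsed_batch, real, rng)
print(f"diverse batch reward:   {good_reward:+.3f}")
print(f"collapsed batch reward: {collapsed_reward:+.3f}")
```

In this toy setup the diverse batch should receive a positive reward and the collapsed batch a negative one, because the score depends on how the whole set's statistics move rather than on each sample in isolation. The sketch recomputes the set statistics from scratch for clarity; an efficient version would presumably update them incrementally, which is left out here.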



Similar Articles

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Reward as An Agent for Embodied World Models

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

Hierarchical Variational Policies for Reward-Guided Diffusion
