Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Hugging Face Daily Papers 05/25/26, 12:00 AM Papers

Summary

RTDMD is a two-stage framework combining distribution matching distillation with reward-guided reinforcement learning to improve few-step image generation alignment with human preferences. It achieves state-of-the-art results on multiple models with only 4 inference steps.

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term. In the first stage, we introduce Ambient-Consistent Distribution Matching Distillation (AC-DMD), which performs subinterval-wise distribution matching and augments the fake score objective with a consistency regularizer to help the fake score model track the shifting generator distribution under limited updates. In the second stage, we jointly optimize both terms: for the reward maximization term, we derive a hybrid policy gradient that combines a GRPO-style estimator for the stochastic intermediate transitions with direct reward backpropagation through the deterministic final step, and further introduce step-subset GRPO (SubGRPO) to reduce variance. Experiments on SD3, SD3.5, and FLUX.2 demonstrate that RTDMD establishes new state-of-the-art results across preference, aesthetic, and compositional metrics with only 4 inference steps, outperforming previous few-step text-to-image generation methods. Code and models are available at https://github.com/Harahan/RTDMD.
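The decomposition claimed above can be sketched in a few lines. This is a standard identity for exponentially tilted distributions, not the paper's own derivation: the symbols $p_\theta$ (few-step generator), $p_T$ (teacher), $r$ (reward), $\beta$ (temperature), and $Z$ (normalizer) are assumed notation, and conditioning on the text prompt is omitted for brevity.

```latex
% Assumed notation (not from the source): p_\theta is the few-step generator,
% p_T the teacher, r the reward model, \beta > 0 a temperature.
\begin{align}
p_T^{r}(x) &= \frac{1}{Z}\, p_T(x)\, \exp\!\big(r(x)/\beta\big),
\qquad Z = \mathbb{E}_{x \sim p_T}\!\big[\exp\!\big(r(x)/\beta\big)\big], \\
D_{\mathrm{KL}}\!\big(p_\theta \,\|\, p_T^{r}\big)
&= \mathbb{E}_{x \sim p_\theta}\!\big[\log p_\theta(x) - \log p_T(x) - r(x)/\beta\big] + \log Z \\
&= \underbrace{D_{\mathrm{KL}}\!\big(p_\theta \,\|\, p_T\big)}_{\text{distribution matching}}
 \;-\; \frac{1}{\beta}\, \underbrace{\mathbb{E}_{x \sim p_\theta}\!\big[r(x)\big]}_{\text{reward maximization}}
 \;+\; \log Z .
\end{align}
```

Since $\log Z$ does not depend on $\theta$, minimizing the left-hand side amounts to minimizing the matching term while maximizing the expected reward, with $\beta$ setting the trade-off between the two.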

Original Article

Cached at: 05/26/26, 06:42 AM

Source: https://huggingface.co/papers/2605.26108

Abstract

RTDMD is a two-stage framework that combines distribution matching distillation with reward-guided reinforcement learning to improve few-step image generation alignment with human preferences.

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term. In the first stage, we introduce Ambient-Consistent Distribution Matching Distillation (AC-DMD), which performs subinterval-wise distribution matching and augments the fake score objective with a consistency regularizer to help the fake score model track the shifting generator distribution under limited updates. In the second stage, we jointly optimize both terms: for the reward maximization term, we derive a hybrid policy gradient that combines a GRPO-style estimator for the stochastic intermediate transitions with direct reward backpropagation through the deterministic final step, and further introduce step-subset GRPO (SubGRPO) to reduce variance. Experiments on SD3, SD3.5, and FLUX.2 demonstrate that RTDMD establishes new state-of-the-art results across preference, aesthetic, and compositional metrics with only 4 inference steps, outperforming previous few-step text-to-image generation methods. Code and models are available at https://github.com/Harahan/RTDMD.
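To make the "GRPO-style estimator" concrete, here is a minimal sketch of the group-relative advantage normalization commonly used in GRPO: rewards for several samples drawn from the same prompt are centered and scaled within the group. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and this is the generic normalization only. It does not model the paper's hybrid policy gradient or its SubGRPO step selection, which the abstract does not specify.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Center and scale rewards within one prompt's sample group.

    Generic GRPO-style normalization (an assumption about the setup here,
    not the paper's exact estimator): A_i = (r_i - mean) / (std + eps).
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    n = len(rewards)
    mean = sum(rewards) / n
    # Population variance over the group; eps guards against a zero std
    # when every sample in the group receives the same reward.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four images sampled for one prompt, scored by a hypothetical reward model.
    print(group_relative_advantages([0.2, 0.5, 0.9, 0.4]))
```

Samples scoring above the group mean receive positive advantages and the rest negative ones, so no separate learned value baseline is needed.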

Get this paper in your agent:

hf papers read 2605.26108

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 2

#### Harahan/FLUX2-4B-RTDMD
Text-to-Image • Updated 39 minutes ago

#### Harahan/SD35M-RTDMD
Text-to-Image • Updated 39 minutes ago


Similar Articles

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

Reinforcement learning with prediction-based rewards
