Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Hugging Face Daily Papers 05/15/26, 12:00 AM Papers

Summary

Flash-GRPO improves training efficiency for video diffusion models by addressing temporal variance and gradient inconsistency through iso-temporal grouping and temporal gradient rectification, achieving state-of-the-art alignment quality with substantial training acceleration.

Group Relative Policy Optimization (GRPO) has emerged as essential for aligning video diffusion models with human preferences, but it faces a critical computational bottleneck: training a 14B-parameter model typically demands hundreds of GPU days per experiment. Existing efficiency methods reduce cost by subsampling training timesteps with a sliding window, but this fundamentally compromises optimization: training becomes severely unstable and fails to reach full-trajectory performance. We present Flash-GRPO, a single-step training framework that outperforms full-trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges. First, iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, which decouples policy performance from timestep difficulty. Second, temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on models from 1.3B to 14B parameters validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.
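The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary gives no formulas, so every name here (`sample_iso_temporal_timesteps`, `group_relative_advantages`, `rectify_gradient`, `toy_scale`) is hypothetical, and `toy_scale` is a placeholder for whatever time-dependent factor the paper actually derives. The sketch assumes that "prompt-wise temporal consistency" means every sample in a prompt's group is trained at the same timestep, so that reward differences within the group are not mixed with differences in timestep difficulty.

```python
import random
import statistics


def sample_iso_temporal_timesteps(prompts, group_size, num_timesteps, rng):
    """Draw ONE training timestep per prompt and share it across its group.

    Assumption: this is how "iso-temporal grouping" is read from the abstract.
    """
    plan = {}
    for prompt in prompts:
        t = rng.randrange(num_timesteps)
        plan[prompt] = [t] * group_size  # every group member uses the same t
    return plan


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: z-score each reward within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def toy_scale(t, num_timesteps):
    """Placeholder time-dependent factor; NOT the factor from the paper."""
    return (t + 1) / num_timesteps


def rectify_gradient(grad, t, num_timesteps, scale_fn=toy_scale):
    """Divide out the time-dependent factor so magnitudes match across t."""
    factor = scale_fn(t, num_timesteps)
    return [g / factor for g in grad]


rng = random.Random(0)
plan = sample_iso_temporal_timesteps(
    ["a cat surfing", "a city at night"], group_size=4, num_timesteps=50, rng=rng
)
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])

# Two raw gradients that differ only by the toy time-dependent factor.
raw_early = [toy_scale(5, 50) * 2.0]
raw_late = [toy_scale(40, 50) * 2.0]
rectified_early = rectify_gradient(raw_early, 5, 50)
rectified_late = rectify_gradient(raw_late, 40, 50)
```

In this toy setup the two rectified gradients come out equal even though the raw ones differ by the timestep factor, which is the kind of consistency the abstract attributes to temporal gradient rectification.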

Original Article

Cached at: 05/18/26, 02:23 AM

Source: https://huggingface.co/papers/2605.15980 Authors:

Get this paper in your agent:

hf papers read 2605.15980

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

@probablynotaz9: Solo-author ICML paper alert Ever wanted to post-train your diffusion LLM with good old policy gradients, without havin…

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking
