KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Hugging Face Daily Papers 05/14/26, 12:00 AM Papers

video-alignment reinforcement-learning ode-native kv-cache flow-matching autoregressive-models grpo

Summary

KVPO introduces an ODE-native online GRPO framework that aligns streaming autoregressive video generators with human preferences using causal-semantic KV cache exploration and a velocity-field surrogate policy, achieving consistent improvements in visual quality and alignment.

Aligning streaming autoregressive (AR) video generators with human preferences is challenging. Existing reinforcement learning methods predominantly rely on noise-based exploration and SDE-based surrogate policies that are mismatched to the deterministic ODE dynamics of distilled AR models, and tend to perturb low-level appearance rather than the high-level semantic storyline progression critical for long-horizon coherence. To address these limitations, we present KVPO, an ODE-native online Group Relative Policy Optimization (GRPO) framework for aligning streaming video generators. For diversity exploration, KVPO introduces a causal-semantic exploration paradigm that relocates the source of variation from stochastic noise to the historical KV cache. By stochastically routing historical KV entries, it constructs semantically diverse generation branches that remain strictly on the data manifold. For policy modeling, KVPO introduces a velocity-field surrogate policy based on Trajectory Velocity Energy (TVE), which quantifies branch likelihood in flow-matching velocity space and yields a reward-weighted contrastive objective fully consistent with the native ODE formulation. Experiments on multiple distilled AR video generators demonstrate consistent gains in visual quality, motion quality, and text-video alignment across both single-prompt short-video and multi-prompt long-video settings.
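The "group relative" part of GRPO can be illustrated with a minimal sketch: rewards for a group of branches generated from the same prompt are normalized against the group's own mean and standard deviation, so each branch is scored relative to its siblings. This is the standard GRPO normalization, not code from the paper; the function name `group_relative_advantages`, the `eps` constant, and the example rewards are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of branches sharing a prompt.

    Hypothetical helper: name and eps are illustrative, not from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four branches of one prompt
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
```

Branches with above-average reward receive positive advantages and below-average branches receive negative ones, which is what makes a reward-weighted contrastive objective possible without a separate value model.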

Cached at: 05/19/26, 06:31 AM

Source: https://huggingface.co/papers/2605.14278 Published on May 14

Submitted by kkaka (https://huggingface.co/kkakkkka) on May 19

Abstract

KVPO, an ODE-native online GRPO framework, aligns streaming video generators with human preferences through causal-semantic exploration and a velocity-field surrogate policy based on Trajectory Velocity Energy.

Aligning streaming autoregressive (AR) video generators with human preferences is challenging. Existing reinforcement learning methods predominantly rely on noise-based exploration and SDE-based surrogate policies that are mismatched to the deterministic ODE dynamics of distilled AR models, and tend to perturb low-level appearance rather than the high-level semantic storyline progression critical for long-horizon coherence. To address these limitations, we present KVPO, an ODE-native online Group Relative Policy Optimization (GRPO) framework for aligning streaming video generators. For diversity exploration, KVPO introduces a causal-semantic exploration paradigm that relocates the source of variation from stochastic noise to the historical KV cache. By stochastically routing historical KV entries, it constructs semantically diverse generation branches that remain strictly on the data manifold. For policy modeling, KVPO introduces a velocity-field surrogate policy based on Trajectory Velocity Energy (TVE), which quantifies branch likelihood in flow-matching velocity space and yields a reward-weighted contrastive objective fully consistent with the native ODE formulation. Experiments on multiple distilled AR video generators demonstrate consistent gains in visual quality, motion quality, and text-video alignment across both single-prompt short-video and multi-prompt long-video settings.
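As a rough illustration of moving exploration from noise into the KV history, the sketch below builds several branches by randomly selecting which historical entries each branch conditions on. This page does not describe the paper's actual routing rule, so everything here is an assumption made for illustration: the function name `route_kv_history`, the independent keep-probability rule, and the choice to always retain the most recent entry. The entries are stand-in labels rather than real key/value tensors.

```python
import random


def route_kv_history(history, num_branches, keep_prob=0.5, seed=0):
    """Return one randomly selected sub-history per branch.

    Hypothetical routing rule for illustration only, not the paper's method.
    """
    rng = random.Random(seed)
    branches = []
    for _ in range(num_branches):
        # Keep each older entry independently with probability keep_prob
        kept = [entry for entry in history[:-1] if rng.random() < keep_prob]
        # Assumption: the most recent entry is always retained
        branches.append(kept + history[-1:])
    return branches


# Stand-in labels for cached chunks; a real cache would hold key/value tensors
history = ["chunk0", "chunk1", "chunk2", "chunk3"]
branches = route_kv_history(history, num_branches=3)
```

Because every branch only reuses entries the model has already produced, and no noise is injected into them, the variation between branches comes from which past context is visible rather than from perturbing the latent itself.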



Similar Articles

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Speculative Decoding for Autoregressive Video Generation
