PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Hugging Face Daily Papers 05/14/26, 12:00 AM Papers

human-motion video-generation physics-simulator reinforcement-learning 3d-motion reward-system motion-realism

Summary

PhyMotion proposes a physics-grounded reward system that evaluates kinematic plausibility, contact consistency, and dynamic feasibility of human motion in generated videos, achieving stronger correlation with human judgment and improving motion realism in RL-based post-training.

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.
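The three-axis structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the SMPL recovery, retargeting, and MuJoCo simulation steps have already produced per-frame violation magnitudes, and every name, unit, scale, and weight below (`MotionReward`, `violation_score`, `score_motion`, the exponential mapping, equal weighting) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class MotionReward:
    """Hypothetical container for the three per-axis scores, each in (0, 1]."""
    kinematic: float  # kinematic plausibility
    contact: float    # contact and balance consistency
    dynamic: float    # dynamic feasibility

    def total(self, weights=(1 / 3, 1 / 3, 1 / 3)):
        # Equal weighting is an assumption; the source does not state how axes are combined.
        wk, wc, wd = weights
        return wk * self.kinematic + wc * self.contact + wd * self.dynamic


def violation_score(violations, scale):
    """Map non-negative per-frame violation magnitudes to a continuous score.

    Returns 1.0 when there is no violation and decays toward 0 as the
    mean violation grows. The exponential form is an illustrative choice.
    """
    if not violations:
        return 1.0
    mean_violation = sum(violations) / len(violations)
    return exp(-mean_violation / scale)


def score_motion(joint_limit_excess, foot_height_error, torque_excess):
    """Build a structured reward from precomputed per-frame violations.

    Assumed (hypothetical) inputs, one value per frame:
      joint_limit_excess: radians beyond the joint range
      foot_height_error:  metres of floating or penetration at contact
      torque_excess:      N*m beyond an actuator limit
    """
    return MotionReward(
        kinematic=violation_score(joint_limit_excess, scale=0.1),
        contact=violation_score(foot_height_error, scale=0.05),
        dynamic=violation_score(torque_excess, scale=50.0),
    )


# A clip whose feet float 30 cm above the ground loses score on the contact axis only.
floating = score_motion([0.0, 0.0], [0.3, 0.3], [0.0, 0.0])
print(floating)
print(floating.total())
```

Keeping the per-axis scores separate, rather than returning only the sum, is what lets such a reward report which aspect of the motion is violated.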

Original Article

Cached at: 05/15/26, 04:24 AM

Paper page - PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Source: https://huggingface.co/papers/2605.14269

Abstract

PhyMotion introduces a physics-grounded reward system for human motion generation that evaluates kinematic plausibility, contact consistency, and dynamic feasibility to improve video quality.

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.
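To give a sense of scale for the reported +68 Elo gain, the short sketch below converts an Elo gap into an expected head-to-head preference rate. It assumes the standard logistic Elo model with a 400-point scale; the source does not specify how its Elo ratings were computed, so the resulting figure is indicative only. The function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(gap, scale=400.0):
    """Expected win probability for the higher-rated side under the
    standard logistic Elo model (assumed here, not stated in the source)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


# A 68-point gap corresponds to being preferred in roughly 60% of pairwise comparisons.
print(round(elo_expected_score(68), 3))
```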

Get this paper in your agent:

hf papers read 2605.14269

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Video Models Can Reason with Verifiable Rewards

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Geo-Align: Video Generation Alignment via Metric Geometry Reward

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
