Tag
VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with a 2D reprojection loss and 3D motion regularizers, achieving performance close to that of 3D-supervised methods without requiring 3D ground truth.
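As a rough illustration of how a 2D reprojection loss can supervise 3D predictions, the sketch below projects predicted 3D joints into the image plane and compares them with observed 2D poses, then adds a simple smoothness term. The pinhole camera with a fixed focal length, the velocity-based regularizer, the weight `reg_weight`, and all function names are assumptions made for this example; the summary does not specify VideoMDM's camera model or its actual regularizers.

```python
import numpy as np


def project_pinhole(joints_3d, focal=1.0):
    """Project (T, J, 3) camera-space joints to (T, J, 2) image coordinates.

    Assumes a pinhole camera and strictly positive depth (z > 0).
    """
    z = joints_3d[..., 2:3]
    return focal * joints_3d[..., :2] / z


def reprojection_loss(pred_3d, target_2d, focal=1.0):
    """Mean squared error between projected 3D prediction and observed 2D poses."""
    return float(np.mean((project_pinhole(pred_3d, focal) - target_2d) ** 2))


def velocity_regularizer(pred_3d):
    """Mean squared frame-to-frame joint displacement (hypothetical 3D regularizer)."""
    return float(np.mean(np.diff(pred_3d, axis=0) ** 2))


def total_loss(pred_3d, target_2d, reg_weight=0.1, focal=1.0):
    """2D reprojection term plus a weighted 3D smoothness term."""
    return reprojection_loss(pred_3d, target_2d, focal) + reg_weight * velocity_regularizer(pred_3d)
```

The reprojection term alone leaves depth ambiguous, since many 3D poses project to the same 2D pose; this is one reason a method of this kind would pair it with regularizers defined directly on the 3D motion.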
GRAIL generates diverse humanoid manipulation and locomotion data using 3D assets and video foundation models, enabling effective sim-to-real transfer for humanoid robot control with high real-world success rates.