Tag
AFUN proposes an affordance foundation model that predicts functional masks and 3D motion curves from RGB-D observations and language descriptions, enabling generalizable robot manipulation across diverse environments. The model outperforms baselines on multiple benchmarks and can be deployed for real-world tasks without fine-tuning.
PhyMotion proposes a physics-grounded reward system that evaluates the kinematic plausibility, contact consistency, and dynamic feasibility of human motion in generated videos. The reward correlates more strongly with human judgment and improves motion realism when used in RL-based post-training.