EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Hugging Face Daily Papers Papers

Summary

EgoPhys introduces a framework to construct deformable physical digital twins from egocentric RGB video using generalizable priors and a compact codebook, enabling zero-shot generalization to unseen objects without per-spring optimization. The system is demonstrated on a real robot, showing that egocentric human play video can serve as internal world representation for deformable-object planning.

Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric videos by distilling per-object inverse-physics solutions into a compact codebook, enabling prediction of dense spring stiffness fields for unseen objects without per-spring test-time optimization. Trained with generalizable priors from diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, and zero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We deploy EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation to aid in deformable-object planning, highlighting egocentric RGB observations as a scalable path toward real-to-sim pipelines.
Original Article
View Cached Full Text

Cached at: 06/16/26, 03:32 PM

Paper page - EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Source: https://huggingface.co/papers/2606.16202

Abstract

EgoPhys enables deformable digital twin generation from egocentric RGB video by using generalizable priors and compact codebooks to predict dense spring stiffness fields without per-spring optimization.

Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video usinggeneralizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric videos by distilling per-objectinverse-physics solutionsinto a compactcodebook, enabling prediction ofdense spring stiffness fieldsfor unseen objects without per-spring test-time optimization. Trained withgeneralizable priorsfrom diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, andzero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We deploy EgoPhys on a realxArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation to aid in deformable-object planning, highlighting egocentric RGB observations as a scalable path towardreal-to-sim pipelines.

View arXiv pageView PDFProject pageAdd to collection

Get this paper in your agent:

hf papers read 2606\.16202

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.16202 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.16202 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.16202 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

PhysBrain 1.0 Technical Report

Hugging Face Daily Papers

PhysBrain 1.0 is a technical report presenting a method that uses human egocentric video to generate physical commonsense supervision for vision-language-action models, achieving state-of-the-art results on embodied control benchmarks including ERQA, PhysBench, SimplerEnv-WidowX, LIBERO, and RoboCasa.

ActiveMimic: Egocentric Video Pretraining with Active Perception

Hugging Face Daily Papers

ActiveMimic is a pretraining framework that recovers camera and wrist trajectories from egocentric human video to model active perception as a viewpoint action, enabling robot pretraining that matches the performance of models trained directly on robot data.

Human Universal Grasping

Hugging Face Daily Papers

A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. The model, trained on a large egocentric dataset, significantly outperforms state-of-the-art baselines on a new benchmark.