ActiveMimic: Egocentric Video Pretraining with Active Perception

Hugging Face Daily Papers 06/04/26, 12:00 AM Papers

Summary

ActiveMimic is a pretraining framework that recovers camera and wrist trajectories from egocentric human video to model active perception as a viewpoint action, enabling robot pretraining that matches the performance of models trained directly on robot data.

Egocentric human video offers a scalable alternative to robot data for pretraining, yet models pretrained on such video consistently underperform those pretrained on robot data. We attribute this gap to a missing signal, the active perception behavior in egocentric videos, where humans continuously reposition their viewpoint during manipulation, inducing camera motion that standard pipelines treat as noise. To address this, we present ActiveMimic, a pretraining framework that recovers synchronized camera and wrist trajectories from a single body-worn RGB camera, models camera motion as a viewpoint action, and jointly learns active perception and manipulation from in-the-wild egocentric human video before adapting to a target robot. Empirically, real-world experiments across tasks with diverse active perception demands show that ActiveMimic consistently surpasses baselines pretrained on human video and matches state-of-the-art models pretrained on robot data. Further analysis provides evidence that active perception capability originates from egocentric human video pretraining rather than robot-specific fine-tuning, confirming active perception as the key to unlocking egocentric human video for robot pretraining.
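The abstract does not say how the viewpoint action is parameterized, so the following is only a minimal sketch of one plausible reading. It assumes the recovered trajectories are per-frame camera-to-world rotations and positions plus wrist positions in the same world frame. The viewpoint action is taken to be the camera motion between consecutive frames, expressed in the current camera frame, and it is paired with the next wrist position in that same frame. All function names and the dictionary layout are hypothetical and are not taken from the paper.

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def viewpoint_action(R_t, p_t, R_next, p_next):
    """Camera motion from frame t to t+1, expressed in the camera frame at t.

    Assumes R_* are camera-to-world rotations and p_* are camera positions
    in the world frame (an assumption, not the paper's stated convention).
    """
    R_t_inv = transpose(R_t)  # the inverse of a rotation is its transpose
    return matmul(R_t_inv, R_next), matvec(R_t_inv, sub(p_next, p_t))

def wrist_in_camera(R_t, p_t, wrist_world):
    """Wrist position re-expressed in the camera frame at time t."""
    return matvec(transpose(R_t), sub(wrist_world, p_t))

def joint_actions(cam_rots, cam_pos, wrist_pos):
    """Pair each viewpoint action with the next wrist target, per time step."""
    if not (len(cam_rots) == len(cam_pos) == len(wrist_pos)):
        raise ValueError("trajectories must be synchronized (same length)")
    actions = []
    for t in range(len(cam_rots) - 1):
        d_rot, d_pos = viewpoint_action(
            cam_rots[t], cam_pos[t], cam_rots[t + 1], cam_pos[t + 1])
        wrist = wrist_in_camera(cam_rots[t], cam_pos[t], wrist_pos[t + 1])
        actions.append({"view_rot": d_rot, "view_trans": d_pos, "wrist": wrist})
    return actions

# Hypothetical two-frame trajectory: the camera keeps its heading and
# moves one unit along world y while the wrist moves ahead of it.
example = joint_actions(
    cam_rots=[rot_z(math.pi / 2), rot_z(math.pi / 2)],
    cam_pos=[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    wrist_pos=[[0.0, 0.5, 0.0], [0.0, 2.0, 0.0]],
)
```

Expressing both quantities in the current camera frame keeps the action independent of any global coordinate system, which is one reason such a relative parameterization might transfer from a body-worn camera to a robot-mounted one. Whether ActiveMimic makes this choice is not stated in the abstract.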

Cached at: 06/15/26, 12:58 PM

Source: https://huggingface.co/papers/2606.06194

Published on Jun 4

Submitted by Leo (https://huggingface.co/leolin9248) on Jun 15

Abstract

The ActiveMimic pretraining framework recovers camera and wrist trajectories from egocentric video so that active perception can be learned from human video, matching the performance of models pretrained on robot data.

Egocentric human video offers a scalable alternative to robot data for pretraining, yet models pretrained on such video consistently underperform those pretrained on robot data. We attribute this gap to a missing signal, the active perception behavior in egocentric videos, where humans continuously reposition their viewpoint during manipulation, inducing camera motion that standard pipelines treat as noise. To address this, we present ActiveMimic, a pretraining framework that recovers synchronized camera and wrist trajectories from a single body-worn RGB camera, models camera motion as a viewpoint action, and jointly learns active perception and manipulation from in-the-wild egocentric human video before adapting to a target robot. Empirically, real-world experiments across tasks with diverse active perception demands show that ActiveMimic consistently surpasses baselines pretrained on human video and matches state-of-the-art models pretrained on robot data. Further analysis provides evidence that active perception capability originates from egocentric human video pretraining rather than robot-specific fine-tuning, confirming active perception as the key to unlocking egocentric human video for robot pretraining.

Similar Articles

EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera
