MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware
Summary
MobileEgo Anywhere is a mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models by lowering hardware barriers.
Cached at: 05/18/26, 02:26 PM
Source: https://huggingface.co/papers/2605.05945
Abstract
A mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models.
The recent advancement of Vision-Language-Action (VLA) models has driven a critical demand for large-scale egocentric datasets. However, existing datasets are often limited by short episode durations, typically spanning only a few minutes, which fail to capture the long-horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour-plus egocentric trajectories using commodity mobile hardware. We leverage the ubiquitous sensor suites of modern smartphones to provide high-fidelity, long-term camera pose tracking, effectively removing the high hardware barriers associated with traditional robotics data collection. Our contributions are threefold: (1) we release a novel dataset comprising 200 hours of diverse, long-form egocentric data with persistent state tracking; (2) we open-source a mobile application that enables any user to record egocentric data; and (3) we provide a comprehensive processing pipeline to convert raw mobile captures into standardized, training-ready formats for Vision-Language-Action model and foundation model research. By democratizing the data collection process, this work enables the massive-scale acquisition of long-horizon data across varied global environments, accelerating the development of generalizable robotic policies.
Get this paper in your agent:
hf papers read 2605.05945
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Datasets citing this paper: 1
fpvlabs/stera-10m • Updated 3 days ago • 13.1k • 13
Similar Articles
MobileMoE: Scaling On-Device Mixture of Experts
MobileMoE introduces efficient on-device mixture-of-experts language models with sub-billion parameters, achieving better performance and efficiency than dense baselines and existing MoE models. The models are trained on open-source datasets and demonstrate significant speedups on commodity smartphones.
MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
MobileExplorer is a new framework that accelerates on-device inference for mobile GUI agents by performing lightweight parallel exploration of UI elements during model inference, reducing reasoning steps and latency by 23% while maintaining or improving task success rates.
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
MIRAGE is a framework for mobile GUI agents that replaces verbose chain-of-thought reasoning with compact continuous latent representations, incorporating a generative world model perspective to predict future screen states before acting. On AndroidWorld and AndroidControl benchmarks, it achieves competitive or superior performance while reducing generated tokens by over 75%.
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
AnyMo is a geometry-aware framework for setup-agnostic human motion modeling using physics-grounded IMU simulation and graph encoding, achieving significant improvements in zero-shot activity recognition, cross-modal retrieval, and motion captioning across multiple datasets.
EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera
EgoForce is a monocular 3D hand reconstruction framework that uses a unified network with differentiable forearm representation, arm-hand transformers, and ray space solvers to recover absolute hand pose and position across different camera models, achieving state-of-the-art accuracy on egocentric benchmarks.