MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware

Hugging Face Daily Papers

Summary

MobileEgo Anywhere is a mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models by lowering hardware barriers.

The recent advancement of Vision-Language-Action (VLA) models has driven a critical demand for large-scale egocentric datasets. However, existing datasets are often limited by short episode durations, typically spanning only a few minutes, which fail to capture the long-horizon temporal dependencies necessary for complex robotic task execution. To bridge this gap, we present MobileEgo Anywhere, a framework designed to facilitate the collection of robust, hour-plus egocentric trajectories using commodity mobile hardware. We leverage the ubiquitous sensor suites of modern smartphones to provide high-fidelity, long-term camera pose tracking, effectively removing the high hardware barriers associated with traditional robotics data collection. Our contributions are threefold: (1) we release a novel dataset comprising 200 hours of diverse, long-form egocentric data with persistent state tracking; (2) we open-source a mobile application that enables any user to record egocentric data; and (3) we provide a comprehensive processing pipeline to convert raw mobile captures into standardized, training-ready formats for Vision-Language-Action model and foundation model research. By democratizing the data collection process, this work enables the massive-scale acquisition of long-horizon data across varied global environments, accelerating the development of generalizable robotic policies.
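
The abstract does not describe the internals of the processing pipeline, so the following is only a minimal sketch of one step such a pipeline might plausibly include: pairing each video frame with the camera pose closest to it in time. The `Pose` type, the `align_frames` function, the record fields, and the `max_gap` tolerance are all hypothetical names chosen for illustration and are not taken from the MobileEgo Anywhere code.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pose:
    """Hypothetical camera pose sample (not the project's actual schema)."""
    t: float                                      # timestamp in seconds
    position: Tuple[float, float, float]          # (x, y, z)
    quaternion: Tuple[float, float, float, float] # (w, x, y, z)


def align_frames(frame_times: List[float],
                 poses: List[Pose],
                 max_gap: float = 0.05) -> List[Dict]:
    """Pair each frame with the nearest pose in time.

    Assumes `poses` is sorted by timestamp. Frames with no pose within
    `max_gap` seconds are dropped rather than given a stale pose.
    """
    pose_times = [p.t for p in poses]
    records = []
    for i, ft in enumerate(frame_times):
        # Index of the first pose at or after the frame timestamp.
        j = bisect_left(pose_times, ft)
        # The nearest pose is either that one or its predecessor.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(poses)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(pose_times[k] - ft))
        if abs(pose_times[best] - ft) <= max_gap:
            records.append({
                "frame_index": i,
                "t": ft,
                "position": poses[best].position,
                "quaternion": poses[best].quaternion,
            })
    return records
```

Dropping frames that lack a nearby pose is one possible design choice for hour-long recordings, where tracking may be lost temporarily; interpolating between neighbouring poses would be an equally reasonable alternative.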

Cached at: 05/18/26, 02:26 PM

Source: https://huggingface.co/papers/2605.05945

Get this paper in your agent:

hf papers read 2605.05945

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

fpvlabs/stera-10m • Updated 3 days ago • 13.1k • 13

