EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Hugging Face Daily Papers 05/12/26, 12:00 AM Papers

hand-pose-estimation egocentric-vision 3d-reconstruction ar-vr monocular-rgb transformer

Summary

EgoForce is a monocular 3D hand reconstruction framework that uses a unified network with differentiable forearm representation, arm-hand transformers, and ray space solvers to recover absolute hand pose and position across different camera models, achieving state-of-the-art accuracy on egocentric benchmarks.

Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and its position from the user's (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines a differentiable forearm representation that stabilizes hand pose, a unified arm-hand transformer that predicts both hand and forearm geometry from a single egocentric view, mitigating depth-scale ambiguity, and a ray space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space MPJPE by up to 28% on the HOT3D dataset compared to prior methods and maintaining consistent performance across camera configurations. For more details, visit the project page at https://dfki-av.github.io/EgoForce.
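The headline number above is reported as camera-space MPJPE (mean per-joint position error). As a minimal sketch of what that metric measures, the snippet below computes MPJPE both in camera space and after root alignment. The function names `mpjpe` and `root_relative`, the choice of joint 0 as the root, and the toy coordinates are assumptions made for illustration; they are not taken from the paper or its evaluation code.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints (same units as the input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def root_relative(joints, root=0):
    """Subtract the root joint so only the articulated pose remains (root index is an assumption)."""
    r = joints[root]
    return [tuple(c - rc for c, rc in zip(j, r)) for j in joints]

# Toy data in millimetres: the prediction has the right pose but sits 30 mm too far away.
gt = [(10.0, 20.0, 400.0), (25.0, 5.0, 410.0), (40.0, -10.0, 405.0)]
pred = [(x, y, z + 30.0) for x, y, z in gt]

camera_space_error = mpjpe(pred, gt)                                 # penalizes the depth offset
root_aligned_error = mpjpe(root_relative(pred), root_relative(gt))   # ignores it
```

In this toy case the root-aligned error is zero while the camera-space error equals the 30 mm depth offset, which is why a camera-space metric is the relevant one when absolute hand position matters.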

Original Article

Cached at: 05/13/26, 08:14 PM

Paper page - EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Source: https://huggingface.co/papers/2605.12498

Abstract

EgoForce is a monocular 3D hand reconstruction framework that uses a unified network to recover robust, absolute hand pose and position across different camera models through differentiable forearm representation, arm-hand transformers, and ray space solvers.

Reconstructing the absolute 3D pose and shape of the hands from the user’s viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and its position from the user’s (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines a differentiable forearm representation that stabilizes hand pose, a unified arm-hand transformer that predicts both hand and forearm geometry from a single egocentric view, mitigating depth-scale ambiguity, and a ray space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space MPJPE by up to 28% on the HOT3D dataset compared to prior methods and maintaining consistent performance across camera configurations. For more details, visit the project page at https://dfki-av.github.io/EgoForce.
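The abstract does not spell out how the ray space closed-form solver works, so the following is only a generic illustration of the idea, not the authors' method. It assumes that (a) pixels are first unprojected to unit rays with whichever camera model applies, here an ideal pinhole or an equidistant fisheye, and (b) metric root-relative joints are already available, so that the camera-space translation can be found by linear least squares on point-to-ray distances. All function names and parameters are hypothetical.

```python
import math

def unproject_pinhole(u, v, fx, fy, cx, cy):
    """Pixel -> unit ray for an ideal pinhole camera (no distortion)."""
    x, y = (u - cx) / fx, (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def unproject_equidistant(u, v, f, cx, cy):
    """Pixel -> unit ray for an equidistant fisheye model (r = f * theta)."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)
    theta = r / f
    s = math.sin(theta) / r
    return (dx * s, dy * s, math.cos(theta))

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_translation(rel_joints, rays):
    """Find t minimizing sum_i ||(I - d_i d_i^T)(X_i + t)||^2.

    Setting the gradient to zero gives the 3x3 normal equations
    (sum_i P_i) t = -sum_i P_i X_i with P_i = I - d_i d_i^T,
    solved here with Cramer's rule.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for X, d in zip(rel_joints, rays):
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] -= p * X[j]
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("degenerate configuration: rays are (nearly) parallel")
    t = []
    for k in range(3):
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        t.append(det3(Ak) / det)
    return tuple(t)
```

Because the solver only consumes unit rays, the same code path serves both camera models: only the unprojection function changes. That is the property this sketch is meant to illustrate.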

Get this paper in your agent:

hf papers read 2605.12498

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Human Universal Grasping

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

ActiveMimic: Egocentric Video Pretraining with Active Perception
