PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning

Hugging Face Daily Papers

Summary

PoLAR introduces a geometrically structured latent action representation in hyperbolic space that separates transition extent from mode, improving robotic policy learning performance.

Latent action pretraining learns representations of visual change from pairs of observations, but existing methods typically encode each transition as a single unstructured representation that entangles transition extent and transition mode. We introduce Polar Latent Actions with Radial structure (PoLAR), which imposes a radial-direction structure on latent actions, encouraging the radius to encode transition extent and the direction to retain transition mode. PoLAR uses the temporal offset between two observations as a weak proxy for transition extent, encouraging latent actions from observation pairs separated by larger temporal gaps to occupy larger radii. We instantiate this structure in hyperbolic space, whose volume expands with radius and therefore offers a natural fit for more diverse transition modes at larger extents. Across in-task and large-scale pretraining settings, PoLAR improves downstream policy performance in simulation and real-world robot experiments, outperforming latent action baselines and strong pretrained vision-language-action models (VLAs). These results suggest that the geometry of the latent action space is an important design choice for transferring visual pretraining to downstream robot policy learning.
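
The radial-direction idea described above can be made concrete with a small sketch. This is not the authors' implementation: the abstract does not specify the hyperbolic model, the curvature, or the loss, so the Poincaré ball, the curvature parameter `c`, the pairwise hinge loss, and every function name below are assumptions chosen for illustration. The sketch maps a Euclidean encoder output into the ball, reads off its hyperbolic radius and its direction, and penalizes pairs in which a larger temporal offset does not get a larger radius.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Maps a Euclidean vector v to a point strictly inside the ball.
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def hyperbolic_radius(z, c=1.0):
    """Geodesic distance from the origin: (2 / sqrt(c)) * artanh(sqrt(c) * ||z||)."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(z, axis=-1)
    # Keep the argument of artanh strictly below 1 for numerical safety.
    scaled = np.clip(sqrt_c * norm, 0.0, 1.0 - 1e-7)
    return 2.0 / sqrt_c * np.arctanh(scaled)


def direction(z, eps=1e-9):
    """Unit vector of z; in this sketch it stands in for the transition mode."""
    norm = np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), eps)
    return z / norm


def radial_ranking_loss(radii, offsets, margin=0.1):
    """Hypothetical pairwise hinge loss on radii.

    For every pair (i, j) with offsets[i] > offsets[j], the loss asks for
    radii[i] >= radii[j] + margin, so the temporal offset acts only as a
    weak ordering signal rather than a regression target.
    """
    radii = np.asarray(radii, dtype=float)
    offsets = np.asarray(offsets)
    diff = radii[:, None] - radii[None, :]
    longer = offsets[:, None] > offsets[None, :]
    violations = np.maximum(0.0, margin - diff) * longer
    return violations.sum() / max(int(longer.sum()), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for encoder outputs of four observation pairs (random here).
    v = rng.normal(size=(4, 8))
    offsets = np.array([1, 2, 4, 8])  # frames between the two observations
    z = expmap0(v)
    print("radii:", hyperbolic_radius(z))
    print("loss:", radial_ranking_loss(hyperbolic_radius(z), offsets))
```

A ranking loss is used here, rather than regressing the radius onto the offset, because the abstract calls the temporal offset a weak proxy for extent; other formulations would fit that description equally well.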

Cached at: 06/23/26, 05:40 AM

Source: https://huggingface.co/papers/2606.21139

Similar Articles

Geometric Action Model for Robot Policy Learning

Hugging Face Daily Papers

The Geometric Action Model (GAM) repurposes a pretrained geometric foundation model (GFM) as a unified backbone for language-conditioned robot manipulation, achieving higher accuracy, robustness, and efficiency than existing foundation-model-scale baselines across simulation and real-world benchmarks.

Revisiting Action Factorization for Complex Action Spaces

arXiv cs.LG

This paper presents a cross-sectional study comparing various action factorization methods (independent networks, shared encoder, VDN, QPLEX, Joint, Auto-Regressive) across three RL algorithm families (PPO, SAC, DQN) in hybrid discrete-continuous action spaces, introducing two new lightweight environments and variants VDN-PPO and PPO-MIX.