AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Hugging Face Daily Papers 06/08/26, 12:00 AM Papers

Summary

AlloSpatial is an agentic framework that enhances spatial reasoning in foundation models by converting egocentric observations into structured allocentric representations, using cognitive mapping and tool-use reasoning. In a training-free setting it improves proprietary models by 5%-18% on VSI-Bench and MindCube, and agents trained with cold-start reinforcement learning outperform larger general-purpose models.

Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models. AlloSpatial introduces World2Mind, a plug-and-play cognitive mapping sandbox that converts egocentric observations into structured allocentric priors, including Allocentric-Spatial Trees and route maps that support querying object topology, geometric relations, passability, and trajectories. To utilize these priors reliably under noisy reconstruction and ambiguous visual evidence, AlloSpatial introduces a Spatial Reasoning Harness for tool-use judgment, modality-decoupled cue collection, and geometry-semantic arbitration. We further internalize this process in Qwen3-VL through cold-start reinforcement learning with a harness-gated trajectory-level reward. Experiments on VSI-Bench and MindCube show that AlloSpatial improves proprietary models by 5%-18% in a training-free setting, while ASTs alone support strong spatial reasoning even when visual inputs are removed. The trained AlloSpatial agents further outperform larger general-purpose models and competitive spatial baselines, suggesting that structured allocentric representations, active tool use, and verifiable reasoning offer a promising route toward spatially capable foundation models.
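To make the egocentric-to-allocentric idea concrete, here is a minimal sketch of the underlying geometry. It is not World2Mind or the paper's Allocentric-Spatial Tree. It assumes a simplified top-down 2D world, a known camera pose per view, and object positions given as (forward, left) offsets from the camera. The names `CameraPose`, `ego_to_allo`, and `allo_distance` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical camera pose in a shared world frame (top-down 2D)."""
    x: float    # camera position along the world x axis
    y: float    # camera position along the world y axis
    yaw: float  # heading in radians, counter-clockwise from world +x


def ego_to_allo(pose: CameraPose, forward: float, left: float) -> tuple[float, float]:
    """Rotate an egocentric (forward, left) offset by the camera heading,
    then translate by the camera position to get world coordinates."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    world_x = pose.x + c * forward - s * left
    world_y = pose.y + s * forward + c * left
    return (world_x, world_y)


def allo_distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between two points in the shared world frame."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


# Two views of the same scene, each seeing a different object.
view_a = CameraPose(x=0.0, y=0.0, yaw=0.0)          # facing world +x
view_b = CameraPose(x=4.0, y=0.0, yaw=math.pi / 2)  # facing world +y

chair = ego_to_allo(view_a, forward=2.0, left=0.0)  # seen only in view A
lamp = ego_to_allo(view_b, forward=3.0, left=1.0)   # seen only in view B

# Once both objects live in one frame, a cross-view relation is a plain query.
print(f"chair at {chair}, lamp at {lamp}")
print(f"chair-lamp distance: {allo_distance(chair, lamp):.3f}")
```

The chair and the lamp never appear in the same view, so neither egocentric observation alone can answer how far apart they are. After both are placed in the shared frame, the question reduces to a lookup and a distance computation, which is the kind of query a structured allocentric prior is meant to support.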

Original Article

Cached at: 06/15/26, 12:58 PM

Paper page - AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Source: https://huggingface.co/papers/2606.08952 Published on Jun 8

Submitted by https://huggingface.co/RSW233

RSW on Jun 15

Abstract

AlloSpatial framework enhances spatial reasoning in foundation models by converting egocentric observations into structured allocentric representations and enabling reliable spatial cognition through cognitive mapping and tool-use reasoning.

Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models. AlloSpatial introduces World2Mind, a plug-and-play cognitive mapping sandbox that converts egocentric observations into structured allocentric priors, including Allocentric-Spatial Trees and route maps that support querying object topology, geometric relations, passability, and trajectories. To utilize these priors reliably under noisy reconstruction and ambiguous visual evidence, AlloSpatial introduces a Spatial Reasoning Harness for tool-use judgment, modality-decoupled cue collection, and geometry-semantic arbitration. We further internalize this process in Qwen3-VL through cold-start reinforcement learning with a harness-gated trajectory-level reward. Experiments on VSI-Bench and MindCube show that AlloSpatial improves proprietary models by 5%-18% in a training-free setting, while ASTs alone support strong spatial reasoning even when visual inputs are removed. The trained AlloSpatial agents further outperform larger general-purpose models and competitive spatial baselines, suggesting that structured allocentric representations, active tool use, and verifiable reasoning offer a promising route toward spatially capable foundation models.

Get this paper in your agent:

hf papers read 2606.08952

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
