Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Hugging Face Daily Papers 06/10/26, 12:00 AM Papers

spatial-reasoning egocentric-video novel-view-synthesis training-free mllm geometry-to-video cross-view

Summary

A training-free framework for spatial reasoning from egocentric videos that enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry.

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases: in the Reason Phase, an MLLM forms a spatial hypothesis from the original video; in the Re-reason Phase, it verifies or revises the hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views feature an elevated, oblique perspective with scene-spanning coverage, while preserving the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially boosts open-source MLLMs to rival proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/
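The two-phase control flow described above can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: `ask_mllm` and `synthesize_novel_view` are hypothetical callables standing in for the MLLM's video interface and the Geometry-to-Video pipeline, the follow-up prompt wording is invented, and passing only the novel-view video in the second phase is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frames = Sequence[object]  # placeholder type for decoded video frames


@dataclass
class ReReResult:
    hypothesis: str  # answer formed in the Reason Phase
    final: str       # answer after the Re-reason Phase
    revised: bool    # True if the second pass changed the answer


def rere(
    video: Frames,
    question: str,
    ask_mllm: Callable[[Frames, str], str],              # hypothetical MLLM wrapper
    synthesize_novel_view: Callable[[Frames], Frames],   # hypothetical renderer
) -> ReReResult:
    # Reason Phase: form a spatial hypothesis from the original egocentric video.
    hypothesis = ask_mllm(video, question)

    # Geometry-to-Video (stand-in): produce a complementary novel-view video.
    novel_view = synthesize_novel_view(video)

    # Re-reason Phase: verify or revise the hypothesis against the new view.
    # The prompt text below is an assumption made for illustration.
    followup = (
        f"{question}\n"
        f"Your earlier answer from the original video was: {hypothesis}\n"
        "This video shows the same scene from a different viewpoint. "
        "Confirm the earlier answer or give a revised one."
    )
    final = ask_mllm(novel_view, followup)

    return ReReResult(
        hypothesis=hypothesis,
        final=final,
        revised=final.strip() != hypothesis.strip(),
    )
```

Because both phases go through the same `ask_mllm` callable, the sketch reflects the abstract's point that the model's native video interface is reused without architectural changes; only the input video and the prompt differ between the two calls.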


Cached at: 06/11/26, 01:38 PM


Source: https://huggingface.co/papers/2606.11683



Get this paper in your agent:

hf papers read 2606.11683

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
