spatial-reasoning

#spatial-reasoning

General Intuition’s $2.3B bet that video games can train AI agents for the real world

TechCrunch AI · 2d ago

General Intuition raised $320M at a $2.3B valuation to develop AI agents trained on video game action labels, demonstrating a single model that can play games and control real-world robots with minimal fine-tuning.

#spatial-reasoning

@dair_ai: https://x.com/dair_ai/status/2068724104815890889

X AI KOLs Following · 6d ago

Highlights three recent AI papers: SpatialClaw (training-free spatial reasoning via code), SkillWeaver (compositional skill routing with decompose-retrieve-compose pipeline), and PreAct (compiling agent runs into fast state machines for repeated tasks).

#spatial-reasoning

A chessboard is a surprisingly good way to catch what VLMs still get wrong

Reddit r/artificial · 2026-06-18

An informal experiment using a chessboard reveals that vision-language models often fail at spatial reasoning and precise structured output even when they recognize the pieces correctly, highlighting a key gap in VLM evaluation.

#spatial-reasoning

General Intuition in talks to raise $300M at around $2B valuation

TechCrunch AI · 2026-06-18

General Intuition, a startup building a foundation model for training AI agents in spatial-temporal reasoning using video game data, is in talks to raise $300 million at a $2 billion valuation, with backing from Jeff Bezos and Eric Schmidt.

#spatial-reasoning

@Phoenixyin13: NVIDIA's SpatialClaw is fresh out. The framework lets a VLM write code step by step in a persistent, Jupyter-like Python environment: calling SAM3 to see things, computing depth, using NumPy and SciPy to process data, and viewing results in real time. If it doesn't work…

X AI KOLs Timeline · 2026-06-17

NVIDIA has launched SpatialClaw, a code-based training-free agent framework for complex visual-spatial reasoning tasks, achieving an average of 59.9% on 20 benchmarks, 11.2 points higher than the previous best model.

#spatial-reasoning

Thinking with Visual Grounding

Hugging Face Daily Papers · 2026-06-15

This paper introduces visually grounded thinking, a method for vision-language models to interleave natural-language reasoning with explicit visual evidence grounding using points or boxes. A scalable synthesis pipeline and grounding-aware reinforcement learning improve reasoning accuracy, enabling a 4B model to match or surpass a 27B model on spatial and counting benchmarks.

#spatial-reasoning

@HuggingPapers: SpatialClaw NVIDIA drops a training-free spatial reasoning agent that uses code as its action interface. A VLM writes P…

X AI KOLs Following · 2026-06-12

NVIDIA introduces SpatialClaw, a training-free spatial reasoning agent that uses a VLM to write Python code in a persistent kernel, compose perception tools, and revise plans, achieving +11.2 points over prior agents on 20 benchmarks.

#spatial-reasoning

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

arXiv cs.AI · 2026-06-11

This paper proposes a self-supervised reinforcement learning framework that uses consistency verifiers—reward functions checking geometric and semantic consistency under transformations—to improve spatial reasoning in large reasoning models without requiring ground-truth annotations. The method approaches the accuracy of supervised fine-tuning and generalizes across diverse tasks.

#spatial-reasoning

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

arXiv cs.AI · 2026-06-11

The paper proposes SVoT, a reinforcement learning framework that generates interleaved, verifiable intermediate states and visualizations for multi-hop spatial reasoning in MLLMs, achieving significant accuracy gains on new benchmarks involving multi-object interactions and numerical reasoning.

#spatial-reasoning

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Hugging Face Daily Papers · 2026-06-11

SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks.

#spatial-reasoning

Architect-Ant: Editable Automatic Furnishing of Architectural Floor Plans

arXiv cs.AI · 2026-06-10

This paper presents Architect-Ant, an editable automatic furnishing framework for architectural floor plans, together with a curated dataset (AntPlan-270) of 270 floor plans with furniture annotations. The method uses a fine-tuned vision-language model and a domain-specific language to generate geometrically valid and functionally plausible furniture layouts that can be rasterized into blueprint-style images.

#spatial-reasoning

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Hugging Face Daily Papers · 2026-06-10

A training-free framework for spatial reasoning from egocentric videos that enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry.

#spatial-reasoning

Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning

arXiv cs.LG · 2026-06-09

This paper identifies a failure mode called PhysHack in LLM-based LEGO assembly generation and proposes PVPO, a sample-efficient reinforcement learning method with model-based data selection that improves physical and semantic alignment using only a small fraction of training data.

#spatial-reasoning

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Hugging Face Daily Papers · 2026-06-08

AlloSpatial is an agentic framework that enhances spatial reasoning in foundation models by converting egocentric observations into structured allocentric representations, using cognitive mapping and tool-use reasoning. It improves performance by 5-18% on benchmarks and outperforms larger models through cold-start reinforcement learning.

#spatial-reasoning

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Hugging Face Daily Papers · 2026-06-08

SpatialWorld is a unified benchmark for evaluating interactive spatial reasoning in multimodal agents across diverse real-world tasks, revealing that even the strongest models achieve low task success rates.

#spatial-reasoning

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Hugging Face Daily Papers · 2026-06-04

The paper proposes Astra, an agentic spatial reasoning framework that pairs a VLM policy trained with reinforcement learning with a world simulator that generates novel-view observations, improving spatial reasoning in vision-language models.

#spatial-reasoning

Can LLMs Adhere to Strict 2D Spatial Constraints? (Testing with Sokoban)

Reddit r/LocalLLaMA · 2026-06-03

A benchmark tests LLMs on strict Sokoban puzzles with formatting constraints and finds that only ChatGPT, Qwen3.7-max, and Gemini 3.5-thinking succeed; the others fail because of illegal moves or formatting errors.

#spatial-reasoning

Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning

arXiv cs.LG · 2026-06-03

Proposes SpecFlow, a lightweight multimodal spatial reasoning framework that represents intermediate visual thoughts in a fixed-size discrete cosine space, reducing computation and KV cache costs by up to 2.1 times while maintaining competitive performance.

#spatial-reasoning

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Hugging Face Daily Papers · 2026-06-03

Imaginative Perception Tokens (IPT) enhance vision-language models' spatial reasoning by externalizing intermediate perceptual representations from alternative viewpoints, outperforming traditional text-based reasoning on perspective taking, path tracing, and multiview counting tasks.

#spatial-reasoning

GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods

Hugging Face Daily Papers · 2026-06-02

GridVQA-X introduces a diagnostic framework to evaluate cross-modal explainability by distinguishing genuine spatial-relational reasoning from cross-modal shortcuts in multimodal models.
