Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Hugging Face Daily Papers 06/05/26, 04:33 PM Papers

3d-spatial-reasoning scene-aware-skills tool-use agentic-ai self-evolving memory-libraries

Summary

Skill-3D is a framework that enables AI agents to learn scene-aware skills through self-evolving memory and skill libraries, significantly improving tool utilization in 3D spatial reasoning tasks (e.g., from 39% to 78% on VSI-Bench).

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than selecting tools according to the specific scene and task. To address this, we propose Skill-3D, a framework that learns self-evolving scene-aware skills. Specifically, Skill-3D identifies the task scene and records the agent's tool-use trajectory into a Scene Memory, where successful trajectories from similar scenes are aggregated and distilled into a reusable scene-aware skill, with failed ones attached to the skill as lessons. During training, once a similar scene recurs, the corresponding skill is injected to guide the agent, producing new trajectories whose successes and failures further refine the skill, forming a loop in which the memory and the skill library co-evolve. Experiments show that Skill-3D substantially improves tool utilization in 3D spatial reasoning (from 39% to 78% on VSI-Bench), driving the agent toward correct and sufficient tool use. For instance, it improves Gemini-3-Flash by 67% on MMSI-Bench. Furthermore, we conduct agentic post-training over skill-guided trajectories, which boosts Qwen3-VL-8B by 43% on VSI-Bench.
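The memory-and-skill loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and field names (`Trajectory`, `Skill`, `SkillLibrary`), the tool names, the exact-match scene lookup, and the "most frequent successful tool sequence" rule are all hypothetical stand-ins for the paper's scene identification, similarity matching, and skill distillation, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trajectory:
    """One recorded tool-use episode (hypothetical fields)."""
    scene: str      # scene label assigned to the task
    tools: tuple    # ordered tool calls the agent made
    success: bool   # whether the episode ended with a correct answer


@dataclass
class Skill:
    """A scene-specific tool plan plus lessons drawn from failures."""
    scene: str
    tool_plan: tuple = ()
    lessons: list = field(default_factory=list)


class SkillLibrary:
    """Toy Scene Memory and skill library that update together."""

    def __init__(self):
        self.memory = defaultdict(list)  # scene -> list of trajectories
        self.skills = {}                 # scene -> Skill

    def record(self, traj):
        """Store a trajectory, then refresh the skill for its scene."""
        self.memory[traj.scene].append(traj)
        self._distill(traj.scene)

    def _distill(self, scene):
        trajs = self.memory[scene]
        wins = [t.tools for t in trajs if t.success]
        if not wins:
            return  # assumption: no skill until at least one success exists
        fails = [t.tools for t in trajs if not t.success]
        # Assumed stand-in for distillation: most frequent successful plan.
        plan = Counter(wins).most_common(1)[0][0]
        # Failed tool sequences become lessons, deduplicated in order.
        lessons = ["avoid: " + " -> ".join(f) for f in dict.fromkeys(fails)]
        self.skills[scene] = Skill(scene, plan, lessons)

    def retrieve(self, scene):
        """Return the skill to inject for this scene, or None if absent."""
        return self.skills.get(scene)


lib = SkillLibrary()
lib.record(Trajectory("indoor-room", ("caption",), False))
lib.record(Trajectory("indoor-room", ("estimate_depth", "measure"), True))
lib.record(Trajectory("indoor-room", ("estimate_depth", "measure"), True))
skill = lib.retrieve("indoor-room")
print(skill.tool_plan, skill.lessons)
```

Each call to `record` both extends the memory and re-derives the skill, which is the co-evolution the abstract describes in miniature: trajectories produced under a skill's guidance feed back into the next version of that skill.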

Original Article

Cached at: 06/10/26, 12:13 AM

Source: https://huggingface.co/papers/2606.07436

Abstract

Skill-3D framework enables agents to learn scene-aware skills through self-evolving memory and skill libraries, improving tool utilization in 3D spatial reasoning tasks.

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than selecting tools according to the specific scene and task. To address this, we propose Skill-3D, a framework that learns self-evolving scene-aware skills. Specifically, Skill-3D identifies the task scene and records the agent’s tool-use trajectory into a Scene Memory, where successful trajectories from similar scenes are aggregated and distilled into a reusable scene-aware skill, with failed ones attached to the skill as lessons. During training, once a similar scene recurs, the corresponding skill is injected to guide the agent, producing new trajectories whose successes and failures further refine the skill, forming a loop in which the memory and the skill library co-evolve. Experiments show that Skill-3D substantially improves tool utilization in 3D spatial reasoning (from 39% to 78% on VSI-Bench), driving the agent toward correct and sufficient tool use. For instance, it improves Gemini-3-Flash by 67% on MMSI-Bench. Furthermore, we conduct agentic post-training over skill-guided trajectories, which boosts Qwen3-VL-8B by 43% on VSI-Bench.

Get this paper in your agent:

hf papers read 2606.07436

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
