HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
Summary
HeavySkill is a new framework that internalizes complex reasoning as an intrinsic model skill through a parallel-reasoning stage followed by a summarization stage. It outperforms traditional Best-of-N selection and points toward self-evolving LLMs trained via reinforcement learning.
Cached at: 05/08/26, 08:55 AM
Source: https://huggingface.co/papers/2605.02396
Abstract
HeavySkill presents a framework in which complex reasoning is internalized as an intrinsic model skill rather than delegated to external orchestration. A two-stage pipeline of parallel reasoning followed by summarization outperforms Best-of-N strategies, and the depth and width of this pipeline can be further scaled via reinforcement learning.
Recent advances in agentic harnesses, whose orchestration frameworks coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized within the model’s parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
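The two-stage pipeline described in the abstract, and the two reference points it is compared against, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate`, `summarize`, and `score` are hypothetical callables standing in for LLM and reward-model calls, `width` is an assumed reading of the "width" of heavy thinking (the number of parallel traces), and depth is not modeled. The toy traces, the verbosity-favoring scorer, and the majority-vote summarizer are placeholders that exist only so the sketch runs; in HeavySkill the summarization stage is a model reading all traces, and the toy Best-of-N failure holds by construction, not as a result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interfaces: a real system would wrap LLM / reward-model calls here.
Generate = Callable[[str, int], str]         # (prompt, seed) -> one reasoning trace
Summarize = Callable[[str, List[str]], str]  # (prompt, traces) -> final answer
Score = Callable[[str, str], float]          # (prompt, trace) -> scalar score


def extract_answer(trace: str) -> str:
    """Toy convention (assumption): each trace ends with '=> <answer>'."""
    return trace.rsplit("=>", 1)[-1].strip()


def heavy_think(prompt: str, generate: Generate, summarize: Summarize, width: int = 4) -> str:
    # Stage 1: `width` independent reasoning traces, sampled in parallel.
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces = list(pool.map(lambda seed: generate(prompt, seed), range(width)))
    # Stage 2: a single summarization pass conditioned on all traces.
    return summarize(prompt, traces)


def best_of_n(prompt: str, generate: Generate, score: Score, n: int = 4) -> str:
    # Baseline: keep only the highest-scoring trace; nothing is synthesized.
    traces = [generate(prompt, seed) for seed in range(n)]
    return extract_answer(max(traces, key=lambda t: score(prompt, t)))


def pass_at_n(prompt: str, generate: Generate, is_correct: Callable[[str], bool], n: int = 4) -> bool:
    # Oracle reference: solved if ANY of the n traces is correct.
    return any(is_correct(generate(prompt, seed)) for seed in range(n))


# --- Toy stand-ins so the sketch runs (made-up data, not results) ---
TRACES = [
    "short guess => 41",
    "careful derivation => 42",
    "another derivation => 42",
    "very long but wrong chain of reasoning => 7",
]


def toy_generate(prompt: str, seed: int) -> str:
    return TRACES[seed % len(TRACES)]


def toy_summarize(prompt: str, traces: List[str]) -> str:
    # Majority vote used ONLY as a placeholder for an LLM summarizer call.
    return Counter(extract_answer(t) for t in traces).most_common(1)[0][0]


def toy_score(prompt: str, trace: str) -> float:
    # Deliberately flawed scorer that favors verbose traces.
    return float(len(trace))


if __name__ == "__main__":
    print(heavy_think("q", toy_generate, toy_summarize))  # 42
    print(best_of_n("q", toy_generate, toy_score))  # 7
    print(pass_at_n("q", toy_generate, lambda t: extract_answer(t) == "42"))  # True
```

The design contrast is the point of the sketch: Best-of-N can only return one of the sampled traces as-is, while the summarization stage sees all of them at once. Pass@N is the oracle reference the abstract compares against, counting a problem as solved if any of the N traces is correct.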
Similar Articles
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
Skill0.5 is a novel agentic reinforcement learning framework that combines general skill internalization with task-specific skill utilization via a dynamic difficulty-aware router, improving out-of-distribution generalization in complex task environments as demonstrated on ALFWorld and WebShop.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
This paper introduces SkillMaster, a training framework that enables LLM agents to autonomously create, refine, and select skills through trajectory-informed review and counterfactual utility evaluation.
@dair_ai: // Evolving Meta-Skill for Multi-Agent Systems // Can a multi-agent system get better at orchestration without touching…
Skill-MAS introduces a method for evolving meta-skills in multi-agent systems to improve orchestration without modifying model weights, achieving transferable performance gains across tasks and LLMs.
SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration
SkillFlow proposes a flow-driven recursive skill evolution framework for LLM-based agentic orchestration, using Tempered Trajectory Balance to prevent strategy collapse and provide transparent credit assignment. Experiments on 14 datasets show significant improvements over baselines in QA, math, code, and decision-making tasks.
Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
Proposes SelSkill, a dual-granularity preference-learning framework that learns when to invoke skills in agentic tasks, improving task success by 10.9% on ALFWorld and 5.7% on BFCL.