HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness


Summary

HeavySkill is a new framework that internalizes complex reasoning as an intrinsic model skill through parallel reasoning and summarization stages, outperforming traditional orchestration methods and enabling self-evolving LLMs via reinforcement learning.

Recent advances in agentic harness with orchestration frameworks that coordinate multiple agents with memory, skills, and tool use have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in orchestration harness but also as an inner skill internalized within the model's parameters that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning then summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
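
The two-stage pipeline described above (parallel reasoning, then summarization) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `heavy_think`, the `generate` callable, the `width` parameter, and the prompt wording are all hypothetical names chosen for this example, and `generate` stands in for whatever model call is available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion string.
Generate = Callable[[str], str]


def heavy_think(question: str, generate: Generate, width: int = 4) -> str:
    """Sample `width` reasoning traces in parallel, then summarize them into one answer."""
    if width < 1:
        raise ValueError("width must be at least 1")

    # Stage 1 (parallel reasoning): ask the same question `width` times.
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces: List[str] = list(pool.map(generate, [question] * width))

    # Stage 2 (summarization): show all traces to the model and ask for one final answer.
    numbered = "\n\n".join(f"[Trace {i + 1}]\n{t}" for i, t in enumerate(traces))
    summary_prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate reasoning traces:\n{numbered}\n\n"
        "Compare the traces and give a single final answer."
    )
    return generate(summary_prompt)
```

In this sketch `width` is simply the number of parallel traces; the abstract does not say how the paper defines depth and width precisely. Replacing the thread pool with batched inference would leave the two-stage structure unchanged.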

Cached at: 05/08/26, 08:55 AM


Source: https://huggingface.co/papers/2605.02396

Abstract

HeavySkill presents a framework where complex reasoning is internalized as an intrinsic model skill rather than relying on external orchestration, demonstrating superior performance through parallel reasoning and summarization stages that can be enhanced via reinforcement learning.

Recent advances in agentic harness with orchestration frameworks that coordinate multiple agents with memory, skills, and tool use have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in orchestration harness but also as an inner skill internalized within the model’s parameters that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning then summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
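
For readers unfamiliar with the two baselines named in the abstract, the sketch below computes Pass@N and Best-of-N accuracy under their common definitions: Pass@N counts a problem as solved if any of the N sampled answers is correct, while Best-of-N keeps only the answer that a scorer ranks highest. The function names, the toy data, and the use of one numeric score per candidate are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def pass_at_n(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems where at least one of the N candidates is correct."""
    if not correct:
        raise ValueError("need at least one problem")
    return sum(any(row) for row in correct) / len(correct)


def best_of_n(correct: Sequence[Sequence[bool]],
              scores: Sequence[Sequence[float]]) -> float:
    """Fraction of problems where the highest-scored candidate is correct."""
    if not correct:
        raise ValueError("need at least one problem")
    hits = 0
    for row, row_scores in zip(correct, scores):
        # Index of the candidate the (hypothetical) scorer prefers.
        best = max(range(len(row)), key=lambda i: row_scores[i])
        hits += int(row[best])
    return hits / len(correct)


# Toy data (made up): 3 problems, N = 3 candidates each.
correct = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
]
scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.3],
]

print("Pass@N:", pass_at_n(correct))
print("Best-of-N:", best_of_n(correct, scores))
```

Because Best-of-N must pick one of the N sampled candidates, its accuracy can never exceed Pass@N on the same samples, which is why approaching Pass@N is a notable result for any method that aggregates those samples.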

