SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Summary
SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.
Cached at: 05/25/26, 02:35 AM
Source: https://huggingface.co/papers/2605.23904
Abstract
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.
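The accept/reject loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the names `Edit`, `apply_edit`, `train_skill`, `toy_propose`, and `toy_validate` are hypothetical, the skill document is modeled as a list of lines, the "textual learning-rate budget" is approximated as a cap on edits per step, and the optimizer model and held-out validation set are replaced by scripted stand-ins. The epoch-wise slow/meta update is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit on the skill document (hypothetical representation)."""
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit applies to
    text: str = ""   # new line for add/replace; unused for delete


Skill = List[str]
Batch = Tuple[Edit, ...]


def apply_edit(skill: Skill, edit: Edit) -> Skill:
    """Return a new skill document with one edit applied."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown edit op: {edit.op}")
    return lines


def train_skill(
    skill: Skill,
    propose: Callable[[Skill, List[Batch]], List[Edit]],
    validate: Callable[[Skill], float],
    steps: int,
    edit_budget: int,
) -> Tuple[Skill, float, List[Batch]]:
    """Keep a proposed batch only if it strictly improves the validation score."""
    best = validate(skill)
    rejected: List[Batch] = []  # rejected-edit buffer, fed back to the proposer
    for _ in range(steps):
        # Assumption: the budget is a simple cap on edits per step.
        batch = tuple(propose(skill, rejected)[:edit_budget])
        if not batch:
            break
        candidate = skill
        for edit in batch:
            candidate = apply_edit(candidate, edit)
        score = validate(candidate)
        if score > best:           # strict improvement required
            skill, best = candidate, score
        else:
            rejected.append(batch)
    return skill, best, rejected


# Toy stand-ins for the held-out validation set and the optimizer model.
USEFUL = {"check units", "verify the final answer"}

def toy_validate(skill: Skill) -> float:
    """Score +1 for each useful line and -1 for each other line."""
    return float(sum(1 if line in USEFUL else -1 for line in skill))

PROPOSALS: List[Batch] = [
    (Edit("add", 0, "guess quickly"),),
    (Edit("add", 0, "check units"),),
    (Edit("add", 1, "verify the final answer"),),
]

def toy_propose(skill: Skill, rejected: List[Batch]) -> List[Edit]:
    """Return the first scripted batch not yet rejected or already applied."""
    for batch in PROPOSALS:
        if batch not in rejected and all(e.text not in skill for e in batch):
            return list(batch)
    return []


final_skill, final_score, rejected = train_skill(
    [], toy_propose, toy_validate, steps=5, edit_budget=1
)
print(final_skill, final_score, len(rejected))
```

In this toy run the first proposal lowers the score, so it lands in the rejected buffer and is never proposed again, while the two later proposals are accepted in turn. Because the trained artifact is just the final document, using it at deployment requires no extra model calls, which matches the zero-overhead property the abstract claims.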
Get this paper in your agent:
hf papers read 2605.23904
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
@Yif_Yang: Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language …
Introducing SkillOpt, an optimizer that treats natural-language skills as trainable external parameters instead of finetuning model weights. It uses bounded edits and validation gating to enable stable, controllable skill updates, achieving best or tied-best results across 52 settings on 6 benchmarks with 7 models.
@omarsar0: New research from Microsoft Research I see a lot of AI engineers handwriting agent skill docs and hope they generalize.…
Microsoft Research introduces SkillOpt, a method that treats agent skill documents as trainable external state, using an optimizer model to make bounded edits validated by a held-out set. The approach achieves best or tied results across 52 evaluation cells and improves accuracy by over 23 points on GPT-5.5, with zero extra inference cost and transferable skills.
SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery
A new paper formalizes skill optimization for agents by treating markdown skill files as trainable parameters, using bounded edits validated against holdout sets. The approach transfers well between models and improves performance on procedural benchmarks.
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
SkillFlow introduces a benchmark of 166 tasks across 20 families for evaluating autonomous agents' ability to discover, repair, and maintain skills over time through a lifelong learning protocol. Experiments reveal a substantial capability gap among leading models, with Claude Opus 4.6 improving significantly while others show limited or negative gains from skill evolution.
SkillOS: Learning Skill Curation for Self-Evolving Agents
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.