SkillOS: Learning Skill Curation for Self-Evolving Agents
Summary
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.
Cached at: 05/08/26, 07:26 AM
Source: https://huggingface.co/papers/2605.06614
Abstract
SkillOS enables self-evolving LLM agents to learn complex long-term skill curation policies through reinforcement learning, improving performance across diverse tasks while generalizing across different executor architectures.
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
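The abstract describes the executor/curator split and the grouped-stream training signal only at a high level. The Python sketch below is a toy illustration under stated assumptions, not the authors' implementation. It assumes that the SkillRepo is a dict of Markdown strings, that retrieval is plain substring matching, and that the frozen executor is a stub that succeeds exactly when the skill a task needs is retrieved. It also assumes that tasks are grouped by a hypothetical `needs` label and that the composite reward is the success rate of later tasks in the stream minus a small per-skill size penalty. The paper's actual reward terms are not specified here. All names (`SkillRepo.apply`, `frozen_executor`, `group_by_skill`, `rollout_group`, `first_word_curator`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillRepo:
    """Hypothetical external SkillRepo: skill name -> Markdown body."""
    skills: dict[str, str] = field(default_factory=dict)

    def apply(self, op: str, name: str, body: str = "") -> None:
        # Assumed curation operations: "add"/"update" overwrite, "delete" removes.
        if op in ("add", "update"):
            self.skills[name] = body
        elif op == "delete":
            self.skills.pop(name, None)
        else:
            raise ValueError(f"unknown op: {op}")

    def retrieve(self, task_text: str) -> list[str]:
        # Toy retrieval: a skill matches if its name occurs in the task text.
        return [name for name in self.skills if name in task_text]


@dataclass
class Task:
    text: str
    needs: str  # hypothetical label: the skill this task depends on


def frozen_executor(task: Task, repo: SkillRepo) -> dict:
    """Stand-in for the frozen executor: succeeds iff the needed skill is retrieved."""
    used = repo.retrieve(task.text)
    return {"task": task.text, "used": used, "success": task.needs in used}


Op = tuple[str, str, str]  # (operation, skill name, Markdown body)
Curator = Callable[[dict, SkillRepo], list[Op]]


def group_by_skill(tasks: list[Task]) -> list[list[Task]]:
    """Group a flat task stream so that related tasks follow one another."""
    groups: dict[str, list[Task]] = {}
    for t in tasks:
        groups.setdefault(t.needs, []).append(t)
    return list(groups.values())


def rollout_group(group: list[Task], curator: Curator, size_penalty: float = 0.01) -> float:
    """Run one grouped stream and return a hypothetical composite reward for the curator."""
    repo = SkillRepo()
    later = []  # outcomes of tasks that run after at least one curation step
    for i, task in enumerate(group):
        traj = frozen_executor(task, repo)
        if i > 0:
            later.append(traj["success"])
        # Earlier trajectories update the repo; later tasks are scored against it.
        for op, name, body in curator(traj, repo):
            repo.apply(op, name, body)
    outcome = sum(later) / len(later) if later else 0.0
    return outcome - size_penalty * len(repo.skills)


def first_word_curator(traj: dict, repo: SkillRepo) -> list[Op]:
    # Toy curator: after a failure, write a skill named after the task's first word.
    if traj["success"]:
        return []
    name = traj["task"].split()[0]
    return [("add", name, f"# {name}\nSteps distilled from: {traj['task']}\n")]


def noop_curator(traj: dict, repo: SkillRepo) -> list[Op]:
    # Baseline that never edits the repo (loosely analogous to a memory-free agent).
    return []


if __name__ == "__main__":
    tasks = [Task("sort list A", "sort"), Task("parse log 1", "parse"),
             Task("sort list B", "sort"), Task("parse log 2", "parse")]
    for g in group_by_skill(tasks):
        print(rollout_group(g, first_word_curator), rollout_group(g, noop_curator))
```

In an RL recipe, the scalar returned by `rollout_group` would act as the return for the curator's actions in that stream, for example in a policy-gradient update. That training step is not shown. The grouping matters because each curation edit is scored only by related tasks that run after it. This is how the abstract frames turning indirect, delayed feedback into a learning signal.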
Similar Articles
Google's SkillOS for Self-Evolving AI Agents (22 minute read)
Google Cloud AI Research introduces SkillOS, a reinforcement learning framework enabling LLM-based agents to self-evolve by curating reusable skills from past experiences.
OpenSkill: Open-World Self-Evolution for LLM Agents
OpenSkill is a framework for LLM agents to self-evolve skills and verification signals from open-world resources without target-task supervision, achieving high performance across benchmarks.
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
Skill1 is a unified framework that trains a single policy to co-evolve skill selection, utilization, and distillation using a shared task-outcome objective. Experiments on ALFWorld and WebShop show it outperforms existing baselines in complex task environments.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
This paper introduces SkillMaster, a training framework that enables LLM agents to autonomously create, refine, and select skills through trajectory-informed review and counterfactual utility evaluation.
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
SkillFlow introduces a benchmark of 166 tasks across 20 families for evaluating autonomous agents' ability to discover, repair, and maintain skills over time through a lifelong learning protocol. Experiments reveal a substantial capability gap among leading models, with Claude Opus 4.6 improving significantly while others show limited or negative gains from skill evolution.