Google's SkillOS for Self-Evolving AI Agents (22 minute read)

Summary

Google Cloud AI Research introduces SkillOS, a reinforcement learning framework that trains LLM-based agents to self-evolve by curating reusable skills from past experience. The system improves long-horizon task performance by evolving structured skill repositories that generalize across models and domains.

Cached at: 05/11/26, 06:34 PM

# SkillOS: Learning Skill Curation for Self-Evolving Agents
Source: [https://arxiv.org/html/2605.06614](https://arxiv.org/html/2605.06614)
Corresponding authors: siruo2@illinois.edu, {junyann, chenyulee}@google.com

Jun Yan, Yanfei Chen (Google Cloud AI Research), Rujun Han (Google Cloud AI Research), Zifeng Wang (Google Cloud AI Research), Bhavana Dalvi Mishra (Google Cloud AI Research), Rui Meng (Google Cloud AI Research), Chun-Liang Li (Google Cloud AI Research), Yizhu Jiao (University of Illinois Urbana-Champaign), Kaiwen Zha (Massachusetts Institute of Technology), Maohao Shen (Massachusetts Institute of Technology), Vishy Tirumalashetty (Google Cloud AI Research), George Lee (Google Cloud AI Research), Jiawei Han (University of Illinois Urbana-Champaign), Tomas Pfister (Google Cloud AI Research), Chen-Yu Lee

###### Abstract

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill adaptation, but still struggle to learn complex long-term curation policies from indirect and delayed feedback. We propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. We further design composite rewards to better attribute downstream executor feedback to curation decisions. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the evolving SkillRepo develops richer internal structure and higher-level meta-skills over time.

## 1 Introduction

LLM-based agents (DBLP:journals/fcsc/WangMFZYZCTCLZWW24) are increasingly deployed in real-world scenarios, where they must move beyond instantaneous problem-solving toward long-term proficiency (he2026memoryarena). However, the prevailing paradigm of “one-off” task execution limits their utility in streaming settings, where tasks unfold sequentially over time. This makes *self-evolution* (fang2025comprehensive; gao2025survey) essential: capable agents should not repeatedly start from scratch, but instead continually accumulate, refine, and reuse experience for future tasks.

A key substrate for self-evolution is *procedural memory* (hu2025memory; wu2025human; DBLP:journals/corr/abs-2508-06433), specifically, reusable skills (anthropic_skills_2025; wang2025inducing) accumulated from past interactions. In real-world streaming settings (wu2024streambench), a skill-based self-evolving agent typically follows a closed-loop workflow: for each new task, it selects relevant skills, uses them to guide execution, and updates its skill collection based on the resulting trajectory. This makes skill curation, the extraction of high-quality lessons and their integration into the skill collection, essential for self-evolving agents.

However, existing work on skill curation remains limited. Manually curated skills, such as Anthropic’s skills repository (anthropic_skills_2025), demand substantial human expertise and cannot scale to the diversity of tasks that agents may encounter. Prompting or heuristic-based methods that dictate memory operations (xu2025amem; qiu2025alita; DBLP:journals/corr/abs-2504-07079) rely on fixed rules and lack downstream performance feedback, preventing them from adapting to the executor’s actual needs. Recent studies have explored reinforcement learning (RL) to optimize skill-based agent systems. However, they either focus on teaching agents to use skills (xia2026skillrl; tu2026dynamic) or optimize skill operations within a short task stream (DBLP:journals/corr/abs-2512-17102; DBLP:journals/corr/abs-2602-10652). This limits the density of learning signals available for curating highly reusable skills and mastering complex management operations such as skill update and deletion, which are essential for robust and scalable long-term self-evolution.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x1.png)
Figure 1: SkillOS pairs a frozen Agent Executor with a trainable Skill Curator. The executor retrieves relevant skills from SkillRepo to act; the curator edits the repo (insert/update/delete) based on the resulting experiences, with Markdown as the skill format.

To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe to learn the capability of skill curation for self-evolving agents. We study skill curation in a modular multi-agent framework in a streaming setting, where a frozen *agent executor* solves tasks with a skill collection (termed SkillRepo), while a trainable *skill curator* updates and manages this collection through function calls (Figure [1](https://arxiv.org/html/2605.06614#S1.F1)(a)). We represent skills as Markdown files (anthropic_skills_2025) (Figure [1](https://arxiv.org/html/2605.06614#S1.F1)(b)) managed via file I/O operations similar to an operating system (OS). Our recipe features two core designs. First, we construct each training instance as a group of related tasks. By mimicking test-time streaming settings, it grounds skill curation in long-term utility: skills induced from earlier experiences are evaluated by their ability to improve later related tasks. Second, we design rewards to better attribute environmental feedback to curation decisions, combining task performance with signals for valid function calls, skill quality, and SkillRepo’s compactness. Together, these designs turn delayed and indirect supervision into learning signals for skill curation.

We evaluate SkillOS on both multi-turn agentic tasks and single-turn reasoning tasks. Experiments show that SkillOS consistently outperforms memory-free and strong memory-based methods in both effectiveness and efficiency, with up to a 9.8% relative performance improvement and 6.0% fewer interaction steps compared to the strongest baseline (Table [1](https://arxiv.org/html/2605.06614#S4.T1)). Our trained skill curator generalizes well across executors and tasks, improving performance even with the Gemini-2.5-Pro executor. Notably, our 8B curator also outperforms Gemini-2.5-Pro when used directly as the curator. Beyond performance gains, our analyses further show that the learned skill curator leads to more targeted and effective skill utilization, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time. Together, we establish SkillOS as a practical, modular, and experience-driven RL training recipe for building self-evolving agents.

## 2 Related Work

Memory for Self-Evolving Agents. Learning from past experiences as procedural memory (wu2025human; wei2025evo; shen2026decocted; hu2025memory; huang2026rethinking; zhang2024working) is a central mechanism for developing self-evolving agents (gao2025survey; fang2025comprehensive). The central challenge is to encode interaction histories into reusable and retrievable representations. Case-based representations are the most concrete form in this research line: they store experiences in minimally processed formats, allowing past histories to be replayed directly or reused as in-context exemplars, such as raw trajectories (zheng2023synapse; DBLP:journals/corr/abs-2508-16153; wu2025comemagent) and abstracted query–response pairs (zhao2024expel; islam-etal-2024-mapcoder). Another line of work abstracts experiences into higher-level knowledge that is editable, auditable, and composable, reducing reliance on long trajectory replay and improving both cross-task generalization and efficiency. Such strategy-based memory typically consists of reusable workflows (wang2025agent; DBLP:journals/corr/abs-2507-06229), distilled insights (ouyang2026reasoningbank; huang-etal-2025-r2d2; DBLP:journals/corr/abs-2509-04439), and recurring patterns (yang2024buffer; kim-etal-2025-principles). Recently, skills (wang2025inducing; kuroki2025agent; DBLP:journals/corr/abs-2602-08004; DBLP:journals/corr/abs-2602-12670; DBLP:journals/corr/abs-2602-02474; yang2026autoskillexperiencedrivenlifelonglearning; alzubi2026evoskill; liang2026skillnet) have emerged as a new agent-native form of memory and an orchestrable capability layer, owing to their modularity and ease of customization. Anthropic conceptualizes each skill as a folder containing instructions, scripts, and supporting resources (anthropic_agent_skills_overview), which has become the most widely adopted design in the current community. Our work follows this design philosophy, simplifying the setting for research purposes by representing each skill as a single Markdown file.

Learning Memory and Skill Curation with RL. Training LLM-based agent systems with memory capabilities using RL has become a growing research direction. One research line targets training for long-context management with predefined operations such as compaction (zhou2026mem; yu2026memagent; wang2025mem). Another area focuses more on memory utilization and management by learning additional memory tool-calls (DBLP:journals/corr/abs-2508-19828; DBLP:journals/corr/abs-2508-16629; DBLP:journals/corr/abs-2510-12635) or training policies for different stages, such as memory retrieval (zhang2026memrl). More recently, RL has been applied at various stages of agent skill development. Specifically, SkillRL (xia2026skillrl) and D2Skill (tu2026dynamic) teach smaller models to use skills curated from powerful LLMs in an iterative manner. ARISE (Li2026ARISEAR) trains a shared policy operating both as skill retriever and worker, with heuristics for skill management. Recent studies have begun to train agents for memory or skill curation (DBLP:journals/corr/abs-2512-17102; DBLP:journals/corr/abs-2602-10652), but their supervision is mostly restricted to local adaptation within short task streams. This favors immediately useful operations such as skill insertion, while offering limited signal for complex management operations, such as revising outdated skills and deleting harmful ones. SkillOS instead formulates skill curation as a long-horizon, executor-grounded learning problem. We group related tasks into training instances and combine downstream task outcomes with intermediate rewards, turning delayed and indirect feedback into learning signals for skill curation.

## 3 Methodology

In this section, we first formalize the problem setting and introduce the multi-agent modular design of SkillOS. We then detail the RL training recipe designed specifically for training the skill curator.

### 3.1 Streaming Skill Curation with Multi-Agent Modular Design

We consider a *streaming* test-time setting (wu2024streambench), where an LLM-based agent is deployed to solve a sequence of tasks $\mathcal{D} = \{x_1, x_2, \dots, x_T\}$ that arrive over time. At each time step $t$, the agent must solve the current task $x_t$ before observing future tasks, producing an execution trajectory $\xi_t = \{o_1, a_1, \dots, o_n, a_n\}$, where $o$ and $a$ denote observations and actions, respectively. This setting naturally captures the challenge of self-evolving agents, where the system must distill useful experience from the trajectories of past interactions to improve performance on future tasks, and become more capable over time. Figure [1](https://arxiv.org/html/2605.06614#S1.F1)(a) presents an overview of the system.

Skill Repository. We maintain an external skill repository $\mathcal{S}_t$ at time step $t$, which consists of $N_t$ reusable skills $\mathcal{S}_t = \{s_t^1, s_t^2, \dots, s_t^{N_t}\}$. Following the widely adopted `SKILL.md` format (anthropic_skills_2025), each skill is represented as a single Markdown file with two components as shown in Figure [1](https://arxiv.org/html/2605.06614#S1.F1)(b): (i) YAML frontmatter, which specifies the skill name and a natural-language description of when the skill should be used, and (ii) Markdown instructions, which describe the executable knowledge, workflows, constraints, and reusable heuristics captured by the skill.
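To make the two-part skill format concrete, the following is a minimal sketch of splitting such a file into its frontmatter fields and its instruction body. The `Skill` dataclass, the `parse_skill` helper, and the example skill text are all hypothetical illustrations, not taken from the paper, and the parser only handles flat `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    instructions: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter fields and a Markdown body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None
    meta = {}
    for line in lines[1:end]:
        # Assumption: only flat `key: value` pairs; real YAML is richer.
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


# Hypothetical skill file, written only for this illustration.
EXAMPLE = """---
name: check-receptacle-before-place
description: Use when a household task requires putting an object in a container.
---
# Workflow
1. Locate the target receptacle first.
2. Open it if it is closed, then place the object.
"""

skill = parse_skill(EXAMPLE)
print(skill.name)
print(skill.instructions)
```

Keeping the description separate from the instructions matters here because the description states when a skill applies, while the body carries what to do once it is selected.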

Agent Executor. Given a task $x_t$, a frozen agent executor $\pi_{\mathcal{L}}$ solves the task conditioning on the current environment observation and relevant skills. Specifically, we retrieve a subset of skills $\tilde{\mathcal{S}}_t \subseteq \mathcal{S}_t$ using BM25 (robertson2009probabilistic) for each task $x_t$, and the executor samples actions following $a \sim \pi_{\mathcal{L}}(\cdot \mid x_t, o_t, \tilde{\mathcal{S}}_t)$.
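The retrieval step can be illustrated with a small, self-contained Okapi BM25 scorer. This is a sketch under stated assumptions: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the cut-off `k`, and the toy repository are illustrative choices, since this section does not specify the retrieval configuration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_top_k(query: str, docs: dict[str, str], k: int = 2,
               k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return names of up to k documents with the highest positive BM25 score."""
    if not docs:
        return []
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:k] if scores[name] > 0]


# Toy repository keyed by skill name (hypothetical contents).
repo = {
    "heat-object": "use the microwave to heat an object before placing it",
    "clean-object": "use the sink to clean an object",
    "algebra-tips": "isolate the variable and check the solution",
}
top = bm25_top_k("heat the egg with the microwave", repo, k=1)
none_found = bm25_top_k("solve quadratic", repo)
```

A lexical retriever like this needs no training, which fits a setting where the repository contents change after every task.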

Skill Curator. After the executor completes task $x_t$, the skill curator $\pi_{\mathcal{S}}$ observes the trajectory $\xi_t$, the self-judged correctness of the answers/interactions $\mathbb{1}_{\xi_t}$, and a retrieved subset of related skills $\tilde{\mathcal{S}}_t$. It then generates a sequence of structured curation operations $c_t = (u_t^1, \dots, u_t^{M_t}) \sim \pi_{\mathcal{S}}(\cdot \mid \xi_t, \mathbb{1}_{\xi_t}, \tilde{\mathcal{S}}_t)$, where each operation $u_t^m$ is one of {`insert_skill`, `update_skill`, `delete_skill`}. Each operation is implemented as a function call (detailed signature in Figure [8](https://arxiv.org/html/2605.06614#A1.F8)) that manipulates the skill repository $\mathcal{S}_t$. Applying these operations transforms the repository from $\mathcal{S}_t$ to $\mathcal{S}_{t+1}$ as $\mathcal{S}_{t+1} = \operatorname{ApplyOps}(\mathcal{S}_t, c_t)$. The updated repository is then used by the executor on subsequent tasks, forming a closed loop between task execution and experience-driven skill evolution.
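The repository transition can be sketched as a pure function over a dictionary. The `Op` record, its fields, and the validity rules below (insert requires a new name, update and delete require an existing one) are assumptions made for illustration; the paper's actual function signatures are in its Figure 8 and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str          # "insert_skill", "update_skill", or "delete_skill"
    name: str          # hypothetical field: the skill file name
    content: str = ""  # hypothetical field: the Markdown text to write


def apply_ops(repo: dict[str, str], ops: list[Op]) -> tuple[dict[str, str], int]:
    """Apply operations in order; return the new repo and the count of valid ops."""
    new_repo = dict(repo)  # copy, so the caller's repository is left untouched
    valid = 0
    for op in ops:
        if op.kind == "insert_skill" and op.name not in new_repo and op.content:
            new_repo[op.name] = op.content
        elif op.kind == "update_skill" and op.name in new_repo and op.content:
            new_repo[op.name] = op.content
        elif op.kind == "delete_skill" and op.name in new_repo:
            del new_repo[op.name]
        else:
            continue  # assumed rule: an invalid call is skipped
        valid += 1
    return new_repo, valid


ops = [
    Op("insert_skill", "heat-object", "v1"),
    Op("update_skill", "missing-skill", "v1"),  # invalid: nothing to update
    Op("update_skill", "heat-object", "v2"),
]
next_repo, n_valid = apply_ops({}, ops)
```

Returning the number of valid operations alongside the new repository makes it easy to compute the share of well-formed calls per curation decision.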

### 3.2 Learning Skill Curation with RL

We optimize the skill curator $\pi_{\mathcal{S}}$ with RL and keep the agent executor $\pi_{\mathcal{L}}$ frozen. The main challenge is indirect and delayed feedback for curation decisions, which is only revealed through $\pi_{\mathcal{L}}$’s performance on future relevant tasks. We address this by constructing grouped training instances (§[3.2.1](https://arxiv.org/html/2605.06614#S3.SS2.SSS1)) and designing a composite reward (§[3.2.2](https://arxiv.org/html/2605.06614#S3.SS2.SSS2)) that combines future task outcomes with intermediate signals on operation validity, skill quality, and the conciseness of skills. An overview of the training process is shown in Figure [2](https://arxiv.org/html/2605.06614#S3.F2).

![Refer to caption](https://arxiv.org/html/2605.06614v1/x2.png)
Figure 2: SkillOS training pipeline. Each training step samples a group of related tasks and initializes an empty SkillRepo. $\pi_{\mathcal{S}}$ is optimized with composite rewards, enabling self-evolution.

#### 3.2.1 Training Instance Construction

To provide downstream learning signals for skill curation, we construct each training instance as a group of related tasks that are solved sequentially. Within each group, SkillRepo is updated by the curator $\pi_{\mathcal{S}}$ after each task, allowing skills derived from earlier experiences to be evaluated by whether they help solve related future tasks. Unlike prior work that focuses on short-horizon transfer (DBLP:journals/corr/abs-2512-17102; DBLP:journals/corr/abs-2602-10652), our grouped formulation exposes the curator to longer skill-evolution trajectories and provides denser feedback for learning complex curation operations.

Concretely, we first annotate each task $x_i$ in $\mathcal{D} = \{x_i\}_{i=1}^{N}$ with a set of skill-relevant attributes. Formally, for each $x_i$, we use Gemini-2.5-Pro (DBLP:journals/corr/abs-2507-06261) to produce a set of tags:

$$Z_i = \{z_i^1, z_i^2, \dots, z_i^{|Z_i|}\},$$

where each attribute $z_i^j$ captures a salient aspect of the task $x_i$, such as topic and common pitfalls. For example, in mathematical reasoning, attributes may include labels such as “algebra” or “Fourier transformation”. These attributes serve as proxies for task-relatedness and potential skill dependency.

Based on the annotated attributes, we then partition $\mathcal{D}$ into a collection of $M$ task groups according to the similarity of the tasks' attributes:

$$\mathcal{D} = \{G_1, G_2, \dots, G_M\}, \qquad G_m = \{x_{m,1}, x_{m,2}, \dots, x_{m,|G_m|}\},$$

where all instances within the same group $G_m$ exhibit non-trivial dependency in terms of required skills. A detailed description of the data processing and grouping algorithms can be found in Appendix [B.2](https://arxiv.org/html/2605.06614#A2.SS2).
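The grouping algorithm itself is described in the appendix and is not reproduced in this section. As a purely illustrative stand-in, the sketch below groups tasks greedily by Jaccard similarity between their tag sets; the similarity measure, the `threshold` value, and the seed-based assignment rule are assumptions, not the authors' procedure.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two tag sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_tasks(tags: dict[str, set[str]], threshold: float = 0.3) -> list[list[str]]:
    """Greedily put each task in the first group whose seed task is similar enough."""
    groups: list[list[str]] = []
    for task_id, task_tags in tags.items():
        for group in groups:
            seed_tags = tags[group[0]]  # the first member acts as the group's seed
            if jaccard(task_tags, seed_tags) >= threshold:
                group.append(task_id)
                break
        else:
            groups.append([task_id])  # no similar group found: start a new one
    return groups


# Hypothetical tag annotations Z_i for four tasks.
tags = {
    "t1": {"algebra", "quadratic"},
    "t2": {"algebra", "quadratic", "factoring"},
    "t3": {"fourier", "signal"},
    "t4": {"fourier"},
}
groups = group_tasks(tags)
```

Whatever the exact procedure, the property that matters for training is that tasks in one group share enough attributes for a skill learned early to be testable on a later task.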

#### 3.2.2 Training Loop and Policy Optimization

We employ Group Relative Policy Optimization (GRPO; DBLP:journals/corr/abs-2402-03300) for its training stability and sample efficiency. The training loop shown in Algorithm [1](https://arxiv.org/html/2605.06614#alg1) optimizes the skill curator policy $\pi_{\mathcal{S}}$ to maximize a composite reward function over the distribution of generated traces. For a task group $G = (x_1, \dots, x_{|G|})$, the curator produces a sequence of curation decisions $c = (c_1, \dots, c_{|G|})$ as the executor proceeds through the group. At each training step, the reward combines four signals:

$$r = \underbrace{r^{\text{task}}}_{\text{task outcome}} + \lambda_{\mathrm{f}} \underbrace{r^{\text{fc}}}_{\text{function call}} + \lambda_{\mathrm{u}} \underbrace{r^{\text{cnt}}}_{\text{content quality}} + \lambda_{\mathrm{c}} \underbrace{r^{\text{comp}}}_{\text{compression}} \tag{1}$$

Task outcome reward. The first task uses an empty SkillRepo, before any curator update occurs. We thus define the task outcome reward as the average success over the remaining tasks, $r^{\text{task}} = \frac{1}{|G|-1} \sum_{i=2}^{|G|} \mathbb{1}(\xi_i)$, which provides an executor-grounded signal on the downstream performance achieved by the evolving SkillRepo from $\pi_{\mathcal{S}}$.

Function call reward. The function call reward measures whether the curator produces valid skill operations. For each curation decision $c_i$, let $\mathrm{Valid}(c_i)$ be the fraction of generated function calls that are valid and successfully executed. We define the function call reward as $r^{\text{fc}} = \frac{1}{|G|} \sum_{i=1}^{|G|} \mathrm{Valid}(c_i)$.

Algorithm 1: Training Skill Curator with Task Groups using GRPO

1. for each training step do
2. $G = (x_1, \dots, x_{|G|})$, $\mathcal{S} \leftarrow \emptyset$ ▷ Sample a task group and initialize SkillRepo
3. for task index $i = 1, \dots, |G|$ do
4. $\tilde{\mathcal{S}} \leftarrow \operatorname{BM25}(x_i, \mathcal{S})$ ▷ Retrieve relevant skills
5. $\xi_i \leftarrow \operatorname{RunTask}(\tilde{\mathcal{S}}, \pi_{\mathcal{L}}, x_i)$ ▷ Run inference on frozen executor
6. $c_i \sim \pi_{\mathcal{S}}(\cdot \mid \xi_i, \tilde{\mathcal{S}})$ ▷ Sample a rollout from skill curator
7. $\mathcal{S} \leftarrow \operatorname{ApplyOps}(\mathcal{S}, c_i)$ ▷ Apply insert/update/delete
8. end for
9. $r \leftarrow \operatorname{CalculateReward}(\xi, c)$
10. Update $\pi_{\mathcal{S}}$ ▷ Update skill curator using GRPO
11. end for

Compression reward. To discourage verbatim trajectory copying, we reward concise repository updates. Let $\mathcal{S}_{i}$ denote the skill repository after applying $c_{i}$, and let $\chi_{i}$ denote the curator input context at position $i$. We define $r^{\text{comp}}=\frac{1}{|G|}\sum_{i=1}^{|G|}\left(1-\frac{|\mathcal{S}_{i}|}{|\chi_{i}|}\right)$, where $|\mathcal{S}_{i}|$ and $|\chi_{i}|$ denote token lengths. This encourages the curator to distill reusable skills rather than store raw trajectories.

Content quality reward. The content quality reward evaluates whether the curated skills are semantically meaningful and likely to be useful for future tasks. Let $\mathrm{Judge}(c_{i})$ denote the scalar score that an external judge (Qwen3-32B) assigns to $c_{i}$; we compute the reward as $r^{\text{cnt}}=\frac{1}{|G|}\sum_{i=1}^{|G|}\mathrm{Judge}(c_{i})$.
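The four reward terms and their combination in Eq. 1 can be written out as a short Python sketch. The function names are hypothetical, the weights `lam_f`, `lam_u`, `lam_c` and all example numbers are placeholders (this section does not state their values), and scoring a curation step with no function calls as 0 is an assumption of this sketch.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def task_outcome_reward(successes: Sequence[bool]) -> float:
    # Average success over tasks 2..|G|; task 1 ran on an empty SkillRepo.
    if len(successes) < 2:
        raise ValueError("a task group needs at least two tasks")
    return mean([float(s) for s in successes[1:]])


def function_call_reward(validity: Sequence[Sequence[bool]]) -> float:
    # Mean over curation steps of the fraction of valid function calls.
    # Assumption: a step that emits no function calls scores 0.
    return mean([mean([float(ok) for ok in step]) if step else 0.0
                 for step in validity])


def content_quality_reward(judge_scores: Sequence[float]) -> float:
    # Mean of the external judge's scalar scores, one per curation step.
    return mean(judge_scores)


def compression_reward(repo_tokens: Sequence[int],
                       context_tokens: Sequence[int]) -> float:
    # Mean of 1 - |S_i| / |chi_i|, with both lengths measured in tokens.
    return mean([1.0 - s / c for s, c in zip(repo_tokens, context_tokens)])


def composite_reward(r_task: float, r_fc: float, r_cnt: float, r_comp: float,
                     lam_f: float, lam_u: float, lam_c: float) -> float:
    # Eq. 1: task outcome plus weighted auxiliary terms.
    return r_task + lam_f * r_fc + lam_u * r_cnt + lam_c * r_comp


# Made-up numbers for a group of three tasks.
r_task = task_outcome_reward([False, True, True])
r_fc = function_call_reward([[True], [False], [True]])
r_cnt = content_quality_reward([0.5, 0.5, 0.5])
r_comp = compression_reward([20, 20, 40], [200, 200, 400])
r = composite_reward(r_task, r_fc, r_cnt, r_comp,
                     lam_f=0.1, lam_u=0.1, lam_c=0.1)  # hypothetical weights
```

Note that the task outcome term enters with weight 1, so with small auxiliary weights the executor's downstream success dominates the signal while the other terms shape the form of the curation output.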

For each task group $G$, we sample $N$ independent rollouts of the *entire curation sequence* from $\pi_{\mathcal{S}}$. Within each rollout, the executor produces trajectory $\xi_{i}$ using the skill repository $\mathcal{S}^{i}$ that results from the previous curations $c_{<i}$ up to task position $i$ within the same training task group, so different rollouts evolve different repository histories. The GRPO advantage is computed as $A^{n}=r^{n}-\frac{1}{N}\sum_{n'=1}^{N}r^{n'}$, where $r^{n}$ is the composite reward (Eq. [1](https://arxiv.org/html/2605.06614#S3.E1)) for the $n$-th rollout. We optimize $\pi_{\mathcal{S}}$ with a clipped surrogate objective over all curation steps $i=1,\ldots,|G|$:

$$\mathcal{L}=\mathbb{E}_{n}\left[\min\left(\rho^{n}A^{n},\;\mathrm{clip}\left(\rho^{n},\,1-\epsilon,\,1+\epsilon\right)A^{n}\right)\right] \tag{2}$$

where $\rho^{n}=\pi_{\mathcal{S}}(c^{n}\mid\chi)\,/\,\pi_{\theta_{old}}(c^{n}\mid\chi)$ is the importance ratio. The advantage $A^{n}$ is assigned uniformly to all tokens in $c^{n}$, and we discard the KL term in GRPO to encourage policy exploration.
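As a rough numerical illustration of the advantage and of Eq. 2, the sketch below computes mean-centred advantages and the clipped surrogate from per-rollout log-probabilities. It assumes one summed log-probability per curation rollout (a sequence-level ratio, following Eq. 2 as written; practical GRPO implementations often use per-token ratios instead), and the default `eps=0.2` is a placeholder since this section does not state the clipping value. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float]) -> List[float]:
    # A^n = r^n - (1/N) * sum of r^{n'}: centred on the group mean,
    # with no division by the standard deviation in this formulation.
    group_mean = sum(rewards) / len(rewards)
    return [r - group_mean for r in rewards]


def clipped_surrogate(logp_new: Sequence[float],
                      logp_old: Sequence[float],
                      advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    # Mean over rollouts of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A).
    # No KL penalty is added, matching the text above.
    terms = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)                   # rho^n
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# Four rollouts of the same task group with made-up composite rewards.
advantages = grpo_advantages([1.2, 0.8, 1.0, 1.0])
objective = clipped_surrogate(
    logp_new=[-10.0, -12.0, -11.0, -11.0],
    logp_old=[-10.0, -12.0, -11.0, -11.0],  # identical policies, so rho = 1
    advantages=advantages,
)
```

With identical old and new policies every ratio is 1, so the objective reduces to the mean advantage, which is zero by construction. The clipping only matters once the updated policy drifts away from the one that generated the rollouts.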

## 4 Experiments

We conduct experiments on both multi-turn agentic tasks and single-turn reasoning tasks, in line with prior work (xia2026skillrl; wei2025evo; DBLP:journals/corr/abs-2602-10652). We additionally show that the trained skill curator transfers across agent executors and task domains, highlighting its flexibility and generalizability.

Table 1: Experiment results on the ALFWorld benchmark. Success rate (SR ↑) and the number of steps (Steps ↓) are reported on 6 subsets with 3 different frozen executors.

| Methods | Curator $\pi_{\mathcal{S}}$ | Pick (35) | Look (13) | Clean (27) | Heat (16) | Cool (25) | Pick2 (24) | Avg. SR (140) | Steps |
|---|---|---|---|---|---|---|---|---|---|
| *Executor $\pi_{\mathcal{L}}$: Qwen3-8B* | | | | | | | | | |
| No Memory | None | $78.1_{1.6}$ | $46.2_{7.7}$ | $33.3_{13.4}$ | $37.5_{10.8}$ | $29.3_{6.1}$ | $47.2_{6.4}$ | $47.9_{1.2}$ | 21.1 |
| ReasoningBank | ❄ Qwen3-8B | $83.8_{0.0}$ | $48.7_{7.2}$ | $49.4_{16.2}$ | $39.6_{4.4}$ | $41.3_{8.5}$ | $54.2_{8.8}$ | $55.7_{3.1}$ | 20.1 |
| MemP | ❄ Qwen3-8B | $80.0_{5.7}$ | $43.6_{4.4}$ | $24.7_{4.3}$ | $33.3_{3.6}$ | $38.7_{6.1}$ | $48.6_{6.4}$ | $49.7_{0.7}$ | 21.0 |
| SkillOS-base | ❄ Qwen3-8B | $79.0_{8.7}$ | $41.0_{4.4}$ | $45.7_{4.3}$ | $37.5_{9.5}$ | $38.7_{4.0}$ | $55.6_{2.1}$ | $53.1_{2.5}$ | 20.4 |
| SkillOS-gemini | ❄ Gemini-2.5-Pro | $77.1_{6.0}$ | $53.8_{6.1}$ | $37.0_{6.4}$ | $37.5_{9.5}$ | $36.0_{3.2}$ | $50.0_{6.7}$ | $50.7_{3.6}$ | 20.8 |
| SkillOS | 🔥 Qwen3-8B | $\mathbf{85.7}_{3.3}$ | $\mathbf{56.4}_{7.7}$ | $\mathbf{54.3}_{8.6}$ | $\mathbf{43.8}_{9.5}$ | $\mathbf{46.7}_{2.3}$ | $\mathbf{62.5}_{6.4}$ | $\mathbf{61.2}_{4.6}$ | 18.9 |
| *Executor $\pi_{\mathcal{L}}$: Qwen3-32B* | | | | | | | | | |
| No Memory | None | $80.0_{2.9}$ | $69.2_{0.0}$ | $45.6_{7.7}$ | $37.5_{16.5}$ | $42.7_{6.1}$ | $43.1_{2.4}$ | $54.5_{2.5}$ | 20.3 |
| ReasoningBank | ❄ Qwen3-8B | $86.7_{3.0}$ | $71.8_{5.4}$ | $50.6_{6.3}$ | $45.8_{13.3}$ | $52.0_{8.9}$ | $51.4_{5.1}$ | $61.4_{2.5}$ | 18.7 |
| MemP | ❄ Qwen3-8B | $80.0_{2.9}$ | $76.9_{0.0}$ | $44.4_{7.4}$ | $37.5_{10.8}$ | $42.7_{2.3}$ | $47.2_{6.4}$ | $55.7_{3.7}$ | 20.0 |
| SkillOS-base | ❄ Qwen3-8B | $82.9_{2.9}$ | $69.2_{11.8}$ | $48.1_{2.1}$ | $50.0_{9.7}$ | $48.0_{14.4}$ | $52.8_{11.0}$ | $59.8_{3.0}$ | 19.2 |
| SkillOS-gemini | ❄ Gemini-2.5-Pro | $\mathbf{97.1}_{3.0}$ | $76.9_{5.4}$ | $55.6_{6.0}$ | $43.8_{11.3}$ | $40.0_{5.7}$ | $54.2_{4.9}$ | $63.6_{4.2}$ | 18.1 |
| SkillOS | 🔥 Qwen3-8B | $91.4_{3.3}$ | $\mathbf{76.9}_{4.4}$ | $\mathbf{59.3}_{8.6}$ | $\mathbf{56.3}_{12.5}$ | $\mathbf{57.3}_{10.1}$ | $\mathbf{62.5}_{4.2}$ | $\mathbf{68.6}_{5.7}$ | 17.3 |
| *Executor $\pi_{\mathcal{L}}$: Gemini-2.5-pro* | | | | | | | | | |
| No Memory | None | $90.5_{3.2}$ | $66.7_{5.1}$ | $48.1_{10.2}$ | $39.6_{17.1}$ | $68_{7.4}$ | $68.1_{3.8}$ | $66.4_{2.0}$ | 17.7 |
| ReasoningBank | ❄ Qwen3-8B | $91.4_{3.4}$ | $61.5_{4.1}$ | $63.0_{9.3}$ | $39.6_{10.3}$ | $70.7_{3.2}$ | $76.4_{8.5}$ | $71.4_{2.9}$ | 16.0 |
| MemP | ❄ Qwen3-8B | $95.2_{2.1}$ | $\mathbf{74.4}_{6.8}$ | $61.7_{7.6}$ | $56.3_{12.4}$ | $76.0_{6.2}$ | $68.1_{8.5}$ | $74.3_{3.4}$ | 15.2 |
| SkillOS-base | ❄ Qwen3-8B | $91.4_{1.6}$ | $69.2_{7.7}$ | $56.8_{5.7}$ | $54.2_{13.7}$ | $72.0_{4.0}$ | $66.7_{11.0}$ | $70.7_{3.0}$ | 16.3 |
| SkillOS-gemini | ❄ Gemini-2.5-Pro | $94.3_{5.7}$ | $69.2_{0.0}$ | $\mathbf{77.8}_{5.7}$ | $\mathbf{75.0}_{16.5}$ | $80.0_{12.2}$ | $66.7_{2.4}$ | $79.3_{2.6}$ | 14.9 |
| SkillOS | 🔥 Qwen3-8B | $\mathbf{95.2}_{2.9}$ | $71.8_{7.7}$ | $74.1_{13.0}$ | $72.9_{10.1}$ | $\mathbf{77.3}_{6.1}$ | $\mathbf{77.8}_{10.0}$ | $\mathbf{80.2}_{3.1}$ | 14.8 |

### 4.1 Setup

We briefly describe the experimental setup used throughout this paper. Full descriptions of the datasets, implementations, baselines, and evaluations can be found in Appendix [B](https://arxiv.org/html/2605.06614#A2).

Dataset. For agentic tasks, we conduct experiments on ALFWorld (shridhar2021alfworld) and WebShop (10.5555/3600270.3601778). ALFWorld is a text-based interactive environment aligned with the ALFRED embodied AI benchmark, where agents must complete household tasks through textual navigation and object manipulation. WebShop simulates an online shopping environment in which agents navigate a realistic web interface to identify and purchase products that satisfy user-specified requirements. For each benchmark, we train SkillOS on its training split, where $Z_i$ is taken from the default task-type annotations, and evaluate on the corresponding test set. In addition to agentic tasks, we also evaluate on single-turn reasoning benchmarks, including AIME24, AIME25, and GPQA-Diamond (rein2024gpqa). The training data for these are constructed from DeepMath-103k (he2026deepmathk), from which we randomly sample a subset of 33,000 data points.

Evaluation Configurations. We evaluate all methods along two dimensions, effectiveness and efficiency. For effectiveness, we measure success rate (SR) on agentic tasks and accuracy on reasoning tasks. For efficiency, we compute the number of execution steps per agentic task and the number of tokens per reasoning problem. We compare SkillOS with three categories of baselines: (i) a memory-free agent (No Memory); (ii) existing memory-based methods, including ReasoningBank (ouyang2026reasoningbank), which distills reusable insights from past experiences, and MemP (DBLP:journals/corr/abs-2508-06433), which induces procedural memory with advanced memory-management strategies; and (iii) internal variants of our framework, including SkillOS-base, which uses the initial skill curator without RL training, and SkillOS-gemini, which uses Gemini-2.5-Pro to perform skill curation directly instead of learning the curator with RL. All prompts can be found in Appendix [A](https://arxiv.org/html/2605.06614#A1).
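As a rough illustration of the two evaluation dimensions for agentic tasks, the sketch below computes success rate and average steps for a single run, then aggregates several runs into a mean and standard deviation. The `Episode` record, the sample values, and the choice of the sample standard deviation are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Episode:
    """One evaluated agentic task (hypothetical record layout)."""
    success: bool
    steps: int


def success_rate(episodes: list[Episode]) -> float:
    """Effectiveness: percentage of tasks solved in a single run."""
    return 100.0 * sum(e.success for e in episodes) / len(episodes)


def average_steps(episodes: list[Episode]) -> float:
    """Efficiency: mean number of execution steps per task."""
    return mean(e.steps for e in episodes)


def summarize(per_run_values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across independent runs."""
    return mean(per_run_values), stdev(per_run_values)


# Made-up episodes for three runs, only to exercise the functions.
runs = [
    [Episode(True, 12), Episode(False, 30), Episode(True, 15), Episode(True, 11)],
    [Episode(True, 13), Episode(True, 18), Episode(False, 30), Episode(False, 30)],
    [Episode(True, 10), Episode(True, 16), Episode(True, 14), Episode(False, 30)],
]
sr_mean, sr_std = summarize([success_rate(r) for r in runs])
steps_mean, _ = summarize([average_steps(r) for r in runs])
print(f"SR = {sr_mean:.1f} +/- {sr_std:.1f}, steps = {steps_mean:.1f}")
```

For reasoning tasks the same aggregation would apply, with accuracy in place of success rate and token counts in place of steps.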

Implementation Details. We use Qwen3-8B (DBLP:journals/corr/abs-2505-09388) as the base model for $\pi_{\mathcal{S}}$. The frozen executor is also instantiated with Qwen3-8B during training. We train our model using GRPO with a learning rate of $1\times 10^{-6}$, a batch size of $32$, and a group size of $8$. Training is conducted on 16 H100 GPUs using the verl framework (sheng2024hybridflow). The full training process takes approximately 3 days for ALFWorld, 2.5 days for reasoning tasks, and 5 days for WebShop. For testing, we additionally include Qwen3-32B, Gemini-2.5-Pro (DBLP:journals/corr/abs-2507-06261), and Gemini-3.1-Flash-Lite (Appendix [C.1](https://arxiv.org/html/2605.06614#A3.SS1)) as executors to evaluate the generalization of SkillOS across executor scales and architectures. The task outcome signal $\mathbbm{1}_{\xi_{t}}$ is obtained via LLM-as-a-judge with the frozen agent executor (prompt shown in Appendix [A](https://arxiv.org/html/2605.06614#A1)). We use ReAct (DBLP:conf/iclr/YaoZYDSN023) for agent execution and CoT (DBLP:conf/nips/Wei0SBIXCLZ22) for reasoning tasks. For the reward function, we set $\lambda_{f}=1.0$, $\lambda_{u}=0.1$, and $\lambda_{c}=0.05$. We report the mean and standard deviation over 3 runs.
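To make the reported training configuration concrete, the sketch below collects the stated hyperparameters and shows the group-relative advantage normalization that GRPO is commonly described with: each rollout's reward is standardized against the other rollouts in its group. This is a minimal illustration in plain Python, not the verl implementation; the `TrainConfig` name, its field names, the `eps` constant, and the example rewards are assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the text; field names are hypothetical."""
    learning_rate: float = 1e-6
    batch_size: int = 32
    group_size: int = 8
    lambda_f: float = 1.0
    lambda_u: float = 0.1
    lambda_c: float = 0.05


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


cfg = TrainConfig()
# Made-up rewards for one group of `group_size` rollouts on the same input.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
assert len(rewards) == cfg.group_size
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Rollouts that score above their group's mean receive a positive advantage and the rest a negative one, so no separate value network is needed to form a baseline.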

Table 2: Experiment results on WebShop and single-turn reasoning tasks for 3 different frozen executors. For WebShop, the averaged score, success rate (SR ↑), and the number of steps (Steps ↓) are reported. For reasoning tasks, accuracy (Acc. ↑) is reported on three datasets.

| Methods | Curator $\pi_{\mathcal{S}}$ | WebShop Score | WebShop SR | WebShop Steps | AIME24 | AIME25 | GPQA | Avg. Acc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Executor $\pi_{\mathcal{L}}$: Qwen3-8B | | | | | | | | |
| No Memory | None | 33.3 ±0.7 | 9.8 ±0.5 | 20.3 | 76.0 ±6.9 | 71.1 ±10.7 | 61.8 ±1.1 | 69.6 ±4.7 |
| ReasoningBank | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-8B | 35.4 ±1.1 | 11.4 ±0.9 | 20.5 | 75.4 ±5.0 | 73.2 ±10.8 | 60.3 ±3.9 | 69.6 ±2.5 |
| MemP | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-8B | 35.7 ±0.9 | 12.0 ±0.5 | 21.3 | 75.6 ±5.1 | 71.1 ±5.1 | 60.6 ±4.0 | 69.1 ±4.0 |
| SkillOS-base | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-8B | 38.6 ±0.9 | 13.6 ±0.8 | 20.1 | 75.6 ±5.1 | 71.9 ±6.9 | 59.3 ±2.5 | 68.9 ±2.6 |
| SkillOS-gemini | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Gemini-2.5-Pro | 38.1 ±1.0 | 13.2 ±0.9 | 19.6 | 73.3 ±1.3 | 71.3 ±1.9 | 57.6 ±2.8 | 67.4 ±0.8 |
| SkillOS | ![fire](https://arxiv.org/html/2605.06614v1/figures/fire.png) Qwen3-8B | **40.6** ±0.7 | **16.5** ±0.7 | 19.4 | **80.0** ±3.3 | **76.7** ±5.8 | **64.6** ±1.3 | **73.8** ±1.8 |
| Executor $\pi_{\mathcal{L}}$: Qwen3-32B | | | | | | | | |
| No Memory | None | 41.5 ±0.5 | 12.2 ±0.3 | 17.0 | 81.4 ±1.3 | 72.2 ±3.8 | 68.4 ±2.0 | 74.0 ±1.9 |
| ReasoningBank | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-32B | 40.4 ±0.8 | 11.2 ±1.1 | 17.9 | 81.1 ±9.6 | 75.6 ±5.9 | 66.9 ±1.2 | 74.9 ±2.2 |
| MemP | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-32B | 30.7 ±0.7 | 10.1 ±0.6 | 17.4 | 82.2 ±5.1 | 76.7 ±0.0 | 66.5 ±2.3 | 75.1 ±2.1 |
| SkillOS-base | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-8B | 43.4 ±0.8 | 12.3 ±1.0 | 16.8 | 80.0 ±3.3 | 75.6 ±10.2 | 67.7 ±1.5 | 74.7 ±3.3 |
| SkillOS-gemini | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Gemini-2.5-Pro | 45.2 ±1.0 | 13.2 ±1.1 | 16.6 | 77.8 ±6.7 | 74.4 ±1.9 | 66.2 ±0.6 | 73.2 ±2.6 |
| SkillOS | ![fire](https://arxiv.org/html/2605.06614v1/figures/fire.png) Qwen3-8B | **49.2** ±1.2 | **16.5** ±0.6 | 15.9 | **85.6** ±1.9 | **81.1** ±3.3 | **72.4** ±3.0 | **79.7** ±1.6 |
| Executor $\pi_{\mathcal{L}}$: Gemini-2.5-Pro | | | | | | | | |
| No Memory | None | 48.6 ±0.3 | 38.4 ±0.5 | 19.5 | 85.6 ±1.9 | 80.0 ±6.7 | 79.9 ±1.5 | 81.8 ±2.8 |
| ReasoningBank | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Gemini-2.5-Pro | 50.8 ±1.5 | 40.2 ±1.3 | 19.2 | 85.6 ±5.1 | 84.4 ±6.7 | 80.4 ±2.1 | 83.5 ±2.1 |
| MemP | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Gemini-2.5-Pro | 51.3 ±1.2 | 39.8 ±1.0 | 19.4 | 83.3 ±6.9 | 76.7 ±5.8 | 81.8 ±3.4 | 80.6 ±3.2 |
| SkillOS-base | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Qwen3-8B | 52.8 ±1.0 | 39.6 ±0.8 | 19.0 | 87.8 ±3.3 | 83.3 ±1.9 | 82.8 ±2.7 | 84.6 ±1.8 |
| SkillOS-gemini | ![snowflake](https://arxiv.org/html/2605.06614v1/figures/snowflake.png) Gemini-2.5-Pro | 54.7 ±1.0 | 41.0 ±1.2 | 17.8 | 90.0 ±5.1 | 85.6 ±7.7 | 80.7 ±5.5 | 85.4 ±3.5 |
| SkillOS | ![fire](https://arxiv.org/html/2605.06614v1/figures/fire.png) Qwen3-8B | **56.0** ±0.7 | **41.3** ±0.8 | 18.3 | **92.2** ±2.4 | **86.7** ±3.5 | **86.8** ±2.1 | **88.6** ±1.5 |

### 4.2 Main Results

Tables [1](https://arxiv.org/html/2605.06614#S4.T1) and [2](https://arxiv.org/html/2605.06614#S4.T2) summarize the results on the different benchmarks, with Qwen3-8B as the skill curator paired with various agent executors. We make the following observations.

SkillOS achieves strong performance gains across benchmarks. Across all three benchmarks, SkillOS consistently outperforms both memory-free and memory-based baselines, showing that the gains come from *learning to manage and evolve* skills rather than from maintaining a static collection. On ALFWorld, with Qwen3-8B as the executor, SkillOS improves the average success rate from 55.7 to 61.2 over the strongest baseline, ReasoningBank; similar trends hold on WebShop and reasoning tasks. Strikingly, our RL-trained 8B curator even surpasses SkillOS-gemini, despite the latter using a far stronger frontier model as the curator, demonstrating that targeted training of a small curator can outweigh raw model scale. The benefits of RL training also grow with executor capacity: relative to SkillOS-base on ALFWorld, SkillOS yields a $+9.5$ absolute improvement with Gemini-2.5-Pro versus $+7.9$ with Qwen3-8B.

SkillOS is more efficient, requiring fewer interactions and lower execution cost. The gains of SkillOS are accompanied by better efficiency rather than longer trajectories. On ALFWorld, SkillOS reduces the average number of interaction steps by $2.2$, $3.0$, and $3.1$ relative to the No Memory setting across the three executors, consistently outperforming all memory-based baselines. This trend extends to WebShop, where SkillOS secures higher success rates with fewer environment interactions. These results indicate that the learned skill curator enables the executor to identify procedural shortcuts and bypass redundant exploration. Rather than relying on additional trial-and-error, SkillOS improves performance by distilling experience into direct, actionable expertise that simplifies task execution.
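As a worked check against the Steps column of Table 2, the reduction in WebShop steps relative to the No Memory setting can be computed per executor. The symbol $\Delta_{\text{steps}}$ is introduced here only for illustration.

$$
\begin{aligned}
\Delta_{\text{steps}} &= \text{Steps}_{\text{No Memory}} - \text{Steps}_{\text{SkillOS}} \\
\text{Qwen3-8B:}\quad \Delta_{\text{steps}} &= 20.3 - 19.4 = 0.9 \\
\text{Qwen3-32B:}\quad \Delta_{\text{steps}} &= 17.0 - 15.9 = 1.1 \\
\text{Gemini-2.5-Pro:}\quad \Delta_{\text{steps}} &= 19.5 - 18.3 = 1.2
\end{aligned}
$$

These WebShop reductions are smaller than the ALFWorld reductions quoted above, but they point in the same direction.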

The gains differ between agentic and reasoning tasks, reflecting different forms of reusable skills. A notable trend is that the gains of SkillOS are generally larger on multi-turn agentic benchmarks than on single-turn reasoning tasks. We hypothesize that this difference arises from how reusable skills manifest across task types. Agentic tasks naturally expose procedural regularities, such as action ordering, exploration strategies, recovery behaviors, and environment-specific constraints, which can be repeatedly composed and refined across task streams. Reasoning tasks also benefit from skill curation, but their reusable knowledge often appears at a more abstract level, such as decomposition heuristics, constraint formulation, or verification patterns, rather than as directly reusable action procedures. As a result, SkillOS still improves reasoning performance, but the gains are typically smaller than those observed on agentic benchmarks. We provide a case study demonstrating skills curated for different tasks in Figure [17](https://arxiv.org/html/2605.06614#A3.F17).
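The contrast can be checked against Table 2. The sketch below takes the WebShop score and the average reasoning accuracy for No Memory and SkillOS under each executor and computes the absolute and relative gains. The numbers are copied from Table 2; the dictionary layout and function name are illustrative assumptions, and WebShop score is used here as one possible proxy for agentic performance.

```python
# (No Memory, SkillOS) pairs copied from Table 2.
TABLE2 = {
    "Qwen3-8B":       {"webshop_score": (33.3, 40.6), "reasoning_avg_acc": (69.6, 73.8)},
    "Qwen3-32B":      {"webshop_score": (41.5, 49.2), "reasoning_avg_acc": (74.0, 79.7)},
    "Gemini-2.5-Pro": {"webshop_score": (48.6, 56.0), "reasoning_avg_acc": (81.8, 88.6)},
}


def gains(baseline: float, method: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = method - baseline
    return round(absolute, 1), round(100.0 * absolute / baseline, 1)


for executor, metrics in TABLE2.items():
    for name, (baseline, method) in metrics.items():
        absolute, relative = gains(baseline, method)
        print(f"{executor:15s} {name:18s} +{absolute:.1f} pts ({relative:.1f}% relative)")
```

With these numbers, both the absolute and the relative gain on WebShop score exceed the corresponding gain on reasoning accuracy for every executor, in line with the trend described above.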

### 4.3 Generalization of SkillOS

![Refer to caption](https://arxiv.org/html/2605.06614v1/x3.png)
Figure 3: Cross-task generalization results of SkillOS with (a) Qwen3-8B, (b) Qwen3-32B, and (c) Gemini-2.5-Pro as frozen executors. We plot relative improvement with baselines from least to most.

SkillOS is transferable and remains effective for different agent executors. During training, we use Qwen3-8B as the executor. To test whether SkillOS brings improvement for executors that are not seen in training, we pair the trained skill curator with different executors. As shown in Tables [1](https://arxiv.org/html/2605.06614#S4.T1) and [2](https://arxiv.org/html/2605.06614#S4.T2), SkillOS consistently improves a wide range of frozen executors across benchmarks, from open-source models (Qwen3-8B, Qwen3-32B) to frontier models (Gemini-2.5-Pro). On ALFWorld, it lifts the average success rate of Qwen3-8B from 47.9 to 61.2 and of Gemini-2.5-Pro from 66.4 to 80.2, demonstrating compatibility with executors of varying capacity. Notably, using Gemini-2.5-Pro directly as the curator (SkillOS-gemini) underperforms our trained curator, especially when paired with the smaller Qwen3-8B executor. This highlights a curator-executor mismatch: stronger reasoning ability alone does not guarantee effective skill curation, as frontier-generated skills may be misaligned with the executor's capacity or usage patterns. By contrast, SkillOS learns executor-grounded curation behaviors through RL, producing skills that better match the downstream agent.

SkillOS delivers consistent performance improvement when generalized to different task domains. Figure [3](https://arxiv.org/html/2605.06614#S4.F3) shows that the skill curator learned by SkillOS transfers well across different tasks. While training and testing on the same task often gives the strongest gain, most off-diagonal entries still bring performance improvement over baselines, indicating that SkillOS captures reusable skills beyond task-specific heuristics. Specifically, the skill curator $\pi_s$ learned from reasoning tasks transfers particularly well to the two agentic tasks, likely because the skills it curates contain more abstract and high-level strategies, such as decomposition, verification, and adaptive planning, which are broadly useful across settings. In contrast, skills learned from WebShop or ALFWorld are more tied to environment-specific knowledge, making them less transferable across tasks.

## 5 Analysis

Beyond performance, we analyze *why* SkillOS works, focusing on design choices, the evolution of the curator's behaviors and of the contents in SkillRepo, and the role of curated skills in task success. Additional analyses are included in Appendix [C](https://arxiv.org/html/2605.06614#A3).

Table 3: Ablation results of reward design on the ALFWorld dataset.

| Methods | Avg. SR | Steps |
|---|---|---|
| SkillOS-GRPO | 61.2 | 18.9 |
| w/o $r^{cnt}$ | 58.6 | 20.1 |
| w/o $r^{comp}$ | 60.0 | 19.3 |
| w/o grouping | 57.3 | 20.6 |

##### Ablation Studies.

We ablate two components of SkillOS: (i) auxiliary rewards in Eq. [1](https://arxiv.org/html/2605.06614#S3.E1), and (ii) grouped task streams in §[3.2.1](https://arxiv.org/html/2605.06614#S3.SS2.SSS1). Experiments are conducted on ALFWorld, with Qwen3-8B used as both the curator and executor. As shown in Table [3](https://arxiv.org/html/2605.06614#S5.T3), removing either reward component hurts performance. Without the content-quality reward, the success rate drops from 61.2 to 58.6, showing the importance of intermediate supervision for guiding skill updates in a pipelined system. Removing the compression reward causes a smaller but consistent drop, suggesting that concise repositories are easier for the executor to use. The most significant degradation comes from using random task sequences (w/o grouping), which lowers the success rate to 57.3. This highlights the importance of training on grouped task streams, in which curation decisions are learned from their downstream impact on related future tasks.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x4.png)
Figure 4: Behaviors of the skill curator w.r.t. skill operations during training.

Behaviors of Skill Curator. To better understand how the behavior of the skill curator evolves during training, we analyze the distribution of its three skill operations from rollouts at different training steps: insert\_skill, update\_skill, and delete\_skill. Figure [4](https://arxiv.org/html/2605.06614#S5.F4) plots the proportion of each operation. At the beginning of training, insert overwhelmingly dominates, indicating that the model is primarily focused on populating the skill repository with new knowledge distilled from experience. As training progresses, however, update becomes increasingly frequent, while insert steadily declines. This suggests that the skill curator gradually moves from plain expansion of skills to refining existing skills. Meanwhile, delete remains a relatively small fraction throughout training with a slightly growing trend, which reflects the effect of rewarding conciseness of SkillRepo. Rather than pruning, the dominant form of adaptation is to revise and consolidate previously acquired skills.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x5.png)
Figure 5: Evolution dynamics of the curated skills under RL training.

Skill Evolution Dynamics. Beyond aggregate performance, we examine how the skill repository evolves during RL training. We focus on two emergent phenomena: (i) new Markdown sections within individual skills, and (ii) higher-level meta-skills that capture reusable principles across tasks. Figure [5](https://arxiv.org/html/2605.06614#S5.F5)(a) shows that early in training, the curator tends to introduce generic sections such as additional guidance, tips, or recommendations, which often make skills more verbose without substantially improving their operational value. As training progresses, these additions shift toward more actionable structures, such as failure-handling logic and conditional branches that specify when to deviate from the default workflow. This suggests that RL gradually steers the curator from superficial enrichment toward execution-oriented skill refinement. Figure [5](https://arxiv.org/html/2605.06614#S5.F5)(b) further shows that evolution occurs not only within individual skills, but also in the global organization of the repository. Early repositories are dominated by narrow, task-specific skills, whereas later repositories contain a more diverse set of meta-strategy skills covering verification, fallback planning, system search, and strategy adjustment. This indicates that the learned curator does not merely accumulate skills, but progressively expands the repository's strategic space, shifting it from isolated task-local procedures toward more compositional cross-task control knowledge.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x6.png)
Figure 6: Comparison of skill utilization statistics on ALFWorld.

Attribution of Skill Usage. To better understand whether the gains of SkillOS come from the evolved skills, we analyze how skills are used during evaluation. We consider four complementary metrics: (i) *skill usage rate*, the fraction of examples where the agent invokes at least one skill; (ii) *successful skill usage rate*, the success rate among examples that use skills; (iii) *skill coverage*, the fraction of the skill collection that is actually used; and (iv) the *average number of skills used per example*, which measures the degree of skill reliance. Figure [6](https://arxiv.org/html/2605.06614#S5.F6) reports results on ALFWorld. Compared with the baseline, SkillOS invokes skills on *all* evaluation examples and achieves a higher success rate, indicating that the evolved skills contribute directly to task solving. In addition, a larger fraction of the skills curated by SkillOS is used, showing that RL training improves the overall utility of the curated SkillRepo. Meanwhile, SkillOS uses fewer skills per example, suggesting that gains come from more precise skill selection rather than more skill context.
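The four utilization metrics above can be computed from per-example evaluation logs. The following is a minimal sketch, not the authors' analysis code: the `EvalRecord` log format and the function name are hypothetical, and we assume that "skills used per example" counts distinct skills.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One evaluation example (hypothetical log format)."""
    skills_used: list[str]  # names of skills the executor invoked
    success: bool           # whether the task was solved


def skill_usage_stats(records: list[EvalRecord], repo_skills: set[str]) -> dict[str, float]:
    """Compute the four skill-utilization metrics described in the text."""
    n = len(records)
    with_skills = [r for r in records if r.skills_used]
    used = {s for r in records for s in r.skills_used}
    return {
        # (i) fraction of examples that invoke at least one skill
        "usage_rate": len(with_skills) / n if n else 0.0,
        # (ii) success rate among examples that use skills
        "successful_usage_rate": (
            sum(r.success for r in with_skills) / len(with_skills) if with_skills else 0.0
        ),
        # (iii) fraction of the repository that is actually used
        "coverage": len(used & repo_skills) / len(repo_skills) if repo_skills else 0.0,
        # (iv) average number of distinct skills per example (assumption: distinct)
        "avg_skills_per_example": (
            sum(len(set(r.skills_used)) for r in records) / n if n else 0.0
        ),
    }
```

Reporting (ii) alongside (i) matters because a method can invoke skills often while succeeding rarely; the pair separates reliance from usefulness.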

## 6 Conclusion

We presented SkillOS, an RL training recipe for learning skill curation in self-evolving agents. By decoupling the *skill curator* from the *agent executor*, SkillOS enables modular skill curation without retraining the underlying executor. Through grouped task streams and executor-grounded rewards, SkillOS optimizes curation decisions by their downstream impact on future tasks. Across diverse benchmarks and LLM backbones, SkillOS consistently improves both performance and efficiency. Further analyses show that trained skill curation can outperform frontier models' zero-shot curation ability and generalize across settings, highlighting modular, trained skill curation as a practical path toward agents that self-evolve from experience.

## 7 Acknowledgments

We thank Zilin Xiao, I\-Hung Hsu, Zexue He, and members from Google Cloud AI Research for their valuable feedback during the preparation of the paper\. Siru was supported by the Molecule Maker Lab Institute: An AI Research Institutes program supported by NSF under Award No\. 2019897\.

## References


## Appendix A Prompts

In this section, we provide the full prompt templates used throughout different phases and components of our framework.

### A.1 Prompt for Skill Curator

The following prompt templates show the input to the skill curator during training.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x7.png)
Figure 7: System prompt used for the skill curator during training.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x8.png)
Figure 8: Tool call definition/signature of the skill curator in Figure [7](https://arxiv.org/html/2605.06614#A1.F7).

### A.2 Prompt for Agent Executor

The following prompts are used for the frozen agent executor. These templates provide the agent with the current task description, a history of previous interactions, and a set of retrieved skills to guide its decision-making process. All prompts explicitly force chain-of-thought (CoT) [wei2022chain] reasoning.

For agent tasks including ALFWorld and WebShop, we follow GiGPO [feng2025group] and leverage its environment and prompt setting for inference.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x9.png)
Figure 9: Prompt for ALFWorld agent execution with relevant retrieved skills.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x10.png)
Figure 10: Prompt for WebShop agent execution with relevant retrieved skills.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x11.png)
Figure 11: Prompt for agent execution in reasoning tasks with relevant retrieved skills.
### A.3 Prompt Used During Training

During RL training, the reward $r^{cnt}$ is assigned by an external Qwen3-32B judge that assesses whether the curated skills are semantically meaningful and likely to be useful for future tasks. We show the prompt given to the external judge here.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x12.png)
Figure 12: Prompt for using an external judge to assign a reward score $r^{cnt}$ for generated skill contents.

### A.4 Prompt for LLM-as-a-Judge to Obtain Correctness Signals

We present the prompts used to obtain the self-judged correctness signal $\mathbbm{1}_{\xi_t}$ for self-evolution via LLM-as-a-judge, using the corresponding frozen agent executor as the backbone model, in Figures [13](https://arxiv.org/html/2605.06614#A1.F13), [14](https://arxiv.org/html/2605.06614#A1.F14), and 15 for ALFWorld, reasoning, and WebShop tasks, respectively.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x13.png)
Figure 13: Prompt for LLM-as-a-judge to obtain the correctness signal for the current trajectory in the ALFWorld benchmark.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x14.png)
Figure 14: Prompt for LLM-as-a-judge to obtain the correctness signal for single-turn reasoning problems.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x15.png)
Figure 15: Prompt for LLM-as-a-judge to obtain the correctness signal for the current trajectory in the WebShop benchmark.

## Appendix B Implementation Details

### B.1 Hyperparameters

We present the choices for all hyperparameters during both the training and inference processes in Table [4](https://arxiv.org/html/2605.06614#A2.T4) for different tasks.

Table 4: Hyperparameters for SkillOS for training and inference settings. Rows with a single value in the source apply to all three task settings.

| Hyperparameter | ALFWorld | WebShop | Reasoning |
|---|---|---|---|
| *RL Training* | | | |
| Learning rate | $1\times 10^{-6}$ | $1\times 10^{-6}$ | $1\times 10^{-6}$ |
| Batch size | 32 | 32 | 32 |
| KL loss Coef | 0.001 | 0.001 | 0.001 |
| Max Prompt Length | 16,384 | 16,384 | 16,384 |
| Max Response Length | 4,096 | 4,096 | 4,096 |
| GRPO group size | 8 | 8 | 8 |
| Temperature | 1.0 | 1.0 | 1.0 |
| Steps | 60 | 50 | 100 |
| Data Grouping Size | 10 | 10 | Random(5,12) |
| *Agent Executor Inference* | | | |
| Top-K skill retrieval | 5 | 5 | 5 |
| Max number of turns | 30 | 30 | 1 |
| Action history length | 3 | 3 | - |

### B.2 Grouping Training Instances

In this section, we detail the two-stage pipeline used to turn the raw training set $\mathcal{D}=\{x_i\}_{i=1}^{N}$ into the grouped training set $\mathcal{G}=\{G_j\}_{j=1}^{M}$ of Section [3.2.1](https://arxiv.org/html/2605.06614#S3.SS2.SSS1). Stage 1 annotates each instance with a structured set of latent attributes via an LLM annotator (Sec. [B.2.1](https://arxiv.org/html/2605.06614#A2.SS2.SSS1)). Stage 2 assembles groups of related tasks by retrieving, filtering, and ranking candidates under a semantic phrase-level similarity (Sec. [B.2.2](https://arxiv.org/html/2605.06614#A2.SS2.SSS2)). For training on single-turn reasoning tasks, we instantiate the pipeline on DeepMath-103K [he2026deepmathk], which provides both the raw problems $x_i$ and a scalar difficulty score $d_i\in\mathbb{R}$ that is reused as a curriculum signal by Stage 2. For multi-turn agentic tasks, we leverage the default task type annotation for each benchmark (e.g., 6 task types in ALFWorld), as these annotations naturally expose a discrete partition of tasks into families that share the same underlying skills, and we can use this partition directly in place of the annotated attribute set $Z_i$.

#### B.2.1 Stage 1: Latent Attribute Annotation

We implement the attribute set $Z_i$ of each instance $x_i$ as a tuple of five phrase-lists,

$$Z_i = \bigl(T_i,\; S_i,\; C_i,\; R_i,\; P_i\bigr),$$

where $T_i$ is the list of high-level *topics*, $S_i$ the required *skills or capabilities*, $C_i$ the underlying *mathematical concepts or theorems*, $R_i$ the applicable *heuristic strategies*, and $P_i$ the *common pitfalls*. Each dimension is populated by a small set of short phrases (at most five words each). The annotator is instructed to: (i) emit standardized terminology rather than free-form rationales, (ii) omit any content specific to the question text or its final answer, and (iii) use as few phrases per dimension as necessary to characterize the task. We enforce the output schema via structured decoding with a fixed JSON response schema, and query Gemini-2.5-Pro with the highest thinking-budget configuration. The exact annotation instruction is reproduced in Figure [16](https://arxiv.org/html/2605.06614#A2.F16).
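To make the five-field attribute tuple concrete, the sketch below shows one way an annotation could be parsed and checked against the five-word limit. It is illustrative only: the JSON key names (`topics`, `skills`, `concepts`, `strategies`, `pitfalls`) and the helper names are assumptions, since the actual response schema is not given in the text.

```python
import json
from dataclasses import dataclass, field, fields

MAX_WORDS_PER_PHRASE = 5  # "at most five words each", per the text


@dataclass
class LatentAttributes:
    """Hypothetical container for Z_i = (T_i, S_i, C_i, R_i, P_i)."""
    topics: list[str] = field(default_factory=list)      # T_i
    skills: list[str] = field(default_factory=list)      # S_i
    concepts: list[str] = field(default_factory=list)    # C_i
    strategies: list[str] = field(default_factory=list)  # R_i
    pitfalls: list[str] = field(default_factory=list)    # P_i

    def validate(self) -> None:
        """Reject empty phrases and phrases longer than five words."""
        for f in fields(self):
            for phrase in getattr(self, f.name):
                if not phrase.strip() or len(phrase.split()) > MAX_WORDS_PER_PHRASE:
                    raise ValueError(f"{f.name}: invalid phrase {phrase!r}")


def parse_annotation(raw_json: str) -> LatentAttributes:
    """Parse one annotator response (assumed key names) into Z_i."""
    data = json.loads(raw_json)
    attrs = LatentAttributes(
        **{f.name: list(data.get(f.name, [])) for f in fields(LatentAttributes)}
    )
    attrs.validate()
    return attrs
```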

![Refer to caption](https://arxiv.org/html/2605.06614v1/x16.png)
Figure 16: System instruction used to elicit $Z_i$ from each task in $\mathcal{D}$.

#### B.2.2 Stage 2: Group Construction

Given $\{(x_i, Z_i, d_i)\}_{i=1}^{N}$, we construct each group $G_j=(x_{j,1},\dots,x_{j,n})$ by sampling a seed task and then iteratively appending related tasks. The core primitive is a pair sampler that, given a source $x_s$, returns an admissible successor $x_t$; longer groups are obtained by iterating this primitive with a growing exclusion set so that instances within a group remain distinct.
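The iteration described above can be sketched as a short loop. This is a sketch under stated assumptions: `build_group` and the `sample_successor` callable are hypothetical names, tasks are referred to by integer index, and we assume each new source is the most recently appended task (the text does not say whether the chain extends from the seed or from the last element).

```python
from typing import Callable, Optional

# Hypothetical signature: (source index, excluded indices) -> successor index or None
PairSampler = Callable[[int, set[int]], Optional[int]]


def build_group(seed: int, n: int, sample_successor: PairSampler) -> list[int]:
    """Grow a group of up to n distinct task indices starting from a seed."""
    group = [seed]
    excluded = {seed}  # grows so that tasks within a group remain distinct
    while len(group) < n:
        nxt = sample_successor(group[-1], excluded)
        if nxt is None:  # no admissible successor: return a shorter group
            break
        group.append(nxt)
        excluded.add(nxt)
    return group


def toy_sampler(source: int, excluded: set[int]) -> Optional[int]:
    """Toy stand-in for the pair sampler: next unused index in a pool of six."""
    for t in range(source + 1, 6):
        if t not in excluded:
            return t
    return None
```

In the real pipeline, the pair sampler would apply the dependency gate and scoring described in the following paragraphs; the toy sampler here only exercises the loop.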

##### Phrase similarity.

Because the annotated phrases come from a large open vocabulary (e.g., *"pigeonhole principle"* vs. *"counting argument"*), exact set overlap is unreliable. We therefore score the similarity between any two phrase lists $A$ and $B$ using a *soft-Jaccard* $\mathrm{SJ}_{\tau}(A,B)$ that combines exact matches with a greedy one-to-one matching between remaining phrases under a sentence-embedding cosine similarity (computed with all-MiniLM-L6-v2 [reimers2019sentence]) above a threshold $\tau$. We write $m_{\tau}(A,B)$ for the resulting integer *matched-pair count*, which we use alongside $\mathrm{SJ}_{\tau}$ in the filters below.
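A minimal sketch of the matched-pair count and soft-Jaccard follows. Several details are assumptions rather than the authors' implementation: the text does not give the normalization, so we assume $\mathrm{SJ}_\tau = m_\tau / (|A| + |B| - m_\tau)$ over de-duplicated, lower-cased phrases; the similarity function is passed in as a callable; and the toy `token_overlap` similarity stands in for the cosine similarity of all-MiniLM-L6-v2 embeddings so the example stays self-contained.

```python
from typing import Callable, Iterable

Sim = Callable[[str, str], float]


def _norm(phrases: Iterable[str]) -> set[str]:
    return {p.strip().lower() for p in phrases if p.strip()}


def matched_pairs(a: Iterable[str], b: Iterable[str], sim: Sim, tau: float = 0.60) -> int:
    """m_tau(A, B): exact matches plus greedy one-to-one fuzzy matches above tau."""
    a_set, b_set = _norm(a), _norm(b)
    exact = a_set & b_set
    rest_a, rest_b = sorted(a_set - exact), sorted(b_set - exact)
    # Consider candidate pairs in descending similarity and match greedily.
    candidates = sorted(
        ((sim(x, y), x, y) for x in rest_a for y in rest_b),
        key=lambda t: t[0],
        reverse=True,
    )
    used_a: set[str] = set()
    used_b: set[str] = set()
    fuzzy = 0
    for score, x, y in candidates:
        if score <= tau:  # only similarities strictly above the threshold count
            break
        if x in used_a or y in used_b:
            continue
        used_a.add(x)
        used_b.add(y)
        fuzzy += 1
    return len(exact) + fuzzy


def soft_jaccard(a: Iterable[str], b: Iterable[str], sim: Sim, tau: float = 0.60) -> float:
    """Assumed normalization: matched pairs over the size of the soft union."""
    m = matched_pairs(a, b, sim, tau)
    union = len(_norm(a)) + len(_norm(b)) - m
    return m / union if union else 0.0


def token_overlap(x: str, y: str) -> float:
    """Toy similarity (token Jaccard), a stand-in for embedding cosine similarity."""
    tx, ty = set(x.split()), set(y.split())
    return len(tx & ty) / len(tx | ty) if tx | ty else 0.0
```

Sorting all candidate pairs before matching makes the greedy step deterministic; with an embedding model, `sim` would be replaced by a cosine over precomputed phrase vectors.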

##### Dependency gate.

For a source $x_s$ and candidate $x_t$, we accept the pair only when all of the following hold:

1. *Shared foundation:* $m_{\tau}(C_s,C_t)\geq\kappa_C$ and $m_{\tau}(S_s,S_t)\geq\kappa_S$;
2. *Shared reasoning:* $m_{\tau}(R_s,R_t)+m_{\tau}(P_s,P_t)\geq 1$;
3. *Not a near-duplicate:* $\mathrm{SJ}_{\tau}(T_s,T_t)\leq\theta_T$ and the weighted overall similarity $\Omega(x_s,x_t)\leq\sigma_{\max}$;
4. *Not too unrelated:* $\Omega(x_s,x_t)\geq\sigma_{\min}$;
5. *Progression:* $x_t$ introduces at least one new concept or skill, i.e., $|C_t|>m_{\tau}(C_s,C_t)$ or $|S_t|>m_{\tau}(S_s,S_t)$;
6. *Curriculum direction:* $d_t-d_s\geq\delta_{\min}$.

Here $\Omega$ is a convex combination of per-dimension soft-Jaccard scores across $\{C,S,R,P,T\}$ with weights listed in Table [5](https://arxiv.org/html/2605.06614#A2.T5). Conditions (1)–(2) ensure genuine reuse of foundational knowledge and reasoning machinery; (3)–(4) place the pair in a useful "related but not redundant" band; (5) guarantees that $x_t$ carries something new for the skill curator to compress into the library; and (6) enforces a forward curriculum.
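The six conditions can be written as a single predicate. The sketch below is illustrative: the `Task` container and function names are hypothetical, the default thresholds are taken from Table 5, and exact-match overlap is used as a stand-in for $m_\tau$ and $\mathrm{SJ}_\tau$ (a fuzzy matcher can be passed in through the `m` and `sj` arguments). We also assume the convex combination $\Omega$ is obtained by normalizing the dimension weights to sum to one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """Hypothetical annotated task: five phrase lists plus a difficulty score."""
    C: list[str]
    S: list[str]
    R: list[str]
    P: list[str]
    T: list[str]
    d: float


WEIGHTS = {"C": 5, "S": 4, "R": 3, "P": 1, "T": 2}  # dimension weights from Table 5


def exact_m(a: list[str], b: list[str]) -> int:
    """Stand-in for m_tau: exact-match count only."""
    return len(set(a) & set(b))


def exact_sj(a: list[str], b: list[str]) -> float:
    """Stand-in for SJ_tau: plain Jaccard."""
    union = len(set(a) | set(b))
    return exact_m(a, b) / union if union else 0.0


def omega(s: Task, t: Task, sj: Callable = exact_sj) -> float:
    """Weighted overall similarity; weights normalized to sum to one (assumption)."""
    total = sum(WEIGHTS.values())
    return sum(w * sj(getattr(s, f), getattr(t, f)) for f, w in WEIGHTS.items()) / total


def passes_gate(s: Task, t: Task, m: Callable = exact_m, sj: Callable = exact_sj,
                kappa_c: int = 1, kappa_s: int = 1, theta_t: float = 0.65,
                sigma_min: float = 0.30, sigma_max: float = 0.85,
                delta_min: float = 0.0) -> bool:
    om = omega(s, t, sj)
    m_c, m_s = m(s.C, t.C), m(s.S, t.S)
    return (
        m_c >= kappa_c and m_s >= kappa_s                # 1. shared foundation
        and m(s.R, t.R) + m(s.P, t.P) >= 1               # 2. shared reasoning
        and sj(s.T, t.T) <= theta_t and om <= sigma_max  # 3. not a near-duplicate
        and om >= sigma_min                              # 4. not too unrelated
        and (len(t.C) > m_c or len(t.S) > m_s)           # 5. progression
        and t.d - s.d >= delta_min                       # 6. curriculum direction
    )
```

Note that the gate is asymmetric: conditions (5) and (6) depend on which task is the source, so a pair accepted in one direction is typically rejected in the other.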

##### Candidate retrieval and scoring.

Scoring all $N-1$ alternatives per source is prohibitive, so we precompute an inverted index over the dependency fields $\{C,R,P\}$: for each source $x_s$, the candidate pool consists of tasks that share at least one exact dependency phrase with $x_s$, capped at $K_{\text{inv}}$ entries via uniform subsampling. Routing retrieval through dependency fields rather than topics prevents groups from collapsing onto a single narrow subject. Among the candidates that pass the gate, we select the one that maximizes

$$s(x_s,x_t) \;=\; \sum_{f\in\{C,S,R,P,T\}} w_f\,\mathrm{SJ}_{\tau}(f_s,f_t) \;+\; \lambda\cdot b(d_s,d_t),$$

where $b(\cdot)$ is a bounded difficulty bonus that rewards moderate forward steps. If no inverted-index candidate passes the gate, we fall back to a uniform random pool of size $F$ and re-apply the same gate and scoring; this catches pairs whose phrases agree semantically but not lexically. Extensions sourced from the fallback pool are tagged so downstream training can audit or downweight them. The difficulty gap $d_t-d_s$ is additionally modulated by a randomized curriculum mode $(p_{\uparrow},p_{=},p_{\downarrow})$; for our main experiments, we use an almost exclusively forward curriculum, which produced a more stable training signal than mixed curricula.
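The retrieve, gate, score, and fallback flow can be sketched as follows. This is a sketch under assumptions, not the authors' code: all function names are hypothetical, tasks are integer ids, `dep_phrases` maps each task to the union of its $C$, $R$, and $P$ phrases, and the gate and score are passed in as callables because the exact form of the difficulty bonus $b$ is not given in the text.

```python
import random
from collections import defaultdict
from typing import Callable, Optional


def build_inverted_index(dep_phrases: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each dependency phrase to the ids of the tasks that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for task_id, phrases in dep_phrases.items():
        for phrase in phrases:
            index[phrase].add(task_id)
    return index


def candidate_pool(src: int, dep_phrases: dict[int, set[str]],
                   index: dict[str, set[int]], k_inv: int,
                   rng: random.Random) -> list[int]:
    """Tasks sharing at least one exact dependency phrase with src, capped at k_inv."""
    pool: set[int] = set()
    for phrase in dep_phrases[src]:
        pool |= index.get(phrase, set())
    pool.discard(src)
    ordered = sorted(pool)
    return rng.sample(ordered, k_inv) if len(ordered) > k_inv else ordered


def pick_successor(src: int, dep_phrases: dict[int, set[str]],
                   index: dict[str, set[int]],
                   gate: Callable[[int, int], bool],
                   score: Callable[[int, int], float],
                   k_inv: int = 2000, fallback_size: int = 200,
                   rng: Optional[random.Random] = None) -> Optional[tuple[int, bool]]:
    """Return (best successor, came_from_fallback), or None if nothing passes."""
    rng = rng or random.Random(0)
    admissible = [t for t in candidate_pool(src, dep_phrases, index, k_inv, rng)
                  if gate(src, t)]
    from_fallback = False
    if not admissible:
        # Fallback: uniform random pool, same gate and scoring, tagged for auditing.
        others = [t for t in sorted(dep_phrases) if t != src]
        pool = rng.sample(others, min(fallback_size, len(others)))
        admissible = [t for t in pool if gate(src, t)]
        from_fallback = True
    if not admissible:
        return None
    return max(admissible, key=lambda t: score(src, t)), from_fallback
```

Returning the fallback flag alongside the successor mirrors the tagging described above, so a training loop could downweight or inspect those pairs.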

##### Hyperparameters.

Table [5](https://arxiv.org/html/2605.06614#A2.T5) lists all hyperparameters of the Stage 2 pipeline and the values adopted for our main experiments. The weights were tuned on a held-out subset of 200 source tasks by manually inspecting sampled pairs for prerequisite quality; we found the pipeline largely insensitive to small perturbations of the weights but noticeably sensitive to the progression and overall-similarity-band conditions, removing either of which produced markedly more trivial or degenerate pairs.

Table 5: Hyperparameters of the Stage 2 grouping pipeline.

| Symbol | Meaning | Value |
|---|---|---|
| - | Phrase encoder | all-MiniLM-L6-v2 |
| $\tau$ | Cosine threshold for fuzzy phrase matching | $0.60$ |
| $\kappa_C$ | Minimum matched concept pairs | $1$ |
| $\kappa_S$ | Minimum matched skill pairs | $1$ |
| $\theta_T$ | Maximum topic soft-Jaccard | $0.65$ |
| $\sigma_{\min},\sigma_{\max}$ | Overall-similarity band | $0.30,\,0.85$ |
| $\delta_{\min}$ | Difficulty-delta floor | $0.0$ |
| $(w_C,w_S,w_R,w_P,w_T)$ | Dimension weights | $(5,\,4,\,3,\,1,\,2)$ |
| $\lambda$ | Difficulty-bonus weight | $1.0$ |
| $(p_{\uparrow},p_{=},p_{\downarrow})$ | Mode probabilities | $(0.80,\,0.20,\,0.00)$ |
| $[\Delta_{\min},\Delta_{\max}]$ | Gap in easy $\rightarrow$ hard mode | $[0.5,\,3.0]$ |
| $\Delta_{=}$ | Maximum $\lvert d_t-d_s\rvert$ in same mode | $0.3$ |
| $K_{\text{inv}}$ | Inverted-index subsample cap | $2{,}000$ |
| $F$ | Fallback pool size | $200$ |
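As an illustration of how the mode probabilities and gap bands in Table 5 might interact, the sketch below samples a curriculum mode and checks whether a difficulty gap is admissible under it. The mode names and function names are hypothetical, and the band for the hard-to-easy mode is left unimplemented because the table does not specify one (its probability is zero in the main experiments).

```python
import random

# Values from Table 5; the mode names are hypothetical labels.
MODE_PROBS = {"easy_to_hard": 0.80, "same": 0.20, "hard_to_easy": 0.00}
GAP_FORWARD = (0.5, 3.0)  # [Delta_min, Delta_max] in easy -> hard mode
GAP_SAME_MAX = 0.3        # maximum |d_t - d_s| in same mode


def sample_mode(rng: random.Random) -> str:
    """Draw a curriculum mode according to the mode probabilities."""
    return rng.choices(list(MODE_PROBS), weights=list(MODE_PROBS.values()), k=1)[0]


def gap_allowed(mode: str, d_s: float, d_t: float) -> bool:
    """Check the difficulty gap d_t - d_s against the band of the sampled mode."""
    gap = d_t - d_s
    if mode == "easy_to_hard":
        return GAP_FORWARD[0] <= gap <= GAP_FORWARD[1]
    if mode == "same":
        return abs(gap) <= GAP_SAME_MAX
    raise ValueError("band for hard_to_easy mode is not specified in the text")
```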

### B.3 Experiment Setup

#### B.3.1 Datasets

In this section, we provide a detailed introduction to all the datasets involved in this paper.

ALFWorld. ALFWorld [shridhar2021alfworld] is a text-based interactive benchmark that aligns the TextWorld engine with the embodied ALFRED environment, enabling agents to learn high-level household policies through natural-language interaction. The benchmark covers six task types (Pick & Place, Examine in Light, Clean & Place, Heat & Place, Cool & Place, and Pick Two & Place) situated in 120 simulated rooms spanning kitchens, bedrooms, bathrooms, and living rooms. It provides 3,553 training tasks, together with 140 valid\_seen tasks for the test set. At each step, the agent receives a textual description of its surroundings together with a goal instruction (e.g., "put a hot apple in the fridge") and must issue high-level commands such as go to, take, open, heat, and put.

WebShop. WebShop [10.5555/3600270.3601778] is a simulated e-commerce web environment designed to benchmark language agents on realistic, grounded shopping tasks. The environment is populated with 1.18 million real-world products scraped from Amazon and 12,087 crowd-sourced natural-language instructions, partitioned into 10,587 training, 1,000 dev, and 500 test instructions. Given an instruction (e.g., "I'm looking for a quick-release fitness strap band in teal, priced lower than \$40.00"), the agent interacts with the environment via two action types, search[query] and click[button], to locate and purchase a product that matches the specified attributes, type, options, and price. At the end of each episode, a programmatic reward in [0, 1] is computed by comparing the purchased item against the ground-truth product specification. Following the standard evaluation protocol used in prior LLM-agent work, we evaluate on the 500 held-out test instructions.

DeepMath-103K. DeepMath-103K [he2026deepmathk] is a large-scale, decontaminated mathematical reasoning dataset containing approximately 103K problems at high difficulty (primarily AoPS Levels 5–9), spanning algebra, calculus, number theory, geometry, probability, and discrete mathematics. Each problem is paired with a verifiable final answer, which enables rule-based RL rewards, together with a difficulty score, topic label, and three DeepSeek-R1 [guo2025deepseek] chain-of-thought solutions. Specifically, we annotate a subset of around 33,000 problems, with a final set of 20,000 grouped training instances.

AIME24 & AIME25. A collection of demanding mathematical problems sourced from the 2024 and 2025 American Invitational Mathematics Examination (AIME), with 30 problems each year. Problems encompass algebra, geometry, number theory, and combinatorics. Created to assess large language models' sophisticated mathematical reasoning abilities, the dataset presents substantial difficulty, systematic multi-phase solutions, and distinctive answers, establishing it as a robust benchmark for evaluating advanced analytical capabilities.

GPQA. Short for Graduate-Level Google-Proof Q&A Benchmark [rein2024gpqa], GPQA comprises a collection of demanding text-based multiple-choice problems authored by subject specialists in biology, physics, and chemistry, intentionally crafted to be "exceptionally challenging". We use the "GPQA-Diamond" subset for testing, which has 198 problems in total.

#### B.3.2 Baselines

We compare SkillOS against five representative baselines that span memory-free agents, recent memory-augmented methods, and two internal variants of our own framework. All baselines share the same frozen Agent Executor and are evaluated under identical task suites, retrieval budgets, and decoding settings to isolate the contribution of the memory mechanism.

(i) No Memory. A memory-free baseline in which the Agent Executor solves each task independently, without access to any external memory or cross-task knowledge transfer. Each episode begins from a blank state, and no information is retained across tasks. This baseline establishes a lower bound and isolates the contribution of any form of accumulated experience.

(ii) ReasoningBank [ouyang2026reasoningbank]. A recent memory-augmented method that distills reusable reasoning insights from past trajectories and stores them as a searchable bank for future tasks. At inference time, relevant insights are retrieved and injected into the executor's context to guide reasoning. ReasoningBank represents the class of experience-distillation approaches, which emphasize the content of stored knowledge but rely on fixed, heuristic policies for deciding what to write or discard.

(iii) MemP [DBLP:journals/corr/abs-2508-06433]. A procedural-memory method that induces reusable procedures from agent experience and applies advanced memory-management strategies, including consolidation, forgetting, and re-indexing, to maintain the memory store over time. MemP represents the class of rule-based memory management approaches, which feature more sophisticated maintenance policies than ReasoningBank but still prescribe curation decisions through hand-designed heuristics rather than learning them from downstream task feedback.

(iv) SkillOS-base. A variant of our framework in which the Skill Curator is instantiated with the same open-source backbone as SkillOS but without any RL fine-tuning, while all other components remain identical to SkillOS. This baseline serves two purposes: (a) it provides a lower-bound reference point that reflects the intrinsic prompting-based curation ability of the open-source backbone prior to optimization, and (b) it isolates the contribution of our GRPO-based training, since SkillOS-base shares exactly the same model architecture, prompting template, and memory interface as SkillOS but forgoes end-to-end optimization against task performance.

(v) SkillOS-gemini. A variant of our framework in which the Skill Curator is instantiated with Gemini-2.5-Pro instead of a trained open-source model, while all other components remain identical to SkillOS. This baseline serves two purposes: (a) it provides a strong closed-source reference point for the upper bound of prompting-based curation, and (b) it isolates the effect of our GRPO-based training, since SkillOS-gemini shares the same prompting template and memory interface as SkillOS but forgoes RL optimization against task performance.

Together, these baselines cover the main design axes along which memory-augmented agents differ from SkillOS: whether memory exists at all (i), how stored knowledge is represented (ii vs. iii), whether curation decisions are prescribed by heuristics or learned from task feedback (ii and iii vs. SkillOS), and whether the curator itself benefits from RL optimization (iv and v vs. SkillOS).

#### B.3.3 Evaluation Metrics

We evaluate SkillOS and all baselines along two complementary axes, task effectiveness and action efficiency, using metrics tailored to each benchmark. Across all benchmarks and methods, every configuration is run with three independent random seeds; we report the mean across seeds, with one standard deviation shown as a subscript (e.g., $85.7_{\pm 1.6}$). Within each backbone block of Tables [1](https://arxiv.org/html/2605.06614#S4.T1) and [2](https://arxiv.org/html/2605.06614#S4.T2), the best value in each column is highlighted in bold.

##### Success Rate (SR $\uparrow$).

Our primary effectiveness metric on both ALFWorld and WebShop. On ALFWorld, SR is the fraction of evaluation episodes in which the agent reaches the goal state within the step budget, yielding a binary $\{0,1\}$ outcome per episode. We report SR both per task category (Pick, Look, Clean, Heat, Cool, and Pick2) and as a macro-average (Avg. SR) across the six categories, so that categories with fewer tasks are not dominated by larger ones. On WebShop, following [10.5555/3600270.3601778], SR is the fraction of episodes whose final reward equals exactly $1$, i.e., the purchased product fully matches all specified attributes, options, type, and price constraints.
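To illustrate why the macro-average is less dominated by large categories, the sketch below contrasts it with a pooled (micro) average on binary episode outcomes. The function names and the toy outcomes are hypothetical and do not correspond to reported results.

```python
def macro_avg_sr(per_category: dict[str, list[int]]) -> float:
    """Mean of per-category success rates (in percent); each category counts equally."""
    rates = [sum(v) / len(v) for v in per_category.values() if v]
    return 100.0 * sum(rates) / len(rates) if rates else 0.0


def micro_avg_sr(per_category: dict[str, list[int]]) -> float:
    """Pooled success rate (in percent); larger categories weigh more."""
    outcomes = [o for v in per_category.values() for o in v]
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


# Toy binary outcomes per episode, for illustration only.
toy_outcomes = {"Pick": [1, 1, 1, 0], "Look": [0, 1]}
```

On the toy data, the smaller Look category pulls the macro-average below the pooled rate, which is the effect the macro-average is meant to expose.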

##### WebShop Score ($\uparrow$).

In addition to SR, WebShop provides a dense per-episode reward in $[0,100]$ that credits partial matches on attributes, options, type, and price even when the purchase is not a perfect match. We report the average score across evaluation episodes as a finer-grained complement to SR: two methods with similar SR may differ substantially in how close their near-misses are to the target product.

##### Number of Steps (Steps $\downarrow$).

Our efficiency metric on ALFWorld and WebShop. Steps is the average number of environment actions the agent issues per episode, computed over all evaluation episodes regardless of success. Failed episodes contribute steps up to their termination point (task completion, max-step cutoff, or early stop). This metric captures a dimension that SR and Score alone cannot: two methods may achieve comparable effectiveness while differing substantially in how efficiently they reach the goal, which has direct implications for inference cost and deployment feasibility.

##### Accuracy (Acc. $\uparrow$) on reasoning benchmarks.

For the single-turn reasoning datasets (AIME24, AIME25, and GPQA), we report exact-match accuracy: the fraction of questions whose extracted final answer matches the ground truth. For AIME24 and AIME25, we adopt the evaluation protocol from the HuggingFace math\_verify toolkit ([https://github.com/huggingface/Math-Verify](https://github.com/huggingface/Math-Verify)), which parses the model's final boxed expression and verifies mathematical equivalence to the reference answer (accounting for equivalent numerical forms, simplifications, and formatting variants). For GPQA, which is a multiple-choice benchmark, we extract the predicted option letter from the model's response and score it as correct if and only if it exactly matches the ground-truth option. We additionally report an average accuracy (Avg. Acc.) across the three datasets to summarize overall reasoning ability.

##### Evaluation protocol.

All methods share the same frozen Agent Executor, retrieval budget (top-$k$ skills retrieved via BM25), maximum step budget, and decoding temperature within each backbone, so that differences in the reported metrics are attributable to the memory mechanism rather than to confounding inference settings. Unless stated otherwise, all numbers in the main paper are computed on the official held-out evaluation splits of each benchmark.
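For readers unfamiliar with BM25, the following self-contained sketch ranks skill descriptions against a task description and returns the top-$k$ indices. It is a generic Okapi BM25 with whitespace tokenization and common default parameters ($k_1=1.5$, $b=0.75$), not the retrieval code used in the paper; the function name and the toy skill strings are hypothetical.

```python
import math
from collections import Counter


def bm25_top_k(query: str, docs: list[str], k: int = 5,
               k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return indices of the top-k documents by Okapi BM25 score (score > 0 only)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(query.lower().split())
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, i))
    ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [i for score, i in ranked[:k] if score > 0]


# Toy skill descriptions, for illustration only.
toy_skills = [
    "heat an object with the microwave",
    "cool an object with the fridge",
    "search for a product by attribute",
]
```

Because BM25 is purely lexical, retrieval quality depends on how the curator words each skill, which is one reason the curation policy and the retriever interact.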

## Appendix C Additional Analyses

### C\.1 Results on Gemini\-3\.1\-Flash\-Lite

In addition to the Qwen3\-8B/32B and Gemini\-2\.5\-Pro executors used in the main paper, we further evaluate SkillOS on ALFWorld with the more recent Gemini\-3\.1\-Flash\-Lite as the frozen Agent Executor, to verify that our gains generalize to newer model families\. Results are reported in Table [6](https://arxiv.org/html/2605.06614#A3.T6)\.

SkillOS achieves the highest average success rate \(73\.1%\), outperforming the strongest external baseline ReasoningBank \(66\.0%\) by \+7\.1 points and the No\-Memory baseline \(61\.2%\) by \+11\.9 points, while requiring the fewest interaction steps \(15\.5 vs\. 18\.5 for No Memory\)\. The two internal variants reproduce the ordering observed in the main experiments: SkillOS\-base reaches only 63\.6%, barely above No Memory, confirming that the open\-source backbone cannot recover the curation policy through prompting alone, and SkillOS\-gemini improves to 71\.2% but is still surpassed by SkillOS despite using a much stronger curator backbone\. This reinforces our main finding that *learning* the curator with task\-level feedback contributes more than scaling up the curator model\. We also note that MemP \(58\.6%\) underperforms even No Memory under this executor, suggesting that hand\-designed curation heuristics are brittle when the executor is less capable, whereas the policy learned by SkillOS remains robust\. Per subset, SkillOS attains the best result on four of six subsets \(on Pick2 it ties with both internal variants at 68\.1%\), with particularly large margins over ReasoningBank on Look \(84\.6% vs\. 71\.8%\) and Cool \(68\.0% vs\. 48\.0%\); the remaining two subsets are won by SkillOS\-gemini \(Pick\) and ReasoningBank \(Heat\), on which SkillOS nonetheless remains competitive\. Overall, these results confirm that the advantage of SkillOS transfers cleanly to a newer executor family\.

Table 6: Experiment results on the ALFWorld benchmark\. Success rate \(SR ↑\) and the number of steps \(Steps ↓\) are reported on 6 subsets for Gemini\-3\.1\-Flash\-Lite as frozen executor\.

| Methods | Pick (35) | Look (13) | Clean (27) | Heat (16) | Cool (25) | Pick2 (24) | Avg. SR (140) | Steps |
|---|---|---|---|---|---|---|---|---|
| No Memory | $85.7_{0.0}$ | $59.0_{8.9}$ | $67.9_{9.3}$ | $25.0_{6.2}$ | $38.7_{2.3}$ | $66.7_{0.0}$ | $61.2_{2.3}$ | 18.5 |
| ReasoningBank | $87.6_{4.4}$ | $71.8_{4.4}$ | $63.0_{0.0}$ | $\mathbf{52.1}_{14.4}$ | $48.0_{10.6}$ | $62.5_{0.0}$ | $66.0_{2.7}$ | 17.6 |
| MemP | $84.3_{6.1}$ | $57.7_{5.4}$ | $63.0_{0.0}$ | $28.1_{4.4}$ | $34.0_{2.8}$ | $62.5_{0.0}$ | $58.6_{1.0}$ | 19.3 |
| SkillOS-base | $86.7_{1.6}$ | $61.5_{0.0}$ | $66.7_{0.0}$ | $41.7_{6.2}$ | $38.7_{16.0}$ | $68.1_{2.4}$ | $63.6_{3.9}$ | 17.7 |
| SkillOS-gemini | $\mathbf{96.2}_{1.6}$ | $61.5_{13.3}$ | $74.1_{3.7}$ | $31.2_{12.5}$ | $66.7_{4.6}$ | $68.1_{2.4}$ | $71.2_{2.9}$ | 16.1 |
| SkillOS | $88.6_{0.0}$ | $\mathbf{84.6}_{13.3}$ | $\mathbf{77.8}_{0.0}$ | $37.5_{17.2}$ | $\mathbf{68.0}_{8.0}$ | $\mathbf{68.1}_{2.4}$ | $\mathbf{73.1}_{2.7}$ | 15.5 |

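As a quick arithmetic check on Table 6, the sketch below recomputes each method's average success rate as a task-count-weighted mean of the six subset success rates. The weighting scheme is an assumption inferred from the table (the text does not state how Avg. SR is aggregated); the values are copied from the table, and the reported averages turn out to be consistent with this weighting up to rounding.

```python
# Task counts per subset, in the order Pick, Look, Clean, Heat, Cool, Pick2.
TASK_COUNTS = [35, 13, 27, 16, 25, 24]

# Method -> (subset success rates in %, reported Avg. SR in %), copied from Table 6.
TABLE_6 = {
    "No Memory":      ([85.7, 59.0, 67.9, 25.0, 38.7, 66.7], 61.2),
    "ReasoningBank":  ([87.6, 71.8, 63.0, 52.1, 48.0, 62.5], 66.0),
    "MemP":           ([84.3, 57.7, 63.0, 28.1, 34.0, 62.5], 58.6),
    "SkillOS-base":   ([86.7, 61.5, 66.7, 41.7, 38.7, 68.1], 63.6),
    "SkillOS-gemini": ([96.2, 61.5, 74.1, 31.2, 66.7, 68.1], 71.2),
    "SkillOS":        ([88.6, 84.6, 77.8, 37.5, 68.0, 68.1], 73.1),
}


def weighted_sr(subset_srs, counts=TASK_COUNTS):
    """Mean success rate weighted by the number of tasks in each subset."""
    return sum(sr * n for sr, n in zip(subset_srs, counts)) / sum(counts)


for method, (subset_srs, reported_avg) in TABLE_6.items():
    print(f"{method:15s} recomputed={weighted_sr(subset_srs):.1f} reported={reported_avg}")

# Margins quoted in the text, recomputed from the reported averages.
gain_vs_reasoningbank = TABLE_6["SkillOS"][1] - TABLE_6["ReasoningBank"][1]
gain_vs_no_memory = TABLE_6["SkillOS"][1] - TABLE_6["No Memory"][1]
print(round(gain_vs_reasoningbank, 1), round(gain_vs_no_memory, 1))
```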
### C\.2 Case Studies

![Refer to caption](https://arxiv.org/html/2605.06614v1/x17.png) Figure 17: Case studies of skills curated by SkillOS\.
##### Curated Skills for Different Tasks\.

Figure [17](https://arxiv.org/html/2605.06614#A3.F17) presents two representative skills curated by SkillOS that illustrate qualitatively different curation patterns across task types\. For agentic tasks \(Figure [17](https://arxiv.org/html/2605.06614#A3.F17)\(a\)\), the curator distills a meta\-strategy for failure recovery: rather than memorizing a specific object\-search trajectory, it abstracts the recovery procedure into a reusable workflow \(*exhaustive search* → *confirm unavailability* → *identify a substitute* → *proceed with substitute*\) and explicitly references existing skills, demonstrating compositional curation\. For reasoning tasks \(Figure [17](https://arxiv.org/html/2605.06614#A3.F17)\(b\)\), the curator captures *branching\-out reasoning*: a single skill on inradius–circumradius–semiperimeter relations encodes multiple solution paths \(relating the target distance to either the in/circumradius or the side lengths\), each paired with its formula, application, and prerequisite constraints\. Together, these examples show that SkillOS learns to produce skills tailored to the structure of the underlying task: procedural and composable for agentic settings, and multi\-path with explicit preconditions for reasoning settings, rather than verbatim trajectory copies\.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x18.png) Figure 18: Case study on math\-reasoning skill curation\. SkillOS\-base produces a generic partitioning recipe, while SkillOS curates a concrete and reusable counting framework with explicit constraints, equations, and a worked example\.
##### How SkillOS Curates Better Skills Compared to Baselines\.

We further qualitatively compare the skills curated by SkillOS against those produced by the baseline curator\. In the math\-reasoning case shown in Figure [18](https://arxiv.org/html/2605.06614#A3.F18), SkillOS\-base outputs only a generic high\-level recipe based on partitioning into disjoint sets, without explicit formulas, constraints, or examples\. By comparison, SkillOS curates a much more useful skill that provides a concrete counting framework, including explicit constraint formulation, equation setup, and a worked example tailored to the target sub\-problem\. These examples show that RL\-trained skill curation improves not only the correctness of the curated content, but also its specificity and usability, enabling skills to better capture the underlying structure of tasks\.

![Refer to caption](https://arxiv.org/html/2605.06614v1/x19.png) Figure 19: Case studies of how skills curated by SkillOS successfully helped to solve a task in ALFWorld\.
##### How Curated Skills Help to Solve Tasks Successfully\.

Figure [19](https://arxiv.org/html/2605.06614#A3.F19) illustrates a representative example of how curated skills improve agent behavior in interactive environments\. Given the task “look at the CD under the desklamp,” the memory\-free baseline fails to infer the correct object–location relation and performs an inefficient search over irrelevant containers, eventually exhausting the step budget\. In contrast, SkillOS retrieves a skill that encourages the agent to examine objects under or around light sources when the instruction refers to an object being “under” a lamp\. Guided by this reusable strategy, the agent first locates and picks up the CD near the desk area, then moves to the desklamp and inspects the correct target location, completing the task successfully\. This case highlights that curated skills do not merely memorize task\-specific action sequences; instead, they provide transferable decision guidance that helps the agent focus exploration on semantically relevant objects and locations, reducing unnecessary interactions and improving task success\.

## Appendix D Limitations

##### Retrieval Mechanism\.

Our current implementation relies on a relatively simple keyword\-based retrieval mechanism \(BM25\) to retrieve relevant skills from the skill repository\. This design choice allows us to isolate the main focus of this work: studying how skills can be curated, updated, and organized through experience\-driven learning\. However, more advanced retrieval methods, such as dense retrieval, hybrid retrieval, or learned retrievers, may further improve the relevance of retrieved skills and thus lead to stronger downstream performance\. We leave the joint optimization of skill curation and skill retrieval to future work\.

##### Simplified Skill Representation\.

Following Anthropic’s skill paradigm \[anthropic\_skills\_2025\], we instantiate each skill as a single Markdown file that combines a YAML frontmatter and Markdown body\. This simplification keeps the curator’s action space tractable, but it discards two affordances of the original SKILL\.md format: \(i\) supporting scripts and external resource files that allow skills to encapsulate executable procedures rather than purely declarative knowledge, and \(ii\) hierarchical organization in which a top\-level skill can reference or compose lower\-level sub\-skills\. As a result, behaviors that are most naturally expressed as runnable code or as compositions of finer\-grained primitives must currently be flattened into prose\. Extending SkillOS to multi\-file, hierarchical, and partially executable skills is a natural next step\.
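To illustrate the single-file representation, the sketch below parses a skill consisting of a YAML frontmatter block followed by a Markdown body. The example skill text is invented for illustration (loosely modeled on the failure-recovery workflow in Figure 17(a)), the `name` and `description` fields are assumed, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
# Hypothetical skill file: frontmatter between '---' lines, then a Markdown body.
SKILL_TEXT = """---
name: recover-from-missing-object
description: Fall back to a substitute when the target object cannot be found.
---
# Recovery workflow

1. Search all plausible locations exhaustively.
2. Confirm the object is unavailable.
3. Identify a substitute and proceed with it.
"""


def parse_skill(text):
    """Split a skill file into (metadata dict, Markdown body).

    Only flat 'key: value' frontmatter lines are supported in this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter line")
    try:
        end = lines.index("---", 1)  # closing frontmatter delimiter
    except ValueError:
        raise ValueError("frontmatter is not terminated by '---'") from None

    metadata = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


metadata, body = parse_skill(SKILL_TEXT)
print(metadata["name"])
print(body.splitlines()[0])
```

Everything the skill knows lives in prose inside `body`, which is the flattening the paragraph above describes: there is no place in this format for an attached script or a structured link to a sub-skill.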

##### Frozen Agent Executor\.

Throughout training, we keep the agent executor $\pi_{\mathcal{L}}$ frozen and optimize only the skill curator $\pi_{\mathcal{S}}$\. This decoupling is deliberate: it isolates the contribution of skill curation, makes the recipe modular across executors, and avoids confounding our analysis with executor\-side adaptation\. The downside is that the curator can only shape the system’s behavior through what it writes into SkillRepo; any miscalibration between the curated skills and the executor’s idiosyncrasies must be absorbed by the curator alone\. Joint or alternating optimization of $\pi_{\mathcal{S}}$ and $\pi_{\mathcal{L}}$ may yield a better\-aligned pair, at the cost of executor specificity and substantially higher training cost\.

## Appendix E Future Research Directions

Our work opens several promising directions for future research\.

##### Agentic Search over Experiential Memory\.

SkillOS currently retrieves relevant skills from SkillRepo through a fixed top\-$k$ BM25 lookup, treating retrieval as a static, one\-shot operation\. As the skill repository grows across thousands of tasks and domains, the bottleneck of self\-evolving agents shifts from *what to store* to *how to reliably retrieve and inject the right fragments* at each decision step\. A natural next step is to replace static retrieval with *agentic search*: letting the Skill Curator \(or a dedicated retrieval agent\) actively issue multiple queries, reformulate them based on intermediate evidence, and iteratively decide which skills to surface, cite, or compose for the executor\. This reframes memory access as a first\-class decision in the agent’s policy rather than a preprocessing step, and opens the door to scaling SkillOS to memory stores orders of magnitude larger than those considered here\.
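The contrast between one-shot and iterative retrieval can be sketched as below. This is a toy illustration of the proposed direction, not something the paper implements: the token-overlap scorer stands in for a real retriever, and `reformulate` is a hypothetical placeholder for what would in practice be an LLM-driven query rewrite.

```python
def overlap_search(query, skills, k=1, exclude=()):
    """Toy retriever: rank skills by the number of tokens shared with the query."""
    query_tokens = set(query.lower().split())
    scored = []
    for name, text in skills.items():
        if name in exclude:
            continue
        overlap = len(query_tokens & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def agentic_search(task, skills, reformulate, max_rounds=3, k=1):
    """Issue up to max_rounds queries, rewriting the query after each round of hits."""
    query, found = task, []
    for _ in range(max_rounds):
        hits = overlap_search(query, skills, k=k, exclude=found)
        if not hits:
            break  # no new evidence, stop searching
        found.extend(hits)
        query = reformulate(task, found)
    return found


# Hypothetical repository: one skill mentions another by name.
skills = {
    "heat-object": "heat object microwave requires find-object first",
    "find-object": "find-object search receptacles systematically",
    "cool-object": "cool mug fridge",
}


def follow_references(task, found):
    # Placeholder reformulation: query with the text of the most recent hit.
    return skills[found[-1]]


task = "heat the potato with the microwave"
print(overlap_search(task, skills, k=1))
print(agentic_search(task, skills, follow_references))
```

In this toy case the one-shot lookup surfaces only the skill that matches the task wording, while the iterative loop also reaches the prerequisite skill that the first hit refers to.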

##### Hierarchical and Compositional Skills\.

Our current skills are flat Markdown entries, each describing a single reusable pattern\. Real agent competence, however, is hierarchical: high\-level procedures invoke lower\-level sub\-skills, which in turn depend on primitive operations\. Extending SkillRepo to support *hierarchical decomposition*, where the curator learns not only to insert, update, and delete skills but also to link, compose, and abstract them, could enable the agent to build increasingly expressive procedural libraries over time\. This direction connects naturally to program\-synthesis and library\-learning literature, and would allow SkillOS to scale to longer\-horizon tasks where single\-skill retrieval is insufficient\.
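One way such a hierarchical repository might look is sketched below. The class, its method names, and the `expand` traversal are hypothetical and are not part of the SkillOS implementation; the sketch only shows how a `link` operation next to insert, update, and delete lets a high-level skill be expanded into the sub-skills it depends on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Skill:
    name: str
    body: str
    sub_skills: List[str] = field(default_factory=list)  # names of linked sub-skills


class HierarchicalSkillRepo:
    """Hypothetical repository supporting insert/update/delete plus linking."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def insert(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise KeyError(f"skill already exists: {skill.name}")
        self._skills[skill.name] = skill

    def update(self, name: str, body: str) -> None:
        self._skills[name].body = body

    def delete(self, name: str) -> None:
        del self._skills[name]

    def link(self, parent: str, child: str) -> None:
        if child not in self._skills:
            raise KeyError(f"unknown sub-skill: {child}")
        self._skills[parent].sub_skills.append(child)

    def expand(self, name: str, _seen: Optional[Set[str]] = None) -> List[str]:
        """Return the skill and its transitive sub-skills, dependencies first.

        Dangling links (deleted sub-skills) and cycles are skipped.
        """
        seen = set() if _seen is None else _seen
        if name in seen or name not in self._skills:
            return []
        seen.add(name)
        ordered: List[str] = []
        for child in self._skills[name].sub_skills:
            ordered.extend(self.expand(child, seen))
        ordered.append(name)
        return ordered


repo = HierarchicalSkillRepo()
repo.insert(Skill("open-receptacle", "Open closed containers before searching them."))
repo.insert(Skill("find-object", "Search likely receptacles systematically."))
repo.insert(Skill("heat-object", "Locate the object, then heat it in the microwave."))
repo.link("find-object", "open-receptacle")
repo.link("heat-object", "find-object")
print(repo.expand("heat-object"))
```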

##### Multi\-Agent and Shared Memory\.

SkillOS treats memory as a single agent’s private artifact\. In many realistic deployments, however, multiple agents operate in parallel \(e\.g\., code review, multi\-hop research, collaborative robotics\) and could benefit from *shared experiential memory*\. Open questions include how to arbitrate conflicting curation decisions from different agents, how to attribute credit when a shared skill contributes to one agent’s success but another’s failure, and how to preserve specialization while enabling cross\-agent transfer\. Our GRPO\-based curator provides a natural starting point, but extending it to the multi\-agent credit\-assignment setting is non\-trivial and likely to require new algorithmic ideas\.
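For context on why credit assignment is the sticking point, the sketch below shows the group-relative advantage computation commonly associated with GRPO: each sampled rollout is scored against the other rollouts in its own group. This is the generic formulation, not the paper's composite reward; the use of the population standard deviation and the `eps` constant are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std within the group
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative rewards for four curation rollouts sampled for the same input.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])

# When every rollout in the group gets the same reward, there is no learning signal.
print(group_relative_advantages([0.5, 0.5, 0.5]))
```

The normalization assumes that all rollouts in a group are judged against the same downstream outcome. With a shared memory, one skill edit may help one agent and hurt another, so a single group reward no longer has an obvious definition.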

## Appendix F Use of LLMs

We used LLMs as a general\-purpose writing\-assistance tool during the preparation of this submission\. Specifically, LLMs were employed for polishing the clarity and readability of text \(e\.g\., refining sentence structure, improving grammar, and shortening overly verbose phrasing\)\. All research ideas, methodology design, experiments, analyses, and final writing decisions were conceived, implemented, and validated solely by the authors\.
