@hanakoxbt: An MIT team just dropped a 24-page PDF on "Self-Evolving Skills" for Claude Code agents. Anthropic's own skill-creator …
Summary
An MIT team released a paper on self-evolving skills for Claude Code agents. It reports a 71.1% pass rate, 37 points above Anthropic's skill-creator, using a Generate-Test-Verify-Co-Evolve framework.
Cached at: 06/26/26, 08:07 AM
An MIT team just dropped a 24-page PDF on “Self-Evolving Skills” for Claude Code agents.
Anthropic’s own skill-creator hits 34% pass rate. This framework hits 71%.
Generate → Test → Verify → Co-Evolve
Generate: after every task failure, the agent writes a candidate skill for what just broke.
Test: the new skill runs on a held-out set with the same frozen Claude model.
Verify: if it scores higher than the current best, it gets promoted. If not, it’s rejected and the failure is logged.
Co-Evolve: a second agent learns from rejected attempts and evolves alongside the generator, so the loop keeps improving.
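The four steps above can be sketched as a simple promotion loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `Skill`, `EvolutionState`, `evolve_step`, and the toy `generate`, `evaluate`, and `critic` callables are all hypothetical names, and the scores are made-up numbers standing in for a held-out evaluation with a frozen model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    name: str
    body: str


@dataclass
class EvolutionState:
    best_skill: Optional[Skill] = None
    best_score: float = 0.0
    # Log of rejected candidates and their held-out scores.
    rejected: List[Tuple[Skill, float]] = field(default_factory=list)


def evolve_step(
    state: EvolutionState,
    failure: str,
    generate: Callable[[str, List[str]], Skill],
    evaluate: Callable[[Skill], float],
    critic: Callable[[List[Tuple[Skill, float]]], List[str]],
) -> bool:
    """Run one Generate -> Test -> Verify -> Co-Evolve pass; True if promoted."""
    # Co-Evolve (assumed form): a second component turns past rejections into hints.
    hints = critic(state.rejected)
    # Generate: write a candidate skill for the task that just failed.
    candidate = generate(failure, hints)
    # Test: score the candidate on a held-out set (stubbed out here).
    score = evaluate(candidate)
    # Verify: promote only if it beats the current best, else log the rejection.
    if score > state.best_score:
        state.best_skill, state.best_score = candidate, score
        return True
    state.rejected.append((candidate, score))
    return False


# Toy stand-ins (hypothetical): a real setup would call an agent and a benchmark.
def toy_generate(failure: str, hints: List[str]) -> Skill:
    return Skill(name=f"fix-{failure}", body=f"handle {failure}; avoid: {hints}")


def toy_critic(rejected: List[Tuple[Skill, float]]) -> List[str]:
    return [skill.name for skill, _ in rejected]


TOY_SCORES = {"fix-timeout": 0.5, "fix-parse": 0.3, "fix-auth": 0.7}


def toy_evaluate(skill: Skill) -> float:
    return TOY_SCORES[skill.name]


state = EvolutionState()
results = [
    evolve_step(state, failure, toy_generate, toy_evaluate, toy_critic)
    for failure in ["timeout", "parse", "auth"]
]
print(results, state.best_skill.name, state.best_score)
```

In this toy run the second candidate scores below the current best, so it lands in the rejection log and its name is passed to the generator as a hint on the next pass. How the paper's second agent actually consumes rejected attempts is not specified in the post.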
The result: 71.1% pass rate on Claude Opus 4.6, beating Anthropic’s own skill-creator by 37 points across SkillsBench and Codex.
This is exactly why engineers stopped writing skills by hand and let the agent evolve them.
Read the paper, then grab the setup below.
Similar Articles
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
SkillFlow introduces a benchmark of 166 tasks across 20 families for evaluating autonomous agents' ability to discover, repair, and maintain skills over time through a lifelong learning protocol. Experiments reveal a substantial capability gap among leading models, with Claude Opus 4.6 improving significantly while others show limited or negative gains from skill evolution.
@mylifcc: Anthropic published a major blog post on June 3rd, "Lessons from building Claude Code: How we use skills", summarizing Anthropic's understanding of skills: What exactly are skills? (Core concept clarification) Not: ...
Anthropic published a blog post explaining Skills in Claude Code: a skill is a folder containing instructions, scripts, reference materials, and similar resources, which lets the agent load context progressively, reducing hallucinations and token waste.
@sheriyuo: Every "self-evolving agent" paper this year has mutated text: prompts, skill files, workflow graphs, memory schemas. MO…
MOSS introduces source-level rewriting for self-evolving agents, enabling fixes to structural failures that text-layer evolution cannot reach. It lifts a four-task mean grader score from 0.25 to 0.61 in a single cycle on OpenClaw without human intervention.
mattpocock/skills
This open-source repository provides a composable set of AI agent skills and prompts designed to improve alignment, reduce verbosity, and optimize workflows for coding assistants like Claude Code and Codex.
@Huanusa: Anthropic has officially released a highly valuable resource: "The Complete Guide to Building Skills for Claude" — a full 33-page PDF. This isn't a beginner's overview, but a comprehensive guide to building skills for Claude…
Anthropic has released a 33-page PDF guide, "The Complete Guide to Building Skills for Claude," which details how to design, organize, optimize, and reuse Claude's Skills. It is suitable for Claude Code users and AI Agent developers.