@hanakoxbt: An MIT team just dropped a 24-page PDF on "Self-Evolving Skills" for Claude Code agents. Anthropic's own skill-creator …

X AI KOLs Timeline 06/25/26, 01:01 PM Papers

self-evolving-skills claude-code agents mit skill-creator co-evolution ai-agents

Summary

An MIT team released a paper on self-evolving skills for Claude Code agents that reports a 71.1% pass rate, surpassing Anthropic's skill-creator by 37 percentage points with a Generate-Test-Verify-Co-Evolve framework.

Original Article

Cached at: 06/26/26, 08:07 AM

An MIT team just dropped a 24-page PDF on “Self-Evolving Skills” for Claude Code agents.

Anthropic’s own skill-creator hits a 34% pass rate. This framework hits 71%.

Generate → Test → Verify → Co-Evolve

Generate: after every task failure, the agent writes a candidate skill for what just broke.

Test: the new skill runs on a held-out set with the same frozen Claude model.

Verify: if it scores higher than the current best, it gets promoted. If not, it’s rejected and the failure is logged.

Co-Evolve: a second agent learns from rejected attempts and evolves alongside the generator, so the loop keeps improving.
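The four steps can be read as one loop iteration. Below is a minimal Python sketch under stated assumptions, not the paper's implementation: `Skill`, `LoopState`, `pass_rate`, and `evolve_step` are hypothetical names, the frozen Claude model is replaced by a `run_task` callback, and the co-evolving second agent is reduced to a `critic` function that reads the log of rejected candidates (it "evolves" only in the sense that its input grows over time).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Skill:
    name: str
    text: str  # body of the skill the agent loads (format is an assumption)


@dataclass
class LoopState:
    best: Optional[Skill] = None
    best_score: float = 0.0
    # Rejected candidates with their scores; this log feeds the critic.
    rejected: List[Tuple[Skill, float]] = field(default_factory=list)


RunTask = Callable[[Skill, str], bool]  # hypothetical stand-in for the frozen model


def pass_rate(skill: Skill, held_out: Sequence[str], run_task: RunTask) -> float:
    """Test: fraction of held-out tasks passed with `skill` loaded."""
    if not held_out:
        raise ValueError("held-out set must not be empty")
    return sum(run_task(skill, task) for task in held_out) / len(held_out)


def evolve_step(
    state: LoopState,
    failure: str,
    generate: Callable[[str, List[str]], Skill],
    critic: Callable[[List[Tuple[Skill, float]]], List[str]],
    held_out: Sequence[str],
    run_task: RunTask,
) -> bool:
    """One Generate -> Test -> Verify -> Co-Evolve iteration; True if promoted."""
    hints = critic(state.rejected)                    # Co-Evolve: learn from rejections
    candidate = generate(failure, hints)              # Generate: skill for what broke
    score = pass_rate(candidate, held_out, run_task)  # Test on the held-out set
    if score > state.best_score:                      # Verify: strict improvement only
        state.best, state.best_score = candidate, score
        return True
    state.rejected.append((candidate, score))         # Reject and log the attempt
    return False


# Toy stand-ins (hypothetical): a task passes if its keyword is in the skill text.
def toy_run_task(skill: Skill, task: str) -> bool:
    return task in skill.text.split()


def toy_generate(failure: str, hints: List[str]) -> Skill:
    return Skill(name=f"fix-{failure}", text=" ".join([failure, *hints]))


def toy_critic(rejected: List[Tuple[Skill, float]]) -> List[str]:
    # Carry forward the failure each rejected candidate was written for.
    return [skill.text.split()[0] for skill, _ in rejected]


held_out = ["parse", "retry", "timeout", "unicode"]
state = LoopState()
results = [
    evolve_step(state, f, toy_generate, toy_critic, held_out, toy_run_task)
    for f in ["parse", "retry", "timeout"]
]
print(results, state.best.text, state.best_score)  # [True, False, True] timeout retry 0.5
```

In the toy run, the `retry` candidate alone only ties the current best (0.25), so it is rejected and logged; the critic then carries `retry` forward, and the next candidate, covering both `timeout` and `retry`, scores 0.5 and is promoted. The strict `>` comparison mirrors the post's "scores higher than the current best" rule.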

The result: a 71.1% pass rate on Claude Opus 4.6, beating Anthropic’s own skill-creator by 37 percentage points across SkillsBench and Codex.
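The "37 points" figure is an absolute gap in percentage points, not a relative gain. A quick check using only the two pass rates quoted in the post (the 34% figure appears rounded there, so treat the output as approximate) shows the framework roughly doubles the skill-creator's pass rate:

```python
skill_creator = 34.0   # pass rate (%) quoted in the post, apparently rounded
self_evolving = 71.1   # pass rate (%) quoted in the post

gap_points = self_evolving - skill_creator  # absolute gap in percentage points
ratio = self_evolving / skill_creator       # relative improvement factor

print(f"{gap_points:.1f} points, {ratio:.2f}x")  # 37.1 points, 2.09x
```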

This is exactly why engineers stopped writing skills by hand and let the agent evolve them.

Read the paper, then grab the setup below.

Similar Articles

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

@mylifcc: Anthropic published a major blog post on June 3rd, "Lessons from building Claude Code: How we use skills", summarizing Anthropic's understanding of skills: What exactly are skills? (Core concept clarification) Not: ...

@sheriyuo: Every "self-evolving agent" paper this year has mutated text: prompts, skill files, workflow graphs, memory schemas. MO…

mattpocock/skills

@Huanusa: Anthropic has officially released a highly valuable resource: "The Complete Guide to Building Skills for Claude" — a full 33-page PDF. This isn't a beginner's overview, but a comprehensive guide to building skills for Claude…
