@hanakoxbt: An MIT team just dropped a 24-page PDF on "Self-Evolving Skills" for Claude Code agents. Anthropic's own skill-creator …

X AI KOLs Timeline Papers

Summary

An MIT team released a paper on self-evolving skills for Claude Code agents. Its Generate-Test-Verify-Co-Evolve framework is reported to reach a 71.1% pass rate, about 37 points above the 34% of Anthropic's skill-creator.

Cached at: 06/26/26, 08:07 AM

An MIT team just dropped a 24-page PDF on “Self-Evolving Skills” for Claude Code agents.

Anthropic’s own skill-creator hits 34% pass rate. This framework hits 71%.

Generate → Test → Verify → Co-Evolve

Generate: after every task failure, the agent writes a candidate skill for what just broke.

Test: the new skill runs on a held-out set with the same frozen Claude model.

Verify: if it scores higher than the current best, it gets promoted. If not, it’s rejected and the failure is logged.

Co-Evolve: a second agent learns from rejected attempts and evolves alongside the generator, so the loop keeps improving.
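
The four steps above can be sketched as a single loop iteration. This is only a rough illustration of the post's description, not the paper's implementation: `Skill`, `LoopState`, `evolve_step`, and the toy stand-ins are all hypothetical names, the "second agent" is reduced to a callback that receives the rejection log, and the scores are made-up numbers rather than real benchmark results.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    name: str
    body: str  # instructions the agent would load; plain text in this sketch


@dataclass
class LoopState:
    best: Optional[Skill] = None
    best_score: float = 0.0
    rejected: List[Tuple[Skill, float]] = field(default_factory=list)


def evolve_step(
    state: LoopState,
    failed_task: str,
    generate: Callable[[str, List[Tuple[Skill, float]]], Skill],
    evaluate: Callable[[Skill], float],
    learn_from_rejections: Callable[[List[Tuple[Skill, float]]], None],
) -> bool:
    """Run one Generate -> Test -> Verify -> Co-Evolve pass; return True if promoted."""
    # Generate: draft a candidate skill for the task that just failed.
    candidate = generate(failed_task, state.rejected)
    # Test: score it on a held-out set (the model behind `evaluate` stays frozen).
    score = evaluate(candidate)
    # Verify: promote only on a strict improvement over the current best.
    if score > state.best_score:
        state.best, state.best_score = candidate, score
        return True
    # Otherwise reject the candidate and log the failure.
    state.rejected.append((candidate, score))
    # Co-Evolve: hand the rejection log to the (hypothetical) second agent.
    learn_from_rejections(state.rejected)
    return False


# Toy stand-ins so the sketch runs without any model calls.
def toy_generate(task, rejected):
    return Skill(name=f"{task}-v{len(rejected)}", body="placeholder instructions")


def make_toy_evaluate(scores):
    remaining = iter(scores)  # made-up held-out scores, consumed one per call
    return lambda skill: next(remaining)


state = LoopState()
lessons = []  # size of the rejection log each time the second agent is called
evaluate = make_toy_evaluate([0.4, 0.3, 0.7])
for task in ["parse-csv", "parse-csv", "parse-csv"]:
    evolve_step(state, task, toy_generate, evaluate, lambda r: lessons.append(len(r)))

print(state.best.name, state.best_score, len(state.rejected))
```

The strict `>` comparison in the Verify step is a design choice in this sketch: a candidate that only ties the current best is rejected, so a promoted skill is never replaced without a measurable gain.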

The result: 71.1% pass rate on Claude Opus 4.6, beating Anthropic’s own skill-creator by 37 points across SkillsBench and Codex.

This is exactly why engineers stopped writing skills by hand and let the agent evolve them.

Read the paper, then grab the setup below.

Similar Articles

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

Hugging Face Daily Papers

SkillFlow introduces a benchmark of 166 tasks across 20 families for evaluating autonomous agents' ability to discover, repair, and maintain skills over time through a lifelong learning protocol. Experiments reveal a substantial capability gap among leading models, with Claude Opus 4.6 improving significantly while others show limited or negative gains from skill evolution.

@mylifcc: Anthropic published a major blog post on June 3rd, "Lessons from building Claude Code: How we use skills", summarizing Anthropic's understanding of skills: What exactly are skills? (Core concept clarification) Not: ...

X AI KOLs Timeline

Anthropic published a blog post explaining Skills in Claude Code: a skill is a folder containing instructions, scripts, reference materials, and so on, which lets the agent disclose context progressively, reducing hallucinations and token waste.

mattpocock/skills

GitHub Trending (daily)

This open-source repository provides a composable set of AI agent skills and prompts designed to improve alignment, reduce verbosity, and optimize workflows for coding assistants like Claude Code and Codex.