@dunik_7: An AI more than doubled its own coding ability while the researchers just watched. 20% -> 50% on SWE-bench. They never …


Summary

A paper from Jeff Clune's lab describes the Darwin Gödel Machine, an AI agent that more than doubled its SWE-bench score, from 20% to 50%, by rewriting its own source code without human intervention, using an evolutionary approach.


Cached at: 06/24/26, 08:30 PM

An AI more than doubled its own coding ability while the researchers just watched. 20% -> 50% on SWE-bench. They never touched it.

It pulled that off by rewriting its own source code.

That’s the whole paper: the Darwin Gödel Machine, 72 pages out of Jeff Clune’s lab. An agent finally pointed at itself.

/ it reads its own code and edits one piece: a tool, a retry rule, a prompt
/ it runs the new version on real coding tasks and checks if the score actually moved
/ the variants that win get archived and branched, like evolution, so it never dead-ends
/ then it runs the whole thing again, on the agent that just improved
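
The four steps above can be sketched as a toy loop. This is an illustration only, not the paper's code: the `Agent` fields, the tool names, and the `evaluate` scoring rule are all made up for the example, and a real run would score each variant on actual coding tasks instead of counting features. The rule used here for what "wins" (a child must beat its parent to enter the archive) is also an assumption.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Agent:
    """Hypothetical stand-in for the agent's editable scaffolding (not model weights)."""
    retries: int = 0
    tools: frozenset = frozenset()
    prompt_hints: int = 0


ALL_TOOLS = ("search", "linter", "test_runner")  # made-up tool names


def evaluate(agent: Agent) -> int:
    """Toy score. A real system would run the agent on coding tasks here."""
    return min(agent.retries, 2) + len(agent.tools) + min(agent.prompt_hints, 2)


def self_modify(agent: Agent, rng: random.Random) -> Agent:
    """Edit exactly one piece: a tool, a retry rule, or a prompt."""
    piece = rng.choice(("tool", "retry", "prompt"))
    if piece == "tool":
        return replace(agent, tools=agent.tools | {rng.choice(ALL_TOOLS)})
    if piece == "retry":
        return replace(agent, retries=max(0, agent.retries + rng.choice((-1, 1))))
    return replace(agent, prompt_hints=max(0, agent.prompt_hints + rng.choice((-1, 1))))


def evolve(generations: int, seed: int = 0) -> list[tuple[Agent, int]]:
    """Run the loop and return the archive of (agent, score) pairs."""
    rng = random.Random(seed)
    start = Agent()
    archive = [(start, evaluate(start))]
    for _ in range(generations):
        # Branch from any archived variant, not only the current best,
        # so one unlucky lineage cannot stall the whole search.
        parent, parent_score = rng.choice(archive)
        child = self_modify(parent, rng)
        child_score = evaluate(child)
        # Assumed acceptance rule: keep the child only if its score moved up.
        if child_score > parent_score:
            archive.append((child, child_score))
    return archive


if __name__ == "__main__":
    archive = evolve(generations=200)
    print("archived variants:", len(archive))
    print("best toy score:", max(score for _, score in archive))
```

The model never changes in this sketch. Only the scaffolding around it (`retries`, `tools`, `prompt_hints`) is edited, which is the point the post makes next.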

Everyone’s still arguing about which model is smartest. This quietly skips the question. Same model, same weights; the agent around it gets sharper every pass, by its own hand.

Loop 4 in my breakdown was the one people called sci-fi. Here it is, benchmarked, with a public repo.

The unsettling part isn’t that it worked. It’s that nobody needed to be in the room.

Paper’s below. Read it before it improves itself again.
