@rohanpaul_ai: New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither…
Summary
A new paper from Cambridge, NVIDIA, and other labs introduces the Red Queen Gödel Machine, a method where AI agents and their evaluators co-evolve to prevent stagnation. The approach avoids fixed benchmarks by allowing judges to improve at safe handoff points, leading to better performance in coding and paper writing tasks.
Cached at: 06/29/26, 10:34 AM
New paper from Cambridge University, NVIDIA, and other top labs teaches AI agents and AI judges to improve together, so neither side gets stuck.
It moves self-improving AI away from fixed benchmarks and toward a loop in which the judge can also get better.
The problem is that most self-improving agents train against a fixed benchmark or fixed evaluator, so the score can become stale, too easy, or easy to game.
The paper’s idea is to let the evaluator improve too, but only at safe handoff points, so each training stretch still has a stable judge.
During each stretch, agents are scored by the current frozen evaluator, while candidate replacement evaluators are tested separately against held-out human or objective answers.
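A toy sketch of that loop in Python, not the paper's implementation. Everything here is an assumption made for illustration: the agent is a single number, evaluators are plain scoring functions, and the names (`co_evolve`, `improve_agent`, `evaluator_error`, the example judges) are hypothetical. It shows only the two ideas described above: the judge stays frozen during a stretch, and a candidate judge is adopted at a handoff only if it agrees better with held-out reference scores.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy types: an "agent" is one float, a judge scores it (higher is better).
Evaluator = Callable[[float], float]
HeldOut = List[Tuple[float, float]]  # (agent output, trusted reference score)


def evaluator_error(judge: Evaluator, held_out: HeldOut) -> float:
    """Mean absolute gap between a judge's scores and the trusted reference scores."""
    return sum(abs(judge(x) - y) for x, y in held_out) / len(held_out)


def improve_agent(agent: float, judge: Evaluator, steps: int, rng: random.Random) -> float:
    """One training stretch: hill-climb against a judge that never changes mid-stretch."""
    for _ in range(steps):
        candidate = agent + rng.uniform(-1.0, 1.0)
        if judge(candidate) > judge(agent):
            agent = candidate
    return agent


def co_evolve(agent: float, judge: Evaluator, proposals: List[Evaluator],
              held_out: HeldOut, steps: int = 200, seed: int = 0):
    """Alternate frozen-judge stretches with handoffs that may swap in a better judge."""
    rng = random.Random(seed)
    for proposal in proposals:
        agent = improve_agent(agent, judge, steps, rng)  # judge is frozen here
        # Handoff point: the proposal is checked on held-out answers, never on the agent.
        if evaluator_error(proposal, held_out) < evaluator_error(judge, held_out):
            judge = proposal
    agent = improve_agent(agent, judge, steps, rng)  # final stretch with the last judge
    return agent, judge


# Illustrative judges (assumed): true quality peaks at 10.
def reference_judge(x: float) -> float:
    return -abs(x - 10.0)

def stale_judge(x: float) -> float:
    return min(x, 3.0)  # saturates early, so it stops giving useful pressure

def judge_near_7(x: float) -> float:
    return -abs(x - 7.0)  # better than the stale judge, still imperfect

def gameable_judge(x: float) -> float:
    return x  # rewards unbounded growth; rejected at the handoff

HELD_OUT: HeldOut = [(x, reference_judge(x)) for x in (0.0, 5.0, 10.0, 15.0)]

if __name__ == "__main__":
    final_agent, final_judge = co_evolve(
        0.0, stale_judge, [judge_near_7, gameable_judge, reference_judge], HELD_OUT)
    print(final_agent, final_judge.__name__)
```

In this toy run the agent stalls just past 3 under the stale judge and starts moving again once a better judge is handed off. The gameable proposal is rejected because it scores worse on the held-out pairs, whatever the agent would have scored under it.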
The authors try this on coding, paper writing, paper reviewing, proof writing, and proof grading, where some tasks have clear answers and others need learned judgment.
On coding, the system beats the earlier best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.
On paper writing, the co-evolved writer gets about 1.86× higher average acceptance from a reviewer panel than the fixed-evaluator baseline.
The big point is that stronger AI systems may need stronger judges growing with them, because fixed tests can stop giving useful pressure.
Link: arxiv.org/abs/2606.26294
Title: “The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators”
Similar Articles
@Phoenixyin13: Incredible! This Red Queen Gödel Machine from NVIDIA, Cambridge University, and other teams is absolutely one of the most important AI papers I've seen recently. This time, the paper directly targets the core bottleneck of self-improving AI: previously, once the evaluator was fixed, it led to agents gaming the system or quickly stagnating...
The Red Queen Gödel Machine paper from NVIDIA, Cambridge University, and other teams solves the bottleneck of recursive self-improvement by co-evolving agents and evaluators. It surpasses existing SOTA on tasks like code and paper writing, providing an important methodology for controlled open-ended AI evolution.
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities, where agents and evaluators co-evolve, improving performance on coding tasks, scientific writing, and Olympiad-level proof grading.
@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2054201045346287766
The article discusses new research from Sakana AI and Meta on self-improving AI agents, specifically the Darwin-Gödel Machine and Hyperagents, which autonomously rewrite their own code and infrastructure to enhance performance without human intervention.
@rohanpaul_ai: Brilliant new paper from Meta, CMU and other labs. Shows that coding agents improve faster by manufacturing their own s…
A new paper from Meta, CMU, and other labs presents Self-play SWE-RL, a method where coding agents train themselves by manufacturing and fixing bugs in real codebases, achieving significant gains on SWE-bench benchmarks without relying on human-written tasks.
@Mnilax: Google and Stanford engineers just dropped a 39-page PDF on what actually makes an AI agent self-improve. input → outpu…
A 39-page paper from Google and Stanford engineers analyzes the key factors that enable AI agents to self-improve through feedback loops, noting that only 9% of agents actually run a real loop.