@rohanpaul_ai: New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither…

X AI KOLs Following 06/29/26, 05:04 AM Papers

co-evolving self-improving ai-agents evaluators code-generation paper-writing reinforcement-learning cambridge nvidia

Summary

A new paper from Cambridge, NVIDIA, and other labs introduces the Red Queen Gödel Machine, a method where AI agents and their evaluators co-evolve to prevent stagnation. The approach avoids fixed benchmarks by allowing judges to improve at safe handoff points, leading to better performance in coding and paper writing tasks.

Original Article

Cached at: 06/29/26, 10:34 AM

New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither side gets stuck.

It moves self-improving AI away from fixed benchmarks and toward a loop where the thing doing the judging can also get better.

The problem is that most self-improving agents train against a fixed benchmark or fixed evaluator, so the score can become stale, too easy, or easy to game.

The paper’s idea is to let the evaluator improve too, but only at safe handoff points, so each training stretch still has a stable judge.

During each stretch, agents are tested by the current frozen evaluator, while possible better evaluators are tested separately against held-out human or objective answers.
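
The loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: agents and evaluators are reduced to single numbers, and the names `score`, `evaluator_accuracy`, and `co_evolve`, the random hill-climbing updates, and the handoff rule are all hypothetical choices made for this example.

```python
import random
from typing import List, Tuple

Agent = float       # toy agent: a single number (assumption)
Evaluator = float   # toy evaluator: its belief about the ideal agent value (assumption)

def score(evaluator: Evaluator, agent: Agent) -> float:
    """Score an agent as this evaluator sees it (higher is better)."""
    return -abs(agent - evaluator)

def evaluator_accuracy(evaluator: Evaluator,
                       held_out: List[Tuple[Agent, float]]) -> float:
    """Negative mean error of the evaluator's scores against held-out reference scores."""
    errors = [abs(score(evaluator, a) - ref) for a, ref in held_out]
    return -sum(errors) / len(errors)

def co_evolve(agent: Agent, evaluator: Evaluator,
              held_out: List[Tuple[Agent, float]],
              stretches: int = 5, steps: int = 50,
              seed: int = 0) -> Tuple[Agent, Evaluator]:
    rng = random.Random(seed)
    for _ in range(stretches):
        frozen = evaluator      # the judge stays fixed for the whole stretch
        candidate = evaluator   # candidate judge, improved on the side
        for _ in range(steps):
            # The agent is tested only by the frozen evaluator.
            proposal = agent + rng.gauss(0, 0.5)
            if score(frozen, proposal) > score(frozen, agent):
                agent = proposal
            # Candidate evaluators are tested separately on held-out answers.
            trial = candidate + rng.gauss(0, 0.5)
            if evaluator_accuracy(trial, held_out) > evaluator_accuracy(candidate, held_out):
                candidate = trial
        # Handoff point: swap judges only if the candidate is more accurate.
        if evaluator_accuracy(candidate, held_out) > evaluator_accuracy(frozen, held_out):
            evaluator = candidate
    return agent, evaluator

if __name__ == "__main__":
    # Held-out reference scores come from a hidden "true" target of 10.0 (toy data).
    held_out = [(a, -abs(a - 10.0)) for a in [0.0, 5.0, 8.0, 12.0, 15.0, 20.0]]
    print(co_evolve(agent=0.0, evaluator=3.0, held_out=held_out))
```

The point of the sketch is the ordering: within a stretch the agent only ever sees `frozen`, and the judge is replaced only between stretches, so the agent never chases a moving target mid-stretch.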

The authors try this on coding, paper writing, paper reviewing, proof writing, and proof grading, where some tasks have clear answers and others need learned judgment.

On coding, the system beats the earlier best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.
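
To put the token figure in more familiar terms, here is a small arithmetic sketch. It assumes "N× fewer tokens" means the system uses 1/N of the baseline's tokens, which is the usual reading but is not spelled out in the post; the helper name `fraction_of_baseline` is made up for the example.

```python
def fraction_of_baseline(times_fewer: float) -> float:
    """Share of baseline tokens used, assuming 'N x fewer' means baseline / N."""
    return 1.0 / times_fewer

for factor in (1.35, 1.72):
    used = fraction_of_baseline(factor)
    print(f"{factor}x fewer tokens: {used:.1%} of baseline, {1 - used:.1%} saved")
```

Under that reading, the reported range corresponds to roughly 26% to 42% fewer tokens than the earlier agent.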

On paper writing, the co-evolved writer gets about 1.86× higher average acceptance from a reviewer panel than the fixed-evaluator baseline.

The big point is that stronger AI systems may need stronger judges growing with them, because fixed tests can stop giving useful pressure.

Link – arxiv.org/abs/2606.26294

Title: “The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators”

Similar Articles

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2054201045346287766

@rohanpaul_ai: Brilliant new paper from Meta, CMU and other labs. Shows that coding agents improve faster by manufacturing their own s…

@Mnilax: Google and Stanford engineers just dropped a 39-page PDF on what actually makes an AI agent self-improve. input → outpu…

@Phoenixyin13: Incredible! This Red Queen Gödel Machine from NVIDIA, Cambridge University, and other teams is absolutely one of the most important AI papers I've seen recently. This time, the paper directly targets the core bottleneck of self-improving AI: previously, once the evaluator was fixed, it led to agents gaming the system or quickly stagnating...
