@rohanpaul_ai: New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither…


Summary

A new paper from Cambridge, NVIDIA, and other labs introduces the Red Queen Gödel Machine, a method where AI agents and their evaluators co-evolve to prevent stagnation. The approach avoids fixed benchmarks by allowing judges to improve at safe handoff points, leading to better performance in coding and paper writing tasks.


Cached at: 06/29/26, 10:34 AM

New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither side gets stuck.

It moves self-improving AI away from fixed benchmarks and toward a loop where the judge itself can also get better.

The problem is that most self-improving agents train against a fixed benchmark or fixed evaluator, so the score can become stale, too easy, or easy to game.

The paper’s idea is to let the evaluator improve too, but only at safe handoff points, so each training stretch still has a stable judge.

During each stretch, agents are tested by the current frozen evaluator, while possible better evaluators are tested separately against held-out human or objective answers.
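A toy sketch of that loop may help make the "frozen judge, separate handoff check" idea concrete. It is not the paper's algorithm: every name here (`Evaluator`, `ceiling`, `train_stretch`, `held_out_error`, `run`) is hypothetical, quality is modeled as a plain integer, and a judge is assumed to be "better" simply when it can tell apart higher quality levels.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Evaluator:
    """Toy judge (hypothetical): cannot distinguish quality above `ceiling`."""
    ceiling: int

    def score(self, quality: int) -> int:
        return min(quality, self.ceiling)


def held_out_error(ev: Evaluator, references: List[int]) -> int:
    """Total disagreement between the judge and held-out reference labels."""
    return sum(abs(ev.score(q) - q) for q in references)


def train_stretch(agent: int, frozen: Evaluator, steps: int) -> int:
    """Improve the agent against one judge that stays fixed for the whole stretch."""
    for _ in range(steps):
        proposal = agent + 1
        # Keep the proposal only if the frozen judge can see an improvement.
        if frozen.score(proposal) > frozen.score(agent):
            agent = proposal
    return agent


def run(stretches: int, steps: int, co_evolve: bool) -> Tuple[int, int]:
    references = list(range(0, 101, 5))  # assumed held-out quality labels
    evaluator = Evaluator(ceiling=10)
    agent = 0
    for _ in range(stretches):
        agent = train_stretch(agent, evaluator, steps)
        if co_evolve:
            # Handoff point: the challenger is checked against held-out
            # references, never against the agent it will later judge.
            challenger = Evaluator(ceiling=evaluator.ceiling + 10)
            if held_out_error(challenger, references) < held_out_error(evaluator, references):
                evaluator = challenger
    return agent, evaluator.ceiling


print(run(3, 20, co_evolve=False))  # (10, 10)
print(run(3, 20, co_evolve=True))   # (30, 40)
```

In this toy setup the agent under a fixed judge stalls at the judge's ceiling, while the co-evolved run keeps improving because each handoff restores useful pressure.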

The authors try this on coding, paper writing, paper reviewing, proof writing, and proof grading, where some tasks have clear answers and others need learned judgment.

On coding, the system beats the earlier best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.
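For a sense of scale, the ratios can be converted into percentage savings. This is a small sketch that assumes "N× fewer tokens" means the baseline's token count divided by the new system's token count equals N; the post does not spell out the definition, and `token_savings` is a hypothetical helper name.

```python
def token_savings(ratio: float) -> float:
    """Fraction of baseline tokens saved, assuming ratio = baseline / new."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


for ratio in (1.35, 1.72):
    print(f"{ratio}x fewer tokens ~ {token_savings(ratio) * 100:.1f}% saved")
```

Under that assumption, the reported range corresponds to roughly 26% to 42% fewer tokens than the earlier agent.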

On paper writing, the co-evolved writer gets about 1.86× higher average acceptance from a reviewer panel than the fixed-evaluator baseline.

The big point is that stronger AI systems may need stronger judges growing with them, because fixed tests can stop giving useful pressure.


Link – arxiv.org/abs/2606.26294

Title: “The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators”

Similar Articles

@Phoenixyin13: Incredible! This Red Queen Gödel Machine from NVIDIA, Cambridge University, and other teams is absolutely one of the most important AI papers I've seen recently. This time, the paper directly targets the core bottleneck of self-improving AI: previously, once the evaluator was fixed, it led to agents gaming the system or quickly stagnating...

X AI KOLs Timeline

The Red Queen Gödel Machine paper from NVIDIA, Cambridge University, and other teams solves the bottleneck of recursive self-improvement by co-evolving agents and evaluators. It surpasses existing SOTA on tasks like code and paper writing, providing an important methodology for controlled open-ended AI evolution.

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

arXiv cs.LG

This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities, where agents and evaluators co-evolve, improving performance on coding tasks, scientific writing, and Olympiad-level proof grading.