The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Summary
This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities, where agents and evaluators co-evolve, improving performance on coding tasks, scientific writing, and Olympiad-level proof grading.
Cached at: 06/26/26, 05:17 AM
# The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

Source: [https://arxiv.org/abs/2606.26294](https://arxiv.org/abs/2606.26294)

Authors: [Alex Iacob](https://arxiv.org/search/cs?searchtype=author&query=Iacob,+A), [Andrej Jovanović](https://arxiv.org/search/cs?searchtype=author&query=Jovanovi%C4%87,+A), [William F. Shen](https://arxiv.org/search/cs?searchtype=author&query=Shen,+W+F), [Daniel Burkhardt](https://arxiv.org/search/cs?searchtype=author&query=Burkhardt,+D), [Meghdad Kurmanji](https://arxiv.org/search/cs?searchtype=author&query=Kurmanji,+M), [Nurbek Tastan](https://arxiv.org/search/cs?searchtype=author&query=Tastan,+N), [Lorenzo Sani](https://arxiv.org/search/cs?searchtype=author&query=Sani,+L), [Niccolò Alberto Elia Venanzi](https://arxiv.org/search/cs?searchtype=author&query=Venanzi,+N+A+E), [Ambroise Odonnat](https://arxiv.org/search/cs?searchtype=author&query=Odonnat,+A), [Zeyu Cao](https://arxiv.org/search/cs?searchtype=author&query=Cao,+Z), [Bill Marino](https://arxiv.org/search/cs?searchtype=author&query=Marino,+B), [Xinchi Qiu](https://arxiv.org/search/cs?searchtype=author&query=Qiu,+X), [Nicholas D. Lane](https://arxiv.org/search/cs?searchtype=author&query=Lane,+N+D)

[View PDF](https://arxiv.org/pdf/2606.26294)

> Abstract: Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a central feature of evolution: species adapt as their environments change with them. We aim to bring the same principle to recursive self-improvement, making evaluation part of the improvement loop and opening search to evolving evaluators, adversarial objectives, and dynamic utilities that may surpass static benchmarks.
>
> We introduce the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities. The RQGM makes this possible through controlled utility evolution: search is organized into epochs with a fixed within-epoch evaluation criterion, while the utility can be updated at epoch boundaries, so self-improvement guarantees hold per epoch as the objective evolves across them. We begin by showing that even on verifiable coding tasks, the RQGM improves test pass rate over the prior SOTA by adding a complementary agent-as-a-judge code-review signal. This signal is cheaper, and the RQGM uses 1.35x-1.72x fewer tokens. We then turn to scientific paper writing and reviewing, and Olympiad-level proof writing and grading, where the RQGM improves performance over prior self-improving agents: co-evolved writers reach 1.78x-1.86x higher acceptance rates under a diverse agent-as-a-judge panel, while co-evolved graders reach 9% higher ground-truth accuracy. In paper reviewing, the strongest baseline reviewer over-accepts AI-generated papers at up to 1.91x the human rate. The RQGM corrects this by introducing an adversarial objective that discovers reviewers equally stringent on AI and human work.

## Submission history

From: Alex Iacob [[view email](https://arxiv.org/show-email/d5e4744a/2606.26294)]

**[v1]** Wed, 24 Jun 2026 18:38:26 UTC (1,058 KB)
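The controlled utility evolution described in the abstract (a utility frozen within each epoch, updatable only at epoch boundaries) can be illustrated with a toy Python sketch. Everything concrete here is a hypothetical assumption, not the paper's method: an "agent" is reduced to a single float, the evaluator `make_utility(target)` simply rewards closeness to a target, self-modification is Gaussian noise, and the boundary rule `target = best + 1.0` is invented to mimic an evaluator that raises the bar. The sketch only shows the control structure.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy types: an agent is a float, a utility scores an agent.
Utility = Callable[[float], float]


def make_utility(target: float) -> Utility:
    """Hypothetical evaluator: scores are higher the closer an agent is to `target`."""
    return lambda agent: -abs(agent - target)


def run_epoch(archive: List[float], utility: Utility, steps: int,
              rng: random.Random) -> List[float]:
    """Run one epoch of search with `utility` frozen for every step.

    Returns the best-so-far score after each step, so the per-epoch
    monotonicity property can be checked.
    """
    history = []
    for _ in range(steps):
        # Greedy parent selection under this epoch's utility (a simplification).
        parent = max(archive, key=utility)
        # Hypothetical self-modification: perturb the parent.
        child = parent + rng.gauss(0.0, 0.5)
        # Keep the child only if it improves under the frozen utility.
        if utility(child) > utility(parent):
            archive.append(child)
        history.append(max(utility(a) for a in archive))
    return history


def red_queen_search(epochs: int, steps: int,
                     seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Alternate fixed-utility epochs with utility updates at the boundaries."""
    rng = random.Random(seed)
    archive = [0.0]   # hypothetical initial agent
    target = 1.0      # hypothetical initial evaluation criterion
    histories = []
    for _ in range(epochs):
        utility = make_utility(target)
        histories.append(run_epoch(archive, utility, steps, rng))
        # Epoch boundary: the only place the evaluator is allowed to change.
        best = max(archive, key=utility)
        target = best + 1.0  # hypothetical rule: move the bar past the current best
    return archive, histories


if __name__ == "__main__":
    archive, histories = red_queen_search(epochs=3, steps=50)
    for k, h in enumerate(histories):
        print(f"epoch {k}: best score {h[-1]:.3f} after {len(h)} steps")
```

Because the archive only grows and the utility never changes mid-epoch, the best-so-far score within an epoch cannot decrease; scores from different epochs are not directly comparable, since each was measured under a different utility. This is only a toy analogue of the per-epoch guarantee the abstract mentions.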
Similar Articles
@rohanpaul_ai: New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither…
A new paper from Cambridge, NVIDIA, and other labs introduces the Red Queen Gödel Machine, a method where AI agents and their evaluators co-evolve to prevent stagnation. The approach avoids fixed benchmarks by allowing judges to improve at safe handoff points, leading to better performance in coding and paper writing tasks.
@Phoenixyin13: Incredible! This Red Queen Gödel Machine from NVIDIA, Cambridge University, and other teams is absolutely one of the most important AI papers I've seen recently. This time, the paper directly targets the core bottleneck of self-improving AI: previously, once the evaluator was fixed, it led to agents gaming the system or quickly stagnating...
The Red Queen Gödel Machine paper from NVIDIA, Cambridge University, and other teams solves the bottleneck of recursive self-improvement by co-evolving agents and evaluators. It surpasses existing SOTA on tasks like code and paper writing, providing an important methodology for controlled open-ended AI evolution.
Recursive Self-Evolving Agents via Held-Out Selection
Introduces RSEA, a method for recursive self-evolution of LLM agents using a three-layer natural-language state and a held-out selection gate to prevent regression. Evaluated across four benchmarks, it shows that context evolution is benchmark-dependent and that a strict selection gate is crucial for reliability.
Self-Evolving Deep Research via Joint Generation and Evaluation
Researchers from HKUST, ByteDance, and UCL propose SCORE, a co-evolutionary training framework that jointly trains an LLM as both a deep research report generator and an evaluator, using a meta-harness to dynamically adjust evaluation difficulty and prevent reward saturation. Experiments show consistent improvement in open-ended research report quality.
@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2054201045346287766
The article discusses new research from Sakana AI and Meta on self-improving AI agents, specifically the Darwin-Gödel Machine and Hyperagents, which autonomously rewrite their own code and infrastructure to enhance performance without human intervention.