The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Summary
This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities in which agents and evaluators co-evolve. The RQGM improves performance on coding tasks, scientific paper writing and reviewing, and Olympiad-level proof writing and grading.
Cached at: 06/26/26, 05:17 AM
# The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

Source: [https://arxiv.org/abs/2606.26294](https://arxiv.org/abs/2606.26294)

Authors: [Alex Iacob](https://arxiv.org/search/cs?searchtype=author&query=Iacob,+A), [Andrej Jovanović](https://arxiv.org/search/cs?searchtype=author&query=Jovanovi%C4%87,+A), [William F. Shen](https://arxiv.org/search/cs?searchtype=author&query=Shen,+W+F), [Daniel Burkhardt](https://arxiv.org/search/cs?searchtype=author&query=Burkhardt,+D), [Meghdad Kurmanji](https://arxiv.org/search/cs?searchtype=author&query=Kurmanji,+M), [Nurbek Tastan](https://arxiv.org/search/cs?searchtype=author&query=Tastan,+N), [Lorenzo Sani](https://arxiv.org/search/cs?searchtype=author&query=Sani,+L), [Niccolò Alberto Elia Venanzi](https://arxiv.org/search/cs?searchtype=author&query=Venanzi,+N+A+E), [Ambroise Odonnat](https://arxiv.org/search/cs?searchtype=author&query=Odonnat,+A), [Zeyu Cao](https://arxiv.org/search/cs?searchtype=author&query=Cao,+Z), [Bill Marino](https://arxiv.org/search/cs?searchtype=author&query=Marino,+B), [Xinchi Qiu](https://arxiv.org/search/cs?searchtype=author&query=Qiu,+X), [Nicholas D. Lane](https://arxiv.org/search/cs?searchtype=author&query=Lane,+N+D)

[View PDF](https://arxiv.org/pdf/2606.26294)

> Abstract: Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a central feature of evolution: species adapt as their environments change with them. We aim to bring the same principle to recursive self-improvement, making evaluation part of the improvement loop and opening search to evolving evaluators, adversarial objectives, and dynamic utilities that may surpass static benchmarks.
>
> We introduce the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities. The RQGM makes this possible through controlled utility evolution: search is organized into epochs with a fixed within-epoch evaluation criterion, while the utility can be updated at epoch boundaries, so self-improvement guarantees hold per epoch as the objective evolves across them. We begin by showing that even on verifiable coding tasks, the RQGM improves test pass rate over the prior SOTA by adding a complementary agent-as-a-judge code-review signal. This signal is cheaper and the RQGM uses 1.35x-1.72x fewer tokens. We then turn to scientific paper writing and reviewing, and Olympiad-level proof writing and grading, where the RQGM improves performance over prior self-improving agents: co-evolved writers reach 1.78x-1.86x higher acceptance rates under a diverse agent-as-a-judge panel, while co-evolved graders reach 9% higher ground-truth accuracy. In paper reviewing, the strongest baseline reviewer over-accepts AI-generated papers at up to 1.91x the human rate. The RQGM corrects this by introducing an adversarial objective that discovers reviewers equally stringent on AI and human work.

## Submission history

From: Alex Iacob

**[v1]** Wed, 24 Jun 2026 18:38:26 UTC (1,058 KB)
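The epoch structure behind controlled utility evolution can be illustrated with a toy loop. This is a minimal sketch, not the RQGM implementation: it assumes an "agent" is a single float, uses a hypothetical `make_utility` evaluator that rewards closeness to a target, and replaces the paper's co-evolved evaluators with a hypothetical fixed target shift at each epoch boundary. What it shows is the invariant the abstract describes: because the utility is frozen within an epoch, the best score under that epoch's utility can only stay level or rise, while the objective itself is free to change between epochs.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical toy setup: an "agent" is a single float, and a utility scores it.
Agent = float
Utility = Callable[[Agent], float]


@dataclass
class EpochLog:
    epoch: int
    best_agent: Agent
    # Best score so far at each search step, measured with this epoch's frozen utility.
    trajectory: List[float] = field(default_factory=list)


def search_epoch(archive: List[Agent], utility: Utility, steps: int,
                 rng: random.Random, epoch: int) -> EpochLog:
    """Archive-based search with the utility held fixed for the whole epoch."""
    # Re-score the whole archive under this epoch's utility before searching.
    best = max(archive, key=utility)
    log = EpochLog(epoch=epoch, best_agent=best, trajectory=[utility(best)])
    for _ in range(steps):
        parent = rng.choice(archive)
        child = parent + rng.gauss(0.0, 0.5)  # hypothetical mutation operator
        archive.append(child)  # the archive keeps every variant across epochs
        if utility(child) > utility(log.best_agent):
            log.best_agent = child
        log.trajectory.append(utility(log.best_agent))
    return log


def make_utility(target: float) -> Utility:
    # Hypothetical evaluator: prefer agents close to a target that may move.
    return lambda agent: -(agent - target) ** 2


def controlled_utility_evolution(epochs: int = 4, steps: int = 50,
                                 seed: int = 0) -> List[EpochLog]:
    rng = random.Random(seed)
    archive: List[Agent] = [0.0]
    target = 1.0
    logs: List[EpochLog] = []
    for epoch in range(epochs):
        utility = make_utility(target)  # frozen for the whole epoch
        logs.append(search_epoch(archive, utility, steps, rng, epoch))
        # Epoch boundary: the only point where the evaluation criterion changes.
        # Hypothetical update; the RQGM instead co-evolves the evaluator itself.
        target += 1.0
    return logs


if __name__ == "__main__":
    for log in controlled_utility_evolution():
        print(log.epoch, round(log.best_agent, 2), round(log.trajectory[-1], 3))
```

Running it prints one line per epoch. Scores are only comparable within an epoch: when the target moves, the archive is re-scored under the new utility, so the best score typically drops at the start of the next epoch and the search has to keep adapting, a miniature version of the Red Queen dynamic.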
Similar Articles
@Phoenixyin13: Incredible! This Red Queen Gödel Machine from NVIDIA, Cambridge University, and other teams is absolutely one of the most important AI papers I've seen recently. This time, the paper directly targets the core bottleneck of self-improving AI: previously, a fixed evaluator led agents either to game the system or to stagnate quickly...
The Red Queen Gödel Machine paper from NVIDIA, Cambridge University, and other teams addresses a key bottleneck of recursive self-improvement, the fixed evaluator, by co-evolving agents and evaluators. It surpasses the prior SOTA on tasks such as coding and scientific paper writing, offering a methodology for controlled, open-ended AI evolution.
Self-Evolving Deep Research via Joint Generation and Evaluation
Researchers from HKUST, ByteDance, and UCL propose SCORE, a co-evolutionary training framework that jointly trains an LLM as both a deep research report generator and an evaluator, using a meta-harness to dynamically adjust evaluation difficulty and prevent reward saturation. Experiments show consistent improvement in open-ended research report quality.
@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2054201045346287766
The article discusses new research from Sakana AI and Meta on self-improving AI agents, specifically the Darwin Gödel Machine and Hyperagents, which autonomously rewrite their own code and infrastructure to improve performance without human intervention.
A framework for when AI agents should (and shouldn't) self-evolve
The article argues that self-evolution in AI agents should be applied cautiously and proposes an Evolution Governor that audits workflows to decide when to evolve, based on conditions like repeatable tasks and external feedback.
SEAGym: An Evaluation Environment for Self-Evolving LLM Agents
SEAGym is a new evaluation environment for self-evolving LLM agents that measures agent harness updates across training, validation, test, replay, and cost records, providing complementary signals about the evolution process.