A practical guide explaining three levels of building self-improving AI agents, from manual loops to automated design, with recommended tools and frameworks.
A one-person company runs entirely on 7 AI agents and 10 cron jobs, with no human employees. The agents evaluate and improve themselves and are operated through Telegram.
This paper presents a blueprint for self-improving agents that combines scaffold editing and weight training through a meta-agent and a feedback agent, achieving a 14x speedup on a CUDA kernel for AlphaFold.
This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities, where agents and evaluators co-evolve, improving performance on coding tasks, scientific writing, and Olympiad-level proof grading.
This paper introduces Regimes, an auditable, held-out-gated improvement loop built on the ActiveGraph runtime for self-improving agents. It demonstrates modest improvements on the LongMemEval dataset by autonomously discovering prompt repairs that pass static checks, sandbox execution, and held-out validation.
EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization, achieving significant improvements over existing methods across multiple benchmarks.
One of the week's prominent AI papers asks whether self-improving agents are truly discovering new knowledge or merely remixing existing information.
This paper introduces a category-theoretic framework for distinguishing genuine scientific discovery from mere retrieval or search in self-improving AI agents, and uses it to formalize regime transitions. The authors demonstrate the framework with a protein mechanics example in which an agent's accuracy drops as it tackles harder problems while its theory compresses more data, which they interpret as a sign of real discovery.
This paper disentangles the roles of evolver and agent in self-improving LLM agents, showing that a small evolver can write sufficiently good updates, while a mid-tier agent benefits most from using them. It recommends using the strongest model as the task executor, not the update writer.
A tweet offering advice on self-improving agents, based on personal experiments with coding agents on long-horizon tasks. It notes that stronger models don't always yield better agents.
HALO uses RLMs to optimize AI agent harnesses by analyzing execution traces and suggesting improvements, achieving gains of 10% or more on several benchmarks, including Terminal-Bench and AppWorld.
The article discusses new research from Sakana AI and Meta on self-improving AI agents, specifically the Darwin Gödel Machine and Hyperagents, which autonomously rewrite their own code and infrastructure to improve performance without human intervention.
Hermes Agent demonstrates self-improvement capabilities by observing its own performance, identifying inefficiencies, and rewriting its skills to achieve a 3x speedup and 80% cost reduction in just two iterations.
This paper introduces a protocol framework for self-improving AI agents that supports auditable improvement proposals, assessments, and rollbacks.
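Several of the entries above describe the same basic pattern: an agent proposes a change to itself, the change is assessed on held-out data, and it is either accepted or discarded, with a record kept so that it can be audited and rolled back. The sketch below illustrates that pattern in Python under simplifying assumptions. It is not the method of any paper listed here; `GatedAgent`, `AuditEntry`, `held_out_score`, and `HELD_OUT_CHECKS` are hypothetical names, and the scorer is a toy stand-in for a real held-out evaluation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names here are hypothetical and chosen for illustration only.

@dataclass
class AuditEntry:
    """One record per proposal, kept whether or not it was accepted."""
    proposal: str
    baseline_score: float
    candidate_score: float
    accepted: bool


@dataclass
class GatedAgent:
    """An agent whose only mutable state is its prompt."""
    prompt: str
    audit_log: List[AuditEntry] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def propose(self, candidate: str, evaluate: Callable[[str], float]) -> bool:
        """Accept the candidate prompt only if it beats the current one."""
        baseline = evaluate(self.prompt)
        score = evaluate(candidate)
        accepted = score > baseline
        self.audit_log.append(AuditEntry(candidate, baseline, score, accepted))
        if accepted:
            self.history.append(self.prompt)  # saved so rollback is possible
            self.prompt = candidate
        return accepted

    def rollback(self) -> bool:
        """Restore the most recently replaced prompt, if there is one."""
        if not self.history:
            return False
        self.prompt = self.history.pop()
        return True


# Toy held-out evaluation (an assumption): the fraction of required
# behaviours that the prompt mentions. A real system would run the agent
# on held-out tasks instead.
HELD_OUT_CHECKS = ["cite sources", "show steps", "state units"]


def held_out_score(prompt: str) -> float:
    hits = sum(1 for check in HELD_OUT_CHECKS if check in prompt.lower())
    return hits / len(HELD_OUT_CHECKS)


if __name__ == "__main__":
    agent = GatedAgent("Answer the question.")
    agent.propose("Answer the question and show steps.", held_out_score)
    agent.propose("Answer briefly.", held_out_score)
    for entry in agent.audit_log:
        print(entry)
```

Rejected proposals are logged as well as accepted ones, so the audit trail shows what was tried and why it was turned down, not only what changed.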