@rohanpaul_ai: Better self-improving agents need better solvers, not bigger update-writing models. This challenges the common habit of…
Summary
This paper disentangles the roles of evolver and agent in self-improving LLM agents, showing that a small evolver can write sufficiently good updates, while a mid-tier agent benefits most from using them. It recommends using the strongest model as the task executor, not the update writer.
Cached at: 06/05/26, 01:16 PM
Better self-improving agents need better solvers, not bigger update-writing models.
This challenges the common habit of putting the strongest model in the evolver seat.
The usual intuition is that a better model should write better prompts, memories, tools, and skills.
This paper cuts that intuition in half.
It separates two jobs that are usually blurred together: writing useful harness updates, and benefiting from those updates during task execution.
The paper finds that a cheaper model can often write good-enough prompt, memory, or skill updates: a small Qwen3.5-9B evolver can create updates that help about as much as those written by Claude Opus 4.6.
The expensive model is more useful as the agent that actually solves the task with those updates.
In other words, using the updates is highly model-dependent: weak models often fail to load the right skill, or they load it and then stop following it during a long task.
Strong models can use the harness, but they may already be close enough to their ceiling that the update has less room to help.
The sweet spot is the mid-tier model: capable enough to invoke and follow the new procedure, but not so capable that the harness has nothing left to teach.
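To make the evolver/agent split concrete, here is a minimal Python sketch of how one might measure the two jobs separately. It is not the paper's code: the `Harness`, `Evolver`, and `Agent` types and the `gain_grid` function are hypothetical names introduced for illustration, and "gain" is simply assumed to be the change in task success rate once an agent runs with an evolver's updates.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only (not from the paper).
Harness = List[str]                      # text updates: prompts, memories, skills
Evolver = Callable[[Harness], Harness]   # job 1: writes harness updates
Agent = Callable[[Harness, str], bool]   # job 2: solves a task using the harness


def success_rate(agent: Agent, harness: Harness, tasks: Sequence[str]) -> float:
    """Fraction of tasks the agent solves with the given harness."""
    if not tasks:
        raise ValueError("tasks must be non-empty")
    return sum(agent(harness, task) for task in tasks) / len(tasks)


def gain_grid(
    evolvers: Dict[str, Evolver],
    agents: Dict[str, Agent],
    tasks: Sequence[str],
    base_harness: Sequence[str] = (),
) -> Dict[Tuple[str, str], float]:
    """Cross every evolver with every agent and record the success-rate gain.

    Comparing cells that share an agent shows how much update quality matters;
    comparing cells that share an evolver shows how much the agent's ability
    to use the updates matters.
    """
    grid: Dict[Tuple[str, str], float] = {}
    for agent_name, agent in agents.items():
        # Baseline: the same agent on the same tasks, without any updates.
        baseline = success_rate(agent, list(base_harness), tasks)
        for evolver_name, evolver in evolvers.items():
            evolved = evolver(list(base_harness))
            grid[(evolver_name, agent_name)] = (
                success_rate(agent, evolved, tasks) - baseline
            )
    return grid
```

In practice each `Evolver` and `Agent` would wrap a model call. Plugging in one cheap and one expensive evolver, plus weak, mid-tier, and strong agents, produces the kind of grid the thread describes, where the agent column matters more than the evolver row.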
Link: arxiv.org/abs/2605.30621
Title: “Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents”