EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale
Summary
EvoMaster is a scalable, self-evolving agent framework for large-scale scientific discovery that enables iterative hypothesis refinement and knowledge accumulation across experimental cycles. It achieves state-of-the-art results on four benchmarks including Humanity's Last Exam (41.1%) and MLE-Bench Lite (75.8%), outperforming general-purpose baselines by up to 316%.
Source: https://huggingface.co/papers/2604.17406
Abstract
The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster empowers agents to iteratively refine hypotheses, self-critique, and progressively accumulate knowledge across experimental cycles, faithfully mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is exceptionally easy to scale up -- enabling developers to build and deploy highly capable, self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Built upon EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four authoritative benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) demonstrate that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It comprehensively outperforms the general-purpose baseline OpenClaw with relative improvements ranging from +159% to +316%, robustly validating its efficacy and generality as the premier foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.
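The abstract describes a cycle in which an agent refines hypotheses, runs experiments, self-critiques, and accumulates knowledge across iterations. A minimal sketch of that loop is shown below; all names (KnowledgeBase, evolve, propose, experiment, critique) are illustrative assumptions, not the actual EvoMaster API.

```python
# Hypothetical sketch of the self-evolution loop the abstract describes:
# propose a hypothesis, run an experiment, self-critique, and accumulate
# knowledge across cycles. Names are illustrative, not the EvoMaster API.
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Accumulates findings across experimental cycles."""
    findings: list = field(default_factory=list)

    def add(self, hypothesis, score, critique):
        self.findings.append(
            {"hypothesis": hypothesis, "score": score, "critique": critique}
        )

    def best(self):
        return max(self.findings, key=lambda f: f["score"])

def evolve(propose, experiment, critique, cycles=5):
    kb = KnowledgeBase()
    for _ in range(cycles):
        hypothesis = propose(kb)        # refine using accumulated knowledge
        score = experiment(hypothesis)  # trial: run the experiment
        kb.add(hypothesis, score, critique(hypothesis, score))  # self-critique
    return kb.best()

# Toy usage: "hypotheses" are integers; the experiment rewards values near 3.
best = evolve(
    propose=lambda kb: (kb.best()["hypothesis"] + 1) if kb.findings else 0,
    experiment=lambda h: -abs(h - 3),
    critique=lambda h, s: "increase" if h < 3 else "stop",
)
print(best["hypothesis"])  # → 3
```

In this sketch the "learning from trial and error" is simply that `propose` reads the knowledge base built up by earlier cycles; in the real framework each of these roles would presumably be played by an LLM-driven agent.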
Get this paper in your agent:
hf papers read 2604.17406
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
EvoScientist is an adaptive multi-agent framework for end-to-end scientific discovery that continuously improves through persistent memory modules, comprising three specialized agents for idea generation, experiment execution, and knowledge distillation. It outperforms 7 state-of-the-art systems in scientific idea generation and improves code execution success rates through multi-agent evolution.
EvoScientist is an open-source framework that automates research workflows using self-evolving AI scientists with persistent multi-agent memory, adopting a human-on-the-loop paradigm for autonomous research exploration and insight generation.
EvoMap/evolver
Evolver is a GEP-powered self-evolution engine for AI agents that automates prompt optimization and creates auditable, reusable evolution assets. The project is transitioning from fully open source to source-available while maintaining backward compatibility with existing MIT and GPL-3.0 releases.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Agent-World introduces a self-evolving training framework for general agent intelligence that autonomously discovers real-world environments and tasks via the Model Context Protocol, enabling continuous learning. Agent-World-8B and 14B models outperform strong proprietary models across 23 challenging agent benchmarks.
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
EvoTest introduces J-TTL, a benchmark for measuring agent test-time learning capabilities, and proposes an evolutionary framework where an Actor Agent plays games while an Evolver Agent iteratively improves the system's prompts, memory, and hyperparameters without fine-tuning. The method demonstrates superior performance compared to reflection and memory-based baselines on complex text-based games.