EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale

Hugging Face Daily Papers

Summary

EvoMaster is a scalable, self-evolving agent framework for large-scale scientific discovery that enables iterative hypothesis refinement and knowledge accumulation across experimental cycles. It achieves state-of-the-art results on four benchmarks including Humanity's Last Exam (41.1%) and MLE-Bench Lite (75.8%), outperforming general-purpose baselines by up to 316%.

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster empowers agents to iteratively refine hypotheses, self-critique, and progressively accumulate knowledge across experimental cycles, faithfully mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is exceptionally easy to scale up -- enabling developers to build and deploy highly capable, self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Built upon EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four authoritative benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) demonstrate that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It comprehensively outperforms the general-purpose baseline OpenClaw with relative improvements ranging from +159% to +316%, robustly validating its efficacy and generality as the premier foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.
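The self-evolution cycle described above (propose a hypothesis, run an experiment, self-critique, accumulate knowledge) can be sketched as a plain Python loop. This is an illustrative toy under stated assumptions, not EvoMaster's actual API: every name here (`KnowledgeBase`, `propose`, `run_experiment`, `critique`, `evolve`) is hypothetical, and the LLM and experiment calls are stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Hypothetical store that accumulates lessons across experimental cycles."""
    lessons: list = field(default_factory=list)

    def add(self, lesson: str) -> None:
        self.lessons.append(lesson)

def propose(knowledge: KnowledgeBase, cycle: int) -> str:
    # Stub: a real agent would prompt an LLM conditioned on accumulated lessons.
    return f"hypothesis-{cycle} informed by {len(knowledge.lessons)} lessons"

def run_experiment(hypothesis: str) -> float:
    # Stub: a real agent would execute code or a simulation and score the result.
    return float(len(hypothesis) % 10)

def critique(hypothesis: str, score: float) -> str:
    # Stub self-critique: record what happened so the next cycle can improve.
    return f"'{hypothesis}' scored {score}; refine next cycle"

def evolve(n_cycles: int = 3) -> KnowledgeBase:
    """Iterate the hypothesize -> experiment -> critique -> accumulate loop."""
    knowledge = KnowledgeBase()
    for cycle in range(n_cycles):
        hypothesis = propose(knowledge, cycle)
        score = run_experiment(hypothesis)
        knowledge.add(critique(hypothesis, score))
    return knowledge
```

The point of the sketch is the shape of the loop: each cycle's critique feeds the next cycle's proposal, which is what distinguishes a self-evolving agent from a static, single-pass one.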

Cached at: 04/21/26, 07:20 AM


Source: https://huggingface.co/papers/2604.17406



Get this paper in your agent:

hf papers read 2604.17406

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

Papers with Code Trending

EvoScientist is an adaptive multi-agent framework for end-to-end scientific discovery that continuously improves through persistent memory modules, comprising three specialized agents for idea generation, experiment execution, and knowledge distillation. It outperforms 7 state-of-the-art systems in scientific idea generation and improves code execution success rates through multi-agent evolution.

EvoMap/evolver

GitHub Trending (daily)

Evolver is a GEP-powered self-evolution engine for AI agents that automates prompt optimization and creates auditable, reusable evolution assets. The project is transitioning from fully open source to source-available while maintaining backward compatibility with existing MIT and GPL-3.0 releases.

EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

arXiv cs.CL

EvoTest introduces J-TTL, a benchmark for measuring agent test-time learning capabilities, and proposes an evolutionary framework where an Actor Agent plays games while an Evolver Agent iteratively improves the system's prompts, memory, and hyperparameters without fine-tuning. The method demonstrates superior performance compared to reflection and memory-based baselines on complex text-based games.
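The Actor/Evolver split described in the EvoTest summary can be sketched as a toy test-time loop. All names and the reward model below are hypothetical stand-ins, not EvoTest's implementation: an Actor plays an episode under the current prompt, and an Evolver mutates the prompt whenever reward stagnates, with no fine-tuning involved.

```python
import random

def actor_play(prompt: str, rng: random.Random) -> float:
    """Hypothetical Actor: play one episode under the current prompt and
    return a reward. Here, richer prompts score higher, plus noise."""
    return min(len(prompt), 50) + rng.random()

def evolver_step(prompt: str, reward: float, best_reward: float) -> str:
    """Hypothetical Evolver: mutate the prompt only when reward stagnates."""
    if reward <= best_reward:
        return prompt + " Remember: explore rooms before using items."
    return prompt

def test_time_learn(episodes: int = 5, seed: int = 0):
    """Alternate Actor episodes with Evolver updates across a game session."""
    rng = random.Random(seed)
    prompt = "You are a game-playing agent."
    best = float("-inf")
    history = []
    for _ in range(episodes):
        reward = actor_play(prompt, rng)
        prompt = evolver_step(prompt, reward, best)
        best = max(best, reward)
        history.append(reward)
    return prompt, history
```

The design choice being illustrated is that learning happens entirely in the agent's mutable configuration (its prompt, and in the real system also memory and hyperparameters) rather than in model weights, which is what makes the approach viable at test time.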