Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Summary
Arbor is an AI framework for autonomous scientific research that uses a coordinator, executors, and a persistent hypothesis tree to iteratively improve research outcomes across multiple domains, achieving strong results on six real research tasks.
Cached at: 06/11/26, 01:39 PM
Source: https://huggingface.co/papers/2606.11926 Published on Jun 10
#2 Paper of the day
Abstract
An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.
Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
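The coordinator, executor, and tree roles described in the abstract can be illustrated with a toy sketch. This is not Arbor's implementation: every name here (HypothesisNode, frontier, coordinate, toy_execute, toy_propose) is hypothetical, the scoring is a made-up placeholder, and the isolated worktrees that executors use in the paper are not modeled. It only shows one plausible way a persistent tree could carry evidence and lessons from earlier attempts into later ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class HypothesisNode:
    """One node of a hypothetical hypothesis tree (field names are assumptions)."""
    statement: str
    parent: Optional["HypothesisNode"] = None
    children: List["HypothesisNode"] = field(default_factory=list)
    score: Optional[float] = None                      # evidence from an executor
    insights: List[str] = field(default_factory=list)  # distilled lessons

    def add_child(self, statement: str) -> "HypothesisNode":
        child = HypothesisNode(statement, parent=self)
        self.children.append(child)
        return child

    def inherited_insights(self) -> List[str]:
        """Lessons from the root down to this node, oldest first."""
        chain = []
        node: Optional[HypothesisNode] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return [lesson for n in reversed(chain) for lesson in n.insights]


def frontier(root: HypothesisNode) -> List[HypothesisNode]:
    """All nodes that have not been tested yet, in depth-first order."""
    stack, untested = [root], []
    while stack:
        node = stack.pop()
        if node.score is None:
            untested.append(node)
        stack.extend(reversed(node.children))
    return untested


def coordinate(
    root: HypothesisNode,
    execute: Callable[[str, List[str]], Tuple[float, str]],
    propose: Callable[[HypothesisNode], List[str]],
    rounds: int,
) -> HypothesisNode:
    """Long-lived loop: expand the best node, test the frontier, keep improvements."""
    best = root
    for _ in range(rounds):
        for statement in propose(best):
            best.add_child(statement)
        for node in frontier(root):
            # A short-lived executor tests one hypothesis, seeing earlier lessons.
            score, lesson = execute(node.statement, node.inherited_insights())
            node.score = score
            node.insights.append(lesson)
            if score > best.score:  # admit only measured improvements
                best = node
    return best


# Toy stand-ins for a real executor and proposal step (purely illustrative).
def toy_execute(statement: str, lessons: List[str]) -> Tuple[float, str]:
    return float(statement.count("+a")), f"tested {statement}"


def toy_propose(node: HypothesisNode) -> List[str]:
    return [node.statement + "+a", node.statement + "+b"]


root = HypothesisNode("baseline", score=0.0)
best = coordinate(root, toy_execute, toy_propose, rounds=2)
print(best.statement, best.score, best.inherited_insights())
```

In this sketch lessons are stored on the node that produced them and inherited only by its descendants; a real system would likely need a richer propagation rule and a held-out check before admitting an improvement.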
Get this paper in your agent:
hf papers read 2606.11926
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
@_akhaliq: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
This paper proposes a method for autonomous research agents using hypothesis-tree refinement to generate and test hypotheses, aiming toward generalist scientific discovery.
@HuggingPapers: Microsoft Research introduces Arbor A generalist autonomous research agent that uses persistent hypothesis-tree refinem…
Microsoft Research introduces Arbor, a generalist autonomous research agent that uses persistent hypothesis-tree refinement for cumulative learning, outperforming Codex and Claude Code across six research tasks and achieving 86% Any-Medal on MLE-Bench Lite.
@_akhaliq: paper:
A paper introducing Arbor, an AI framework that enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
AutoResearchClaw is a multi-agent autonomous research system that improves scientific discovery through structured debate, self-healing execution, and human collaboration, outperforming previous systems on the ARC-Bench benchmark by 54.7%.
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
ARIS is an open-source research harness that uses cross-model adversarial collaboration to ensure reliable long-term research outcomes through coordinated execution, orchestration, and assurance layers.