Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Hugging Face Daily Papers 06/10/26, 12:00 AM Papers

Summary

Arbor is an AI framework for autonomous scientific research that uses a coordinator, executors, and a persistent hypothesis tree to iteratively improve research outcomes across multiple domains, achieving strong results on six real research tasks.

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
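The coordinator, executor, and tree roles described above can be illustrated with a small sketch. This is not Arbor's implementation: every name here (HypothesisNode, frontier, run_coordinator, the toy executor and its scores) is hypothetical, the executor is a plain function rather than an agent in an isolated worktree, and "admit a verified improvement" is assumed to mean "the score beats the best admitted score so far".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class HypothesisNode:
    """One tree node: a hypothesis plus the evidence and lesson from testing it."""
    statement: str
    score: Optional[float] = None   # evidence returned by an executor
    insight: Optional[str] = None   # distilled lesson, reusable by later attempts
    status: str = "open"            # open -> admitted / rejected
    children: List["HypothesisNode"] = field(default_factory=list)

    def add_child(self, statement: str) -> "HypothesisNode":
        child = HypothesisNode(statement)
        self.children.append(child)
        return child


def frontier(root: HypothesisNode) -> List[HypothesisNode]:
    """Collect all untested nodes (the search frontier) by depth-first traversal."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        if node.status == "open":
            found.append(node)
        stack.extend(node.children)
    return found


def run_coordinator(
    root: HypothesisNode,
    execute: Callable[[str, List[str]], Tuple[float, str]],
    propose: Callable[[HypothesisNode, List[str]], List[str]],
    best_score: float,
    budget: int,
) -> Tuple[float, List[str]]:
    """Long-lived loop: pick a hypothesis, test it, update the tree, carry lessons on."""
    insights: List[str] = []
    for _ in range(budget):
        open_nodes = frontier(root)
        if not open_nodes:
            break
        node = open_nodes[0]
        # Short-lived executor: sees the hypothesis and the lessons learned so far.
        node.score, node.insight = execute(node.statement, insights)
        insights.append(node.insight)
        if node.score > best_score:          # assumed admission rule
            node.status, best_score = "admitted", node.score
            for statement in propose(node, insights):
                node.add_child(statement)    # refine the frontier under a success
        else:
            node.status = "rejected"
    return best_score, insights


# Toy usage with made-up scores; a real executor would run an experiment.
toy_scores = {"increase lr": 0.4, "add warmup": 0.6, "add warmup + cosine decay": 0.7}

def toy_execute(statement: str, insights: List[str]) -> Tuple[float, str]:
    return toy_scores[statement], f"{statement} scored {toy_scores[statement]}"

def toy_propose(node: HypothesisNode, insights: List[str]) -> List[str]:
    return [] if "cosine" in node.statement else [node.statement + " + cosine decay"]

root = HypothesisNode("baseline", score=0.5, status="admitted")
root.add_child("increase lr")
root.add_child("add warmup")
best, lessons = run_coordinator(root, toy_execute, toy_propose, best_score=0.5, budget=10)
```

In this sketch the tree persists across iterations while each executor call is stateless, so rejected branches still contribute an insight that later attempts can read.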

Cached at: 06/11/26, 01:39 PM

Source: https://huggingface.co/papers/2606.11926 Published on Jun 10

#2 Paper of the day

Abstract

An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
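The "average relative held-out gain" comparison can be made concrete with a short sketch. The abstract does not define the metric, so this assumes relative gain means the held-out improvement divided by the magnitude of the initial held-out score, averaged over tasks. The function names and all task numbers below are made up for illustration and are not results from the paper.

```python
from statistics import mean
from typing import List, Tuple

# (initial held-out score, final held-out score, higher_is_better)
Run = Tuple[float, float, bool]


def relative_gain(initial: float, final: float, higher_is_better: bool = True) -> float:
    """Improvement relative to the starting score (assumed definition)."""
    delta = final - initial if higher_is_better else initial - final
    return delta / abs(initial)


def average_relative_gain(runs: List[Run]) -> float:
    """Mean of per-task relative gains."""
    return mean(relative_gain(i, f, h) for i, f, h in runs)


# Made-up numbers: one accuracy-style task and one loss-style task per agent.
agent_a = [(0.50, 0.60, True), (2.0, 1.5, False)]   # gains of about 0.20 and 0.25
agent_b = [(0.50, 0.54, True), (2.0, 1.8, False)]   # gains of about 0.08 and 0.10

gain_a = average_relative_gain(agent_a)
gain_b = average_relative_gain(agent_b)
ratio = gain_a / gain_b   # how many times larger agent A's average gain is

# Arithmetic check only: 86.36% matches 19 medals out of 22 competitions,
# assuming a 22-competition split and a single aggregate run (not stated here).
any_medal_pct = round(100 * 19 / 22, 2)
```

Averaging relative rather than absolute gains keeps tasks with different metric scales comparable, which is one plausible reason to report the ratio this way.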

Get this paper in your agent:

hf papers read 2606.11926

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

@_akhaliq: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

@HuggingPapers: Microsoft Research introduces Arbor A generalist autonomous research agent that uses persistent hypothesis-tree refinem…

@_akhaliq: paper:

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
