Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Hugging Face Daily Papers Papers

Summary

Arbor is an AI framework for autonomous scientific research that uses a coordinator, executors, and a persistent hypothesis tree to iteratively improve research outcomes across multiple domains, achieving strong results on six real research tasks.

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
Original Article
View Cached Full Text

Cached at: 06/11/26, 01:39 PM

Paper page - Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Source: https://huggingface.co/papers/2606.11926 Published on Jun 10

#2 Paper of the day Authors:

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Abstract

An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously overlong horizons. We introduce Arbor, a general framework forautonomous researchthat combines a long-livedcoordinator, short-livedexecutors, andHypothesis Tree Refinement(HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. Thecoordinatormanages global research strategy over the tree, whileexecutorsimplement and test individual hypotheses in isolatedworktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turnsautonomous researchfrom a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initialresearch artifactthroughiterative experimentationwithout step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the bestheld-out resulton all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. OnMLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.

View arXiv pageView PDFProject pageGitHub63Add to collection

Get this paper in your agent:

hf papers read 2606\.11926

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.11926 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.11926 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.11926 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

@_akhaliq: paper:

X AI KOLs Following

A paper introducing Arbor, an AI framework that enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.