@_akhaliq: paper:
Summary
A paper introducing Arbor, an AI framework that enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.
Cached at: 06/11/26, 07:41 PM
paper: https://t.co/cTacdxfPBa
Paper page - Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Source: https://huggingface.co/papers/2606.11926 Published on Jun 10
#2 Paper of the day
Abstract
An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains.
Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.
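The coordinator/executor/tree loop described in the abstract can be illustrated with a small toy sketch. This is not Arbor's implementation: every name here (`HypothesisNode`, `frontier`, `run_coordinator`, the `executor` and `propose` callables) is hypothetical, the executor is a plain function standing in for a run in an isolated worktree, and the selection rule (expand the untested node whose parent scored highest) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical executor signature: (hypothesis, inherited lessons) -> (score, insight).
Executor = Callable[[str, List[str]], Tuple[float, str]]


@dataclass
class HypothesisNode:
    """One tree node: a hypothesis plus the evidence and lesson from testing it."""
    hypothesis: str
    parent: Optional["HypothesisNode"] = None
    children: List["HypothesisNode"] = field(default_factory=list)
    score: Optional[float] = None   # held-out metric; None until tested
    insight: Optional[str] = None   # distilled lesson; None until tested

    def add_child(self, hypothesis: str) -> "HypothesisNode":
        child = HypothesisNode(hypothesis, parent=self)
        self.children.append(child)
        return child

    def lessons(self) -> List[str]:
        """Insights recorded on the path from the root to this node, root first."""
        found, node = [], self
        while node is not None:
            if node.insight:
                found.append(node.insight)
            node = node.parent
        return found[::-1]


def frontier(root: HypothesisNode) -> List[HypothesisNode]:
    """All untested nodes in the tree (the current search frontier)."""
    stack, untested = [root], []
    while stack:
        node = stack.pop()
        if node.score is None:
            untested.append(node)
        stack.extend(node.children)
    return untested


def run_coordinator(
    root: HypothesisNode,
    executor: Executor,
    propose: Callable[[HypothesisNode], List[str]],
    budget: int,
) -> HypothesisNode:
    """Test up to `budget` hypotheses and return the best verified node."""
    if root.score is None:
        raise ValueError("root must carry the baseline score")
    best = root
    for text in propose(root):          # seed the frontier from the baseline
        root.add_child(text)
    for _ in range(budget):
        candidates = frontier(root)
        if not candidates:
            break
        # Assumed strategy: prefer children of the strongest tested parent.
        node = max(candidates, key=lambda n: n.parent.score)
        node.score, node.insight = executor(node.hypothesis, node.lessons())
        if node.score > best.score:     # admit only verified improvements
            best = node
        for text in propose(node):      # refine the frontier under this node
            node.add_child(text)
    return best
```

To try it, give the root a baseline score, pass an `executor` that returns a `(score, insight)` pair for each hypothesis, and pass a `propose` function that returns follow-up hypotheses for a tested node. Because each executor call receives `node.lessons()`, insights from ancestor experiments travel with later attempts, which is the cumulative behavior the abstract emphasizes.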
Get this paper in your agent:
hf papers read 2606.11926
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
@HuggingPapers: Microsoft Research introduces Arbor A generalist autonomous research agent that uses persistent hypothesis-tree refinem…
Microsoft Research introduces Arbor, a generalist autonomous research agent that uses persistent hypothesis-tree refinement for cumulative learning, outperforming Codex and Claude Code across six research tasks and achieving 86% Any-Medal on MLE-Bench Lite.
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Arbor is an AI framework for autonomous scientific research that uses a coordinator, executors, and a persistent hypothesis tree to iteratively improve research outcomes across multiple domains, achieving strong results on six real research tasks.
@_ar9av: day 6 of reading one arxiv paper around AI every day and sharing what actually stuck: AutoSci (Peking University) tldr:…
A tweet sharing AutoSci, a system from Peking University that automates the entire research lifecycle from literature review to rebuttal, with self-improvement between projects.
@rohanpaul_ai: New Meta, Stanford, Google and many other top labs paper proposes AutoResearchClaw. Shows that automated research impro…
A new paper from Meta, Stanford, and Google introduces AutoResearchClaw, which improves automated research by integrating failure recovery, debate, and selective human input. It outperforms AI Scientist v2 by 54.7% on ARC-Bench and reveals that autonomy is enhanced when constrained by process rather than given unlimited freedom.
@_akhaliq: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
This paper proposes a method for autonomous research agents using hypothesis-tree refinement to generate and test hypotheses, aiming toward generalist scientific discovery.