Multi-Agent Computer Use

Hugging Face Daily Papers 06/01/26, 12:00 AM Papers

multi-agent computer-use task-decomposition parallel-execution web-navigation benchmark

Summary

This paper proposes a multi-agent computer use (MACU) system that uses a manager model to decompose tasks into directed acyclic graphs for parallel execution by subagents. It demonstrates consistent improvements over single-agent baselines on multiple benchmarks and better test-time scaling.

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We propose a general multi-agent setup in which a manager model decomposes computer use tasks into a directed acyclic graph (DAG), encoding relevant dependencies and goals for subagents. At each iteration, the manager dispatches parallel CUA subagents to carry out nodes on the ready frontier of the DAG, and continuously revises the DAG (adding, canceling, or rewriting nodes) as new findings arrive from subagents. This design treats the partial observability of computer use environments as a first-class challenge: information that downstream agents may not be able to re-observe is retained and passed forward through the manager and DAG structure. We demonstrate that MACU consistently improves over strong single-agent baselines by 3.4-25.5% on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, exhibits more favorable test-time scaling, and solves complex long-horizon tasks where single-agent CUAs get stuck. On Odysseys, a long-horizon web navigation benchmark, MACU reduces average wall-clock task completion time by a factor of roughly 1.5, demonstrating that it can speed up traditionally slow CUA pipelines. Our findings highlight that multi-agent coordination is a promising axis for scaling computer use agents to work productively for longer and more effectively. We release all code and interactive visualizations at https://jykoh.com/multi-agent-computer-use.
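The manager loop in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code: `Node`, `ready_frontier`, `run_manager`, and the callables `run_subagent` and `revise` are hypothetical stand-ins for a real CUA subagent and for the manager model's re-planning step. Each iteration runs every pending node whose dependencies have finished, in parallel threads, forwards upstream results to the dependent subagents so they need not re-observe them, and then hands the DAG to `revise`, which may add, cancel, or rewrite nodes.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Node:
    """One subtask in the manager's DAG (hypothetical schema)."""
    node_id: str
    goal: str
    deps: Set[str] = field(default_factory=set)
    status: str = "pending"  # "pending", "done", or "canceled"
    result: Optional[str] = None


DAG = Dict[str, Node]


def ready_frontier(dag: DAG) -> List[Node]:
    """Pending nodes whose dependencies have all completed."""
    return [
        n for n in dag.values()
        if n.status == "pending" and all(dag[d].status == "done" for d in n.deps)
    ]


def run_manager(
    dag: DAG,
    run_subagent: Callable[[str, Dict[str, Optional[str]]], str],  # hypothetical CUA call
    revise: Callable[[DAG], None],  # hypothetical manager re-planning step
    max_workers: int = 4,
    max_iters: int = 50,
) -> DAG:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_iters):
            frontier = ready_frontier(dag)
            if not frontier:
                break  # nothing runnable: finished, or blocked by canceled nodes
            # Dispatch every ready node in parallel, passing along upstream
            # findings so the subagent does not have to re-observe them.
            futures = {
                n.node_id: pool.submit(
                    run_subagent, n.goal, {d: dag[d].result for d in sorted(n.deps)}
                )
                for n in frontier
            }
            for node_id, fut in futures.items():
                dag[node_id].result = fut.result()
                dag[node_id].status = "done"
            # The manager may add, cancel, or rewrite nodes in place.
            # (Assumption: it marks nodes "canceled" rather than deleting them.)
            revise(dag)
    return dag


# Usage with stub callables standing in for real subagents and the manager.
def echo_subagent(goal: str, upstream: Dict[str, Optional[str]]) -> str:
    return f"{goal} <- {sorted(upstream)}"


def no_revision(dag: DAG) -> None:
    pass


demo_dag: DAG = {
    "a": Node("a", "find flight prices"),
    "b": Node("b", "find hotel prices"),
    "c": Node("c", "compare total cost", deps={"a", "b"}),
}
run_manager(demo_dag, echo_subagent, no_revision)
print(demo_dag["c"].result)  # compare total cost <- ['a', 'b']
```

For simplicity this sketch waits for the whole frontier to finish before calling `revise`; re-planning after each individual subagent result would be a natural variation.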

Cached at: 06/02/26, 03:37 PM

Paper page - Multi-Agent Computer Use

Source: https://huggingface.co/papers/2606.01533

Abstract

Multi-agent computer use systems outperform single-agent approaches on complex tasks by enabling parallel execution and dynamic task decomposition through directed acyclic graphs.
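One way to see why parallel execution over a DAG can shorten wall-clock time: a single serial agent pays the sum of all node durations, while an idealized scheduler with unlimited workers pays only the length of the DAG's critical path (its longest dependency chain, weighted by duration). The sketch below computes both for a toy graph; the function names, node names, and durations are hypothetical and are not measurements from the paper, and the model ignores manager overhead, re-planning, and worker limits.

```python
from typing import Dict, List


def serial_time(durations: Dict[str, float]) -> float:
    """Wall-clock time if one agent runs every node back to back."""
    return sum(durations.values())


def critical_path_time(durations: Dict[str, float], deps: Dict[str, List[str]]) -> float:
    """Ideal wall-clock time with unlimited parallel workers: the longest
    dependency chain, weighted by node duration. Assumes the graph is acyclic."""
    finish: Dict[str, float] = {}

    def earliest_finish(node: str) -> float:
        # A node can start only after all of its dependencies have finished.
        if node not in finish:
            start = max((earliest_finish(d) for d in deps.get(node, [])), default=0)
            finish[node] = start + durations[node]
        return finish[node]

    return max(earliest_finish(n) for n in durations)


# Hypothetical durations (minutes): two independent lookups feeding a comparison.
durations = {"a": 4, "b": 4, "c": 2}
deps = {"c": ["a", "b"]}
print(serial_time(durations))               # 10
print(critical_path_time(durations, deps))  # 6
```

The gap between the two numbers is an upper bound on what parallelism alone can buy; realized speedups, such as the roughly 1.5× wall-clock improvement the paper reports on Odysseys, also depend on how much of a task is actually parallelizable and on coordination overhead.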

Similar Articles

Recursive Multi-Agent Systems

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Simple Multi-Agent Architecture Running Across Our Entire Org. Keeping everything in Loop.

How we built our multi-agent research system

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
