Tag
This paper introduces ReVision, a method to reduce token usage in computer-use agents by removing redundant visual patches from consecutive screenshots. It demonstrates that this efficiency gain allows agents to process longer trajectories and improve performance on benchmarks like OSWorld.
ToolCUA is a new agent framework that optimizes GUI-tool path selection for computer-use agents through staged training and reinforcement learning. It achieves state-of-the-art performance on OSWorld-MCP by effectively interleaving GUI actions and high-level tool calls.
This academic paper proposes a unified architecture-lifecycle framework for securing computer-use agents (CUAs) as they transition from benchmarks to real-world software environments. It analyzes reliability challenges across the perception, decision, and execution layers and across the creation, deployment, operation, and maintenance stages.
A preprint analyzing why computer-use agents succeed once but fail on repeated executions, attributing unreliability to execution stochasticity, task ambiguity, and behavioral variability, and advocating repeated evaluation and stable strategies.
Agent S2 is a new compositional framework for computer-use agents that achieves state-of-the-art performance on multiple benchmarks by using Mixture-of-Grounding and Proactive Hierarchical Planning.
trycua/cua is an open-source toolkit and Python library for building, benchmarking, and deploying computer-use agents, featuring macOS background automation and cross-platform agent-ready sandboxes.