OpenComputer: Verifiable Software Worlds for Computer-Use Agents

Hugging Face Daily Papers 05/19/26, 12:00 AM Papers

computer-use-agents verification state-verifiers desktop-automation ai-agents task-generation evaluation

Summary

OpenComputer presents a framework for creating verifiable software environments for computer-use agents, integrating state verifiers, self-improving verification layers, task synthesis, and evaluation systems across 33 desktop applications. Experiments show its verifiers align better with human judgment than LLM-as-judge, and frontier agents struggle with end-to-end completion.

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, (3) a task-generation pipeline that synthesizes realistic and machine-checkable desktop tasks, and (4) an evaluation harness that records full trajectories and computes auditable partial-credit rewards. In its current form, OpenComputer covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. Experiments show that OpenComputer's hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation, especially when success depends on fine-grained application state. Frontier agents struggle with end-to-end completion despite partial progress, and open-source models exhibit sharp drops from their OSWorld-Verified scores, exposing a persistent gap in robust computer automation.
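To make the idea of a hard-coded state verifier with auditable partial-credit rewards concrete, here is a minimal sketch. It is not OpenComputer's actual API: the `Check` class, the `partial_credit` function, the weighting scheme, and the shape of the state dictionary are all assumptions made for illustration. It assumes an inspection endpoint has already returned the application state as a nested dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: structured state as an inspection endpoint might return it.
AppState = Dict[str, object]


@dataclass
class Check:
    """One machine-checkable condition on the application state (illustrative)."""
    name: str
    weight: float
    predicate: Callable[[AppState], bool]


def partial_credit(state: AppState, checks: List[Check]) -> dict:
    """Score a final state against weighted checks.

    Returns a reward in [0, 1], an all-or-nothing success flag, and a
    per-check audit trail so the score can be inspected afterwards.
    """
    total = sum(c.weight for c in checks)
    if total <= 0:
        raise ValueError("checks must have a positive total weight")
    audit = []
    earned = 0.0
    for c in checks:
        try:
            passed = bool(c.predicate(state))
        except (KeyError, TypeError):
            # Missing or malformed state counts as a failed check.
            passed = False
        audit.append({"check": c.name, "passed": passed, "weight": c.weight})
        if passed:
            earned += c.weight
    return {
        "reward": earned / total,
        "success": all(entry["passed"] for entry in audit),
        "audit": audit,
    }


if __name__ == "__main__":
    # Made-up state for a made-up task: "title the document, set 12 pt, save".
    state = {"document": {"title": "Q3 Report", "font_size": 12, "saved": False}}
    checks = [
        Check("title_set", 1.0, lambda s: s["document"]["title"] == "Q3 Report"),
        Check("font_size_12", 1.0, lambda s: s["document"]["font_size"] == 12),
        Check("saved", 2.0, lambda s: s["document"]["saved"] is True),
    ]
    result = partial_credit(state, checks)
    print(result["reward"])   # 0.5
    print(result["success"])  # False
```

In this toy run the agent made partial progress (two of three checks pass) but did not complete the task end to end, which is the kind of distinction a partial-credit reward is meant to surface.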

Original Article

Cached at: 05/20/26, 02:35 AM

Source: https://huggingface.co/papers/2605.19769

Abstract

OpenComputer presents a framework for creating verifiable software environments for computer-use agents through integrated state verification, self-improving layers, task synthesis, and evaluation systems across multiple desktop applications.

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, (3) a task-generation pipeline that synthesizes realistic and machine-checkable desktop tasks, and (4) an evaluation harness that records full trajectories and computes auditable partial-credit rewards. In its current form, OpenComputer covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. Experiments show that OpenComputer’s hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation, especially when success depends on fine-grained application state. Frontier agents struggle with end-to-end completion despite partial progress, and open-source models exhibit sharp drops from their OSWorld-Verified scores, exposing a persistent gap in robust computer automation.
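The abstract does not say which metric is used to compare verifiers and LLM-as-judge evaluation against human adjudication. One simple option, shown here purely as an assumption, is the per-task agreement rate: the fraction of tasks on which an evaluator's pass/fail verdict matches the human label. The function name `agreement_rate` and all labels below are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def agreement_rate(evaluator: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of tasks where the evaluator's verdict equals the human label."""
    if len(evaluator) != len(human):
        raise ValueError("verdict lists must have the same length")
    if not human:
        raise ValueError("need at least one task")
    matches = sum(e == h for e, h in zip(evaluator, human))
    return matches / len(human)


if __name__ == "__main__":
    # Toy pass/fail labels for five hypothetical tasks (not real data).
    human = [True, False, False, True, False]
    verifier = [True, False, False, True, True]
    llm_judge = [True, True, False, True, True]
    print(agreement_rate(verifier, human))   # 0.8
    print(agreement_rate(llm_judge, human))  # 0.6
```

Chance-corrected statistics such as Cohen's kappa are a common alternative when pass and fail labels are heavily imbalanced.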

Similar Articles

Open Computer Use

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

On the Reliability of Computer Use Agents

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
