ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Summary
ClawGUI is an open-source framework for training, evaluating, and deploying GUI agents using reinforcement learning, featuring standardized benchmarks and cross-platform deployment to Android, iOS, and HarmonyOS.
Source: https://huggingface.co/papers/2604.11784
Abstract
ClawGUI presents an open-source framework that addresses key challenges in GUI agent development through unified reinforcement learning, standardized evaluation, and cross-platform deployment capabilities.
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
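The abstract mentions combining GiGPO with a Process Reward Model for dense step-level supervision but does not give the formulation. As a rough illustration only, the sketch below shows one way a group-relative, two-level advantage could be computed: an episode-level term normalized across a rollout group, plus a step-level term normalized among steps that share the same state. Everything here is an assumption made for illustration and is not taken from ClawGUI: the function names, the `state_key` grouping, the use of PRM scores as per-step rewards, and the `omega` and `gamma` parameters are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-relative normalization: subtract the group mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def two_level_advantages(trajectories, omega=1.0, gamma=0.95):
    """Hypothetical sketch of a GiGPO-style advantage with PRM step scores.

    trajectories: list of dicts with
      'outcome': float, episode-level reward (e.g. 1.0 on task success)
      'steps':   list of (state_key, prm_score) tuples, one per action
    Returns one list of per-step advantages per trajectory.
    """
    # Episode level: compare each rollout's outcome to the rest of its group.
    episode_adv = normalize([t["outcome"] for t in trajectories])

    # Step level: discounted sum of PRM scores from each step onward
    # (assumption: PRM scores play the role of dense per-step rewards).
    returns = []
    for t in trajectories:
        running, per_step = 0.0, []
        for _, score in reversed(t["steps"]):
            running = score + gamma * running
            per_step.append(running)
        returns.append(per_step[::-1])

    # Group steps from different rollouts that start from the same state.
    groups = defaultdict(list)
    for i, t in enumerate(trajectories):
        for j, (state_key, _) in enumerate(t["steps"]):
            groups[state_key].append((i, j))

    step_adv = [[0.0] * len(t["steps"]) for t in trajectories]
    for members in groups.values():
        if len(members) < 2:
            continue  # a state seen once has no peers; leave its step term at 0
        normed = normalize([returns[i][j] for i, j in members])
        for (i, j), value in zip(members, normed):
            step_adv[i][j] = value

    # Combine both levels; omega weights the step-level term.
    return [
        [episode_adv[i] + omega * a for a in step_adv[i]]
        for i in range(len(trajectories))
    ]
```

For example, given two rollouts that both start from a shared `"home"` state, the successful rollout's first step receives both a positive episode-level and a positive step-level term, while steps taken from states seen only once receive the episode-level term alone.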