gui-automation

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

arXiv cs.AI · 3d ago

MIRAGE is a framework for mobile GUI agents that replaces verbose chain-of-thought reasoning with compact continuous latent representations, incorporating a generative world model perspective to predict future screen states before acting. On AndroidWorld and AndroidControl benchmarks, it achieves competitive or superior performance while reducing generated tokens by over 75%.

PRO-CUA: Process-Reward Optimization for Computer Use Agents

arXiv cs.AI · 2026-05-29

This paper introduces PRO-CUA, a process-reward optimization framework for training Computer Use Agents (CUAs) using iterative step-level reinforcement learning. The method decouples on-policy environment interaction from policy optimization, enabling dense credit assignment without relying on expert trajectories, and demonstrates effectiveness on live web benchmarks.

AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

arXiv cs.AI · 2026-05-22

AutoRPA is a framework that automatically distills the decision logic of ReAct-style LLM agents into robust, token-efficient RPA functions for repetitive GUI tasks, reducing token usage by 82-96%.

AI agents should use real apps.

Reddit r/openclaw · 2026-05-21

OpenGUI is a tool that allows AI agents to directly operate real Android apps by reading the screen and interacting naturally, rather than relying on APIs or scripts.

what non-coding tasks have you gotten a local model to do autonomously?

Reddit r/LocalLLaMA · 2026-05-19

The author describes building a small vision-language model (VLM) for desktop GUI automation that moves data between apps without APIs, and asks about other non-coding tasks that local models can handle autonomously.

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

Hugging Face Daily Papers · 2026-05-12

ToolCUA is a new agent framework that optimizes GUI-tool path selection for computer use agents through staged training and reinforcement learning. It achieves state-of-the-art performance on OSWorld-MCP by effectively interleaving GUI actions and high-level tool calls.

@berryxia: Bros! Don't reinvent the wheel—just use this 31.4K-star open-source project! ByteDance has open-sourced UI-TARS-desktop. Taking a quick look, the project has been live for nearly a year! It currently has 31.4k stars, and its growth rate is quite steady. 24-hour growth...

X AI KOLs Timeline · 2026-05-10

ByteDance open-sourced UI-TARS-desktop, a native desktop GUI agent that uses vision models to control local or remote applications via natural language. The repository has 31.4k GitHub stars. The tool runs locally for privacy, supports Windows and macOS, and includes a CLI with streaming output for developers.

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Papers with Code Trending · 2025-04-01

Agent S2 is a new compositional framework for computer use agents that achieves state-of-the-art performance on multiple benchmarks by utilizing Mixture-of-Grounding and Proactive Hierarchical Planning.

Computer-Using Agent

OpenAI Blog · 2025-01-23

OpenAI introduced the Computer-Using Agent (CUA), a model that combines GPT-4o's vision with reinforcement learning to interact with GUIs like a human and powers the new Operator agent. CUA sets new state-of-the-art results, including 38.1% on OSWorld and 58.1% on WebArena, and is available as a research preview for ChatGPT Pro users in the US.
