Tag
OpenGUI is an AI agent platform that executes tasks on real Android devices, offering a more realistic interface than traditional browser-based agents.
ByteDance has open-sourced UI-TARS, an AI model capable of directly controlling computer interfaces via mouse and keyboard for tasks like booking flights or configuring software. Available in 2B, 7B, and 72B parameter sizes, it runs locally and offers a free alternative to paid services like Anthropic's Computer Use.
The author introduces agent-ctrl, an open-source Rust-based CLI tool for OS automation that allows AI agents to interact with native application UIs via accessibility trees.
OpenGUI is an open-source AI phone control system that lets AI autonomously operate real Android devices to carry out long-running mobile tasks such as social media management and research. It supports remote task dispatching via Lark, Telegram, Discord, or REST API. Its underlying architecture is split into two layers — a Plan Supervisor and an Executor Graph — and supports multiple models including Claude, Qwen, and Doubao.
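The two-layer split described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the names `PlanSupervisor`, `Executor`, `Step`, and `run_task` are hypothetical and are not taken from OpenGUI's code, and the executor "graph" is collapsed to a simple act, verify, retry loop rather than whatever graph OpenGUI actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One unit of work handed from the planning layer to the executor."""
    description: str
    done: bool = False


class PlanSupervisor:
    """Upper layer (hypothetical): turns a task into an ordered list of steps."""

    def __init__(self, planner: Callable[[str], List[str]]):
        # `planner` stands in for a model call that decomposes the task.
        self.planner = planner

    def plan(self, task: str) -> List[Step]:
        return [Step(description) for description in self.planner(task)]


class Executor:
    """Lower layer (hypothetical): act on the device, verify, retry on failure."""

    def __init__(self, act: Callable[[str], None],
                 verify: Callable[[str], bool], max_attempts: int = 3):
        self.act = act
        self.verify = verify
        self.max_attempts = max_attempts

    def run(self, step: Step) -> bool:
        for _ in range(self.max_attempts):
            self.act(step.description)        # e.g. tap or type on the device
            if self.verify(step.description):  # e.g. inspect the new screen
                step.done = True
                return True
        return False


def run_task(task: str, supervisor: PlanSupervisor, executor: Executor) -> List[Step]:
    """Plan once, then execute steps in order, stopping at the first failure."""
    steps = supervisor.plan(task)
    for step in steps:
        if not executor.run(step):
            break
    return steps
```

In this sketch the supervisor never touches the device and the executor never sees the whole task, which is one plausible reason to separate the layers for long-running jobs.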
Roundtable Space is a fully local, open-source desktop automation agent that uses natural language to control screens, mice, and keyboards across applications. It has rapidly accumulated over 29k GitHub stars.
Ara is a new agentic AI product that functions as a computer-use agent integrated into the user interface.
A developer demonstrates an AI agent autonomously modifying its own browser automation tools to handle edge cases in the Google Slides interface.
An OpenAI research preview explores learning from how people interact with their computers beyond chat, accompanied by a new arXiv paper on the topic.
A developer reverse-engineered OpenAI's Codex Computer Use to build pi-computer-use, an open-source, model-agnostic macOS automation tool. It navigates ax-first, meaning it tries the accessibility tree before anything else, and falls back to vision on supported models.
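As a rough illustration of accessibility-first lookup with a vision fallback, the sketch below searches an in-memory tree for a labelled element and only calls a vision locator when the tree has no match. It is not pi-computer-use's implementation: `AXNode`, `find_in_ax_tree`, `locate`, and the `vision_locator` callback are all hypothetical names, and a real tool would query the operating system's accessibility API instead of a hand-built tree.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class AXNode:
    """Hypothetical stand-in for one element of an accessibility tree."""
    role: str
    label: str
    center: Point
    children: List["AXNode"] = field(default_factory=list)


def find_in_ax_tree(node: AXNode, label: str) -> Optional[AXNode]:
    """Depth-first search for the first node whose label matches exactly."""
    if node.label == label:
        return node
    for child in node.children:
        hit = find_in_ax_tree(child, label)
        if hit is not None:
            return hit
    return None


def locate(label: str,
           ax_root: Optional[AXNode],
           vision_locator: Optional[Callable[[str], Optional[Point]]]
           ) -> Optional[Tuple[str, Point]]:
    """Return (source, point), preferring the accessibility tree over vision."""
    if ax_root is not None:
        node = find_in_ax_tree(ax_root, label)
        if node is not None:
            return ("ax", node.center)
    # Fallback: only used when the tree is missing or has no matching element,
    # and only if the caller supplied a vision-capable locator.
    if vision_locator is not None:
        point = vision_locator(label)
        if point is not None:
            return ("vision", point)
    return None
```

Trying the tree first keeps the common case cheap and deterministic, while the optional callback reflects that vision is only available on some models.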
Sam Altman announces major improvements to Codex, highlighting a new computer use capability that allows the model to control Mac applications in parallel without interfering with user workflows.
OpenAI releases a major update to Codex, enabling it to operate computers via cursor control, generate images, manage long-term tasks with memory, and deeply integrate with developer workflows like SSH and PR reviews.
HCompany has launched HoloTab, a Chrome extension powered by the Holo3 computer-use AI model, designed to automate web tasks and create reusable routines for users without technical skills.
OpenAI is releasing GPT-5.4 and GPT-5.4 Pro across ChatGPT, the API, and Codex, featuring native computer-use capabilities, 1M token context, improved reasoning and coding, and state-of-the-art performance on professional knowledge work benchmarks. It is described as OpenAI's most capable and token-efficient reasoning model to date.
Google releases the Gemini 2.5 Computer Use model via the Gemini API, enabling developers to build AI agents that interact with user interfaces by clicking, typing, and scrolling. The model outperforms alternatives on web and mobile control benchmarks with lower latency and is available in preview on Google AI Studio and Vertex AI.
OpenAI introduced the Computer-Using Agent (CUA), a model combining GPT-4o's vision with reinforcement learning to interact with GUIs like a human, powering the new Operator agent. CUA sets new state-of-the-art results, including 38.1% on OSWorld and 58.1% on WebArena, and is available as a research preview for ChatGPT Pro users in the US.
Anthropic launched 74 updates in 52 days including Computer Use, Projects, and Claude Code Auto Mode, while Google countered with Gemini 3.1 Flash Live, vibe-coded browser demos, and Lyria 3 Pro music tools, as GenSpark enters with $20/month unlimited AI through 2026.