Tag
China has released an open-source desktop automation agent that runs 100% locally and can control desktop apps, files, and browsing without an internet connection.
ByteDance has open-sourced UI-TARS Desktop, a desktop automation tool that runs 100% locally and operates purely on pixels with no API calls. It addresses two major pain points, data privacy and API costs, and offers an efficient open-source option for building private automation workflows.
Microsoft has launched Fara-7B, an efficient Computer Use Agent with only 7B parameters that surpasses larger models on web tasks and supports fully local deployment for low-cost desktop automation.
ProCUA-SFT is a large-scale synthetic dataset of 3.1M step-level SFT samples for training computer-use agents, produced via an automated pipeline using a single VLM (Kimi-K2.5). Fine-tuning UI-TARS 7B on it achieves 45.0% on OSWorld, an 18.7 point improvement over the base model.
Microsoft has released Fara-7B, a small 7B-parameter language model focused on pure local desktop automation. It can directly take over your mouse and keyboard to execute repetitive workflows, with low cost and no need for internet connectivity.
Minicor is a Y Combinator-backed platform that deploys self-healing AI agents for scalable desktop automations, enabling integration with legacy systems lacking APIs.
Midscene's Computer Agent enables desktop UI automation to run headless in Linux CI, automated via xvfb-run, without needing a real machine or VM, and supports Electron, Qt, and GTK applications.
Atomic-Agent is a desktop-operation agent designed for models running locally on llama.cpp. It optimizes the runtime architecture so that small local models can reliably execute multi-step desktop tasks.
IrisGo, backed by Andrew Ng, launches an AI desktop companion that learns user workflows and automates repetitive tasks on-device for privacy, targeting knowledge workers.
The author discusses building a small VLM for desktop GUI automation to move data between apps without APIs, expressing interest in non-coding autonomous use cases for local models.
OpenComputer presents a framework for creating verifiable software environments for computer-use agents, integrating state verifiers, self-improving verification layers, task synthesis, and evaluation systems across 33 desktop applications. Experiments show its verifiers align better with human judgment than LLM-as-judge, and frontier agents struggle with end-to-end completion.
OpenAI is developing a feature for Codex to control macOS applications via Computer Use even when the laptop is locked or asleep, and to remotely operate other desktop devices running the Codex app, extending its remote control capabilities.
A user describes a CLI tool that controls the entire desktop via hybrid mouse, keyboard, and screenshot methods, successfully performing tasks like sending email screenshots and remote desktop control. They seek challenging tests to validate its robustness.
MountainDesk is a local-first tool that bridges AI model inference with desktop automation, offering features like system state anchors, multi-agent orchestration, and background monitoring. The creator seeks feedback on security and workflow integration.
Teknium introduces an early preview of Computer Use built into the Hermes Agent and powered by TryCua, enabling any AI model to interact with and control a desktop environment in the background without overriding direct user input.
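Combining Hermes Agent with AionUI can upgrade a personal computer into an agentic AI operating system that supports multiple agents running in parallel and has long-term memory and self-evolution capabilities, enabling fully automated local workflows that range from data analysis and file management to writing code.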
The article introduces Opendesk, an open-source tool that enhances the reliability of computer-use agents by leveraging native accessibility APIs to identify interactive elements, replacing error-prone pixel-coordinate guessing.
ByteDance's open-source desktop AI automation tool, UI-TARS Desktop, supports local execution and screen visual understanding. It can autonomously control your computer to handle daily tasks through natural language commands.
UI-TARS-desktop is a highly popular open-source tool by ByteDance that enables 100% local multimodal desktop automation, allowing users to control apps and browsers via natural language without cloud data leaks.
China has open-sourced a desktop AI agent that can see the screen and control mouse/keyboard via natural language, running entirely locally without cloud dependency.