Computer Use in Codex


Summary

OpenAI demonstrates the 'Computer Use' feature in Codex, allowing the AI to directly interact with local GUI applications on macOS using an accessibility framework and the fast Spark model for non-blocking, high-speed automation.


Cached at: 05/13/26, 06:51 AM

TL;DR: OpenAI demonstrated Codex's "Computer Use" feature, which lets the AI directly operate local GUI applications. It supports parallel multitasking and non-blocking background execution, and it achieves very fast automation by combining an accessibility framework with the Spark model.

# From Coding Agent to All-Purpose Team Member: A Detailed Look at Codex's Computer Use Feature

Codex has rapidly evolved from a simple coding agent into something closer to a team member. The core of this transformation is the "Computer Use" feature, which extends Codex's reach from files and code tools to actual user workflows inside local applications. In this video, OpenAI's Roma and product expert Ari demonstrate how the feature works, its multitasking capabilities, its architectural improvements, and its security mechanisms.

## Intuitive Setup and Permission Management

For new users, Codex's "Computer Use" feature provides an intuitive onboarding interface.

* **Permission Request**: On first use, a window appears requesting permission to "Enable Codex Computer Use."
* **Guided Setup**: After the user clicks "Allow," the panel animates into a setup window that guides them through the necessary system authorization steps.
* **Minimal Interaction**: The entire process can be configured with simple drag-and-drop operations, after which the agent clicks and executes tasks on its own.

## Practical Application: Automating Tedious Tasks

To demonstrate the feature's utility, Ari showcased a typical time-consuming scenario: testing software on an older version of macOS. This usually means creating a new instance in a virtual machine tool (such as UTM), which requires numerous clicks and a walk through the macOS setup assistant.

With Codex, the user only needs to enter a natural language instruction: "Create a new Mac virtual machine in UTM."

1. **App Identification**: The user types the `@` symbol to bring up the list of installed apps and selects UTM.
2. **Automatic Execution**: Codex launches UTM and completes the macOS image download and system setup process automatically.
3. **Efficiency Gain**: A setup process that previously required manual effort is fully automated, saving significant time.

## Core Advantage: Non-Blocking Multitasking

Unlike many other "Computer Use" implementations that completely take over the user's computer, Codex is designed so that users can keep using their computer while the agent operates.

### Independent Cursor and Parallel Work

* **Independent Cursor**: Codex has a cursor independent of the user's. When the agent operates apps in the background, it does not interfere with what the user is currently doing.
* **Multi-App Concurrency**: Codex can operate multiple applications simultaneously. In the demo, Ari started three tasks at once:
    1. Setting up a virtual machine in UTM.
    2. Playing work-appropriate music in Spotify.
    3. Adding a reminder in the Reminders app: "Check my tax documents tonight."

This multitasking capability turns the Mac into an efficient automation environment: the agent handles tedious tasks in the background while the user focuses on core work.

### Natural and Intuitive Interaction Experience

To improve the user experience, the development team carefully designed the cursor's movement curves. The cursor moves naturally, even appearing "playful," with the arrow rotating as it moves, as if "swimming" across the screen. This design is more than decoration: it helps users see at a glance what the agent is doing in each application.

## Technical Breakthrough: Accessibility Framework and Spark Model

Codex's "Computer Use" feature has been substantially optimized at the underlying technical level, combining multimodal capabilities with an accessibility framework to improve both accuracy and speed.
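As a rough illustration of what an accessibility-based view of an interface provides, the sketch below models a UI as a small tree of elements, each with a role, a label, and a visibility flag, and flattens it into plain text that a text-only model could read. This is a toy under stated assumptions, not Codex's implementation: `UIElement`, `describe`, and the sample window contents are hypothetical, and no real macOS accessibility API is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UIElement:
    """Hypothetical stand-in for one node of an app's accessibility tree."""
    role: str
    label: str
    on_screen: bool = True
    children: List["UIElement"] = field(default_factory=list)


def describe(element: UIElement, depth: int = 0) -> List[str]:
    """Flatten the tree into indented text lines, one per element."""
    marker = "" if element.on_screen else " (off-screen)"
    lines = [f"{'  ' * depth}{element.role}: {element.label}{marker}"]
    for child in element.children:
        lines.extend(describe(child, depth + 1))
    return lines


# Made-up sample data: a window with one row scrolled out of view.
window = UIElement("window", "Reminders", children=[
    UIElement("button", "New Reminder"),
    UIElement("list", "Today", children=[
        UIElement("row", "Check my tax documents tonight"),
        UIElement("row", "Older item", on_screen=False),
    ]),
])

print("\n".join(describe(window)))
```

In this toy model the element that is scrolled out of view still appears in the text description, which is the property that a screenshot alone cannot offer.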
### Understanding Beyond Screenshots

Traditional "Computer Use" features rely heavily on screenshots, using multimodal models to recognize the interface and click by coordinates. Codex adds a second source of information:

* **Accessibility Framework**: By extracting the hidden text information an app exposes about its interface, the model can understand the role of every element on the screen.
* **Expanded Field of View**: Even when content has scrolled off-screen, the model can still perceive it through its text description, which keeps accuracy high while tasks are executed.

### Introducing the Spark Model for Superhuman Speed

Because it no longer relies entirely on image processing, Codex can use non-multimodal models such as **Codex Spark**.

* **Ultra-Fast Response**: The Spark model is extremely fast, allowing "Computer Use" operations to exceed human speed.
* **Real-Time Demo**: In a demo of debugging an application, after switching to the Spark model, Codex opened a text editor, typed a message, and sent it in the background, all in about one second.

At this speed the agent can complete background tasks almost instantaneously.

## Security and Privacy Protection

Because this feature involves control over local applications, OpenAI places high importance on security so that users feel safe.

* **App-Level Permission Isolation**: Codex can only access apps explicitly authorized by the user.
* **First-Use Authorization**: The first time Codex attempts to use a given application, it requests the user's permission.
* **Strict Limitations**: Once authorized, Codex can only view and send input to that specific app; it cannot access or interact with unauthorized apps. Sensitive content outside the authorized apps (such as private browsing history or encrypted files) therefore stays out of Codex's reach.

This fine-grained permission control builds user trust by ensuring the agent accesses specific development or productivity tools only when necessary.
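The per-app permission rules described in this section can be sketched as a simple allowlist gate. This is a minimal illustration under stated assumptions, not OpenAI's implementation: `AppPermissionGate`, `ensure_access`, and `demo_prompt` are hypothetical names, and the user's answers are simulated rather than collected through a real dialog.

```python
from typing import Callable, List, Set


class AppPermissionGate:
    """Hypothetical per-app allowlist; not Codex's actual implementation."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        self._ask_user = ask_user            # callback that prompts the user
        self._authorized: Set[str] = set()   # apps the user has approved

    def ensure_access(self, app: str) -> bool:
        """Return True if the agent may act on `app`, prompting on first use."""
        if app in self._authorized:
            return True
        if self._ask_user(app):
            self._authorized.add(app)
            return True
        return False


prompts: List[str] = []  # records every app the simulated user was asked about


def demo_prompt(app: str) -> bool:
    """Simulated user: approves UTM and Reminders, declines everything else."""
    prompts.append(app)
    return app in {"UTM", "Reminders"}


gate = AppPermissionGate(demo_prompt)
print(gate.ensure_access("UTM"))     # first use, so the user is asked
print(gate.ensure_access("UTM"))     # already authorized, so no new prompt
print(gate.ensure_access("Safari"))  # the simulated user declines this app
print(prompts)
```

One design choice in this sketch is that a declined app is not remembered, so the user would be asked again on the next attempt; the video does not say how Codex handles that case.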
## Personal Workflow Integration and Future Outlook

### Real-User Case

Roma shared her personal experience using the "Computer Use" feature:

* **Financial Tracking**: She tracks her finances in the Numbers app and now lets Codex update the spreadsheets automatically, without manual intervention.
* **End-to-End Access**: Combined with file system access and online service plugins, "Computer Use" fills in the final puzzle piece, enabling Codex to work with local web apps and native apps end-to-end.

### Technical Roadmap

Ari pointed out that earlier products like Operator and ChatGPT Agent trained specialized models for "Computer Use," whereas these capabilities are now integrated into the main GPT models and available via the API.

* **Performance Goals**: The future goal is for "Computer Use" to surpass human performance, operating 2x, 5x, or even 10x faster than a person.
* **Indispensability**: Once it is fast enough, the feature is expected to become an indispensable part of daily life and work, saving users significant time.

## Current Availability and Platform Support

* **macOS**: The "Computer Use" feature is currently available on Mac.
* **Windows**: OpenAI stated that it is working to bring the feature to Windows users as soon as possible.

OpenAI encourages users to try the feature in real-world work scenarios, especially long-running tasks that involve switching between multiple apps, to experience the efficiency gains.

Source: Computer use in Codex - OpenAI (https://www.youtube.com/watch?v=D_FCYsshMI4)
