Addendum to OpenAI o3 and o4-mini system card: OpenAI o3 Operator
Summary
OpenAI announced an upgrade to Operator, replacing the GPT-4o-based Computer-Using Agent (CUA) model with an o3-based version that keeps the existing safety guardrails while bringing stronger coding capabilities to web automation tasks.
Similar Articles
OpenAI o3 and o4-mini System Card
OpenAI released the system card for its o3 and o4-mini models, which combine advanced reasoning with tool use (web browsing, Python, image analysis, etc.) and are evaluated under version 2 of OpenAI's Preparedness Framework for safety in the biological, cybersecurity, and AI self-improvement domains.
OpenAI o3-mini System Card
OpenAI releases the o3-mini System Card, documenting safety evaluations and risk assessments for its advanced reasoning model trained with reinforcement learning. The model achieves state-of-the-art safety performance on certain benchmarks and is classified as Medium risk overall under OpenAI's Preparedness Framework.
Introducing OpenAI o3 and o4-mini
OpenAI releases o3 and o4-mini, its latest reasoning models, which can agentically access and combine all ChatGPT tools (web search, code execution, image analysis, image generation). o3 achieves state-of-the-art performance on coding, math, and science benchmarks and makes 20% fewer major errors than o1, while o4-mini offers efficient reasoning optimized for cost and speed.
Operator System Card
OpenAI released the Operator System Card detailing safety evaluations for its Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with reinforcement learning to interact with GUIs and perform web-based tasks on users' behalf. The card outlines risk areas including prompt injections, harmful tasks, and model mistakes, along with multi-layered mitigations based on OpenAI's Preparedness Framework.
Addendum to o3 and o4-mini system card: Codex
OpenAI announces Codex, a cloud-based coding agent powered by codex-1 (a version of o3 optimized for software engineering) that can perform coding tasks, run tests, and generate pull requests backed by verifiable evidence of its actions.