A new tool that lets AI agents browse the web through a real Chrome instance, with live DOM access, MCP tools, and multi-tab control.
Codex has been updated so it can drive Chrome tabs in the background, enabling automated web tasks without active user supervision.
OpenAI launches Deep Research, an agentic capability that conducts multi-step internet research for complex tasks. It is powered by an early version of o3, and OpenAI completed comprehensive safety testing and implemented privacy protections before rolling it out to Pro users.
OpenAI introduced the Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with reinforcement learning so it can interact with graphical user interfaces as a human would; CUA powers the new Operator agent. It sets new state-of-the-art results on several benchmarks, including 38.1% on OSWorld and 58.1% on WebArena, and is available as a research preview to ChatGPT Pro users in the US.
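The perceive-reason-act loop behind a GUI agent like CUA can be sketched as below. This is a toy illustration under stated assumptions, not OpenAI's implementation: the `FakeGUI` screen, the three-action vocabulary, and `scripted_policy` are hypothetical stand-ins. A real model would read raw screenshots and emit mouse and keyboard events rather than consume a text description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                     # hypothetical action set: "type", "click", or "done"
    target: Optional[str] = None  # name of the GUI element to act on
    text: Optional[str] = None    # text to enter for "type" actions


@dataclass
class FakeGUI:
    """Toy stand-in for a screen: one search box and one submit button."""
    field_value: str = ""
    submitted: bool = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; this returns a text description instead.
        return f"field='{self.field_value}' submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        # Execute the action against the toy interface state.
        if action.kind == "type" and action.target == "search_box":
            self.field_value += action.text or ""
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: str, goal: str) -> Action:
    """Hypothetical policy; a CUA-style model would map screen + goal to an action."""
    if "submitted=True" in observation:
        return Action("done")
    if f"field='{goal}'" not in observation:
        return Action("type", target="search_box", text=goal)
    return Action("click", target="submit")


def run_agent(gui: FakeGUI, goal: str,
              policy: Callable[[str, str], Action],
              max_steps: int = 10) -> List[Action]:
    """Loop: observe the screen, choose an action, apply it, until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = gui.screenshot()       # perceive
        action = policy(observation, goal)   # reason
        history.append(action)
        if action.kind == "done":
            break
        gui.apply(action)                    # act
    return history
```

Running `run_agent(FakeGUI(), "weather", scripted_policy)` types the query, clicks submit, and then stops. The `max_steps` cap is a simple guard against a policy that never signals completion.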
OpenDevin is an open-source platform for developing AI agents that interact with their environment by writing code, running command-line tools, and browsing the web. It supports multiple agents, sandboxed code execution, and evaluation on benchmarks such as SWE-bench.
OpenAI fine-tuned GPT-3 to answer open-ended questions more accurately by letting it use a text-based web browser to search for, retrieve, and cite sources. On questions from the ELI5 dataset, the model's answers are preferred to those written by human demonstrators 56% of the time, but it shows limitations on out-of-distribution tasks such as TruthfulQA.
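The idea of a model that searches, reads, and cites through text commands can be sketched with a toy browser. This is an illustration under assumptions, not OpenAI's environment: the in-memory `CORPUS`, the `TextBrowser` class, its `search`/`open`/`quote` methods, and `cite_answer` are all hypothetical, loosely inspired by the kinds of commands (searching, following links, quoting passages) a text-based browsing setup exposes. Here the commands are scripted; a trained model would choose them itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical in-memory "web": URL -> page text.
CORPUS: Dict[str, str] = {
    "https://example.org/sky": "The sky looks blue because air scatters short wavelengths more strongly.",
    "https://example.org/sea": "The sea often looks blue partly because it reflects the sky.",
}


@dataclass
class TextBrowser:
    corpus: Dict[str, str]
    current_url: str = ""
    quotes: List[Tuple[str, str]] = field(default_factory=list)  # (url, quoted text)

    def search(self, query: str) -> List[str]:
        """Toy keyword search: URLs whose text contains every query word."""
        words = query.lower().split()
        return [url for url, text in self.corpus.items()
                if all(w in text.lower() for w in words)]

    def open(self, url: str) -> str:
        """Navigate to a page and return its text."""
        self.current_url = url
        return self.corpus[url]

    def quote(self, snippet: str) -> int:
        """Save a verbatim snippet from the current page; return its 1-based reference number."""
        if snippet not in self.corpus.get(self.current_url, ""):
            raise ValueError("quote must appear verbatim on the current page")
        self.quotes.append((self.current_url, snippet))
        return len(self.quotes)


def cite_answer(browser: TextBrowser, query: str, snippet: str, claim: str) -> str:
    """Scripted episode: search, open the top hit, quote evidence, answer with a citation."""
    hits = browser.search(query)
    if not hits:
        return "No sources found."
    browser.open(hits[0])
    ref = browser.quote(snippet)
    sources = "\n".join(f"[{i}] {url}" for i, (url, _) in enumerate(browser.quotes, 1))
    return f"{claim} [{ref}]\n{sources}"
```

Requiring quotes to appear verbatim on the open page is one simple way to tie every citation back to retrieved text, which is what makes the answer checkable by a reader.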