Tag
This paper proposes SGDR (State-Grounded Dynamic Retrieval), an online skill learning method for web agents that enables stepwise, state-aware skill reuse rather than static task-level retrieval. Experiments on WebArena show SGDR achieves 37.5% success rate with GPT-4.1, a ~10.6% relative gain over strong baselines.
A comparison of Midscene and Browser-Use, two open-source tools with different focuses: Browser-Use is a web agent for one-time tasks, while Midscene is a vision SDK designed for reliable multi-platform repeated execution.
The author argues that AI agents fail on real web tasks not because models are weak, but because browsers are designed for humans and lack isolated, scriptable workspaces for agent use.
A fork of Playwright that generates unique browser fingerprints per session to enable AI agents to navigate the web undetected. The project is fully open-source under MIT license.
Mini Browser is an agent-first browser CLI that enables AI agents to control a browser via Unix-style commands for navigation, scraping, screenshots, form filling, and more.
The author argues that the hidden cost of unreliable AI agents lies in the cognitive overhead of constant human monitoring, emphasizing that predictability and environmental stability matter more than raw intelligence for real-world deployment. Practical workflows improve significantly when agents operate within controlled, validated environments rather than unpredictable ones.
OpenAI has released a new Chrome extension for Codex that enables the AI to handle browser-based tasks such as debugging flows, checking dashboards, conducting research, and updating CRMs directly within the browser environment.
A tip for automating web tasks on sites that lack APIs using AI coding agents such as Claude Code, Cursor, OpenCode, and OpenClaw.