Apify and Coinbase launch x402 support, enabling AI agents to autonomously sign up, pay with USDC, and access over 20,000 web automation tools without human intervention.
The x402 protocol enables autonomous agents to pay for tools via USDC on Base, and Apify has expanded its x402 support from 2,000 to 20,000+ tools in partnership with Coinbase.
Apify and Coinbase have expanded x402 support to over 20,000 tools, enabling AI agents to autonomously pay per run using USDC on Base without requiring manual account setup or API keys.
An open-source agent that uses an LLM to control a real browser, filling forms and extracting structured data while requiring minimal tokens per page.
BrowserAct is a web browser automation tool designed for AI agents, enabling automated web interactions.
A new paper introduces HLL, a benchmark for testing AI agents on real CAPTCHA tasks, showing that even strong agents fail on cluttered pages and struggle to recover from mistakes.
Introduces SKILL.nb, a framework for governing reusable agent workflows through evidence-calibrated lifecycle policies, featuring selective formalization and gate-conditioned execution. It achieves significant improvements on web automation benchmarks and demonstrates resilience to environment drift.
This paper proposes SGDR (State-Grounded Dynamic Retrieval), an online skill learning method for web agents that enables stepwise, state-aware skill reuse rather than static task-level retrieval. Experiments on WebArena show SGDR achieves a 37.5% success rate with GPT-4.1, a roughly 10.6% relative gain over strong baselines.
A comparison of Midscene and Browser-Use, two open-source tools with different focuses: Browser-Use is a web agent for one-time tasks, while Midscene is a vision SDK designed for reliable multi-platform repeated execution.
The author argues that AI agents fail on real web tasks not because models are weak, but because browsers are designed for humans and lack isolated, scriptable workspaces for agent use.
A fork of Playwright that generates unique browser fingerprints per session to enable AI agents to navigate the web undetected. The project is fully open-source under the MIT license.
Browse.sh is a tool that gives AI agents muscle memory for automating web tasks.
Mini Browser is an agent-first browser CLI that enables AI agents to control a browser via Unix-style commands for navigation, scraping, screenshots, form filling, and more.
The author argues that the hidden cost of unreliable AI agents lies in the cognitive overhead of constant human monitoring, emphasizing that predictability and environmental stability matter more than raw intelligence for real-world deployment. Practical workflows improve significantly when agents operate within controlled, validated environments rather than unpredictable ones.
OpenAI has released a new Chrome extension for Codex that enables the AI to handle browser-based tasks such as debugging flows, checking dashboards, conducting research, and updating CRMs directly within the browser environment.
A tip for automating web tasks on sites that lack APIs using AI coding agents such as Claude Code, Cursor, OpenCode, and OpenClaw.