WebHarbor packages 15 real websites (Amazon, GitHub, BBC, etc.) as self-contained Flask+SQLite apps in a single Docker image with sub-second reset, designed for reproducible web agent evaluation and training. The project invites community contributions to expand to 100+ sites, with co-authorship opportunities.
Hello! Excited to share our latest community-driven research project: [**WebHarbor: Docking Real Websites for Evolving GUI Agent Environments**](https://aiming-lab.github.io/webharbor.github.io)!

**TL;DR**: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) packaged as self-contained Flask + SQLite apps in a single Docker image, with a control plane that resets each site to a byte-identical state in <1 second. All of them were built with human-in-the-loop coding agents (e.g., Claude Code or Codex). We support all 643 WebVoyager tasks out of the box.

**Call for contribution**: Our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. Two tracks:

* Contribute a new mirror site (use the coding-agent pipeline → human verify → open PR) → co-author on the final paper
* Review submitted PRs (5 reviews → co-author)

We also released useful skills for you (or your coding agent) to work with! Typically you can create a new mirror within 1 day! See more contribution details in the [Contribute Guide](https://aiming-lab.github.io/webharbor.github.io/#contribute).

**Why WebHarbor:** running web agent benchmarks on the live web is a nightmare: reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months. Plus you can't reset the live web, which rules out heavy RL training. **You need lightweight, easy-to-reset, task-driven, evolving environments for web agents, for both evaluation and training!**

**Related Resources:**

|Name|Link|
|:-|:-|
|🏠 WebHarbor Project Page|[WebHarbor](https://aiming-lab.github.io/webharbor.github.io/)|
|🤗 HuggingFace Dataset|[ChilleD/WebHarbor](https://huggingface.co/datasets/ChilleD/WebHarbor)|
|💻 WebHarbor GitHub|[Code Repo](https://github.com/aiming-lab/WebHarbor)|
|📊 Contribution Guide|[Guide Details](https://aiming-lab.github.io/webharbor.github.io/#contribute)|
|📝 Contribution Request Form|[Google Form](https://forms.gle/ngcD1rzAfUEphNmRA)|

Suggestions and discussions are welcome!
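For anyone wondering how a fast reset to a byte-identical state could work for a SQLite-backed site, here is a minimal sketch. It is not WebHarbor's actual control plane: the function names (`file_digest`, `reset_site`, `demo`), the file names, and the copy-the-snapshot approach are all assumptions made for illustration.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reset_site(snapshot: Path, live: Path) -> None:
    """Hypothetical reset: overwrite the live database with a pristine snapshot."""
    # Remove SQLite side files so stale journal/WAL data cannot be replayed.
    for suffix in ("-wal", "-shm", "-journal"):
        Path(str(live) + suffix).unlink(missing_ok=True)
    shutil.copyfile(snapshot, live)


def demo() -> bool:
    """Mutate a live database, reset it, and check it matches the snapshot again."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot.db"
        live = Path(tmp) / "live.db"

        # Build the pristine snapshot once (stand-in for a mirrored site's data).
        conn = sqlite3.connect(snapshot)
        conn.execute("CREATE TABLE cart (item TEXT)")
        conn.execute("INSERT INTO cart VALUES ('book')")
        conn.commit()
        conn.close()

        reset_site(snapshot, live)

        # Simulate an agent changing site state during a task.
        conn = sqlite3.connect(live)
        conn.execute("INSERT INTO cart VALUES ('laptop')")
        conn.commit()
        conn.close()
        changed = file_digest(live) != file_digest(snapshot)

        # Reset, then confirm the live file is byte-identical to the snapshot.
        reset_site(snapshot, live)
        restored = file_digest(live) == file_digest(snapshot)
        return changed and restored


if __name__ == "__main__":
    print(demo())
```

Copying a small database file is cheap, which is one plausible reason a SQLite-per-site design makes resets fast. A real control plane would also have to deal with open connections in the running Flask app, which this sketch ignores.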
Apple Research introduces Weblica, a framework for creating scalable and reproducible training environments for visual web agents using HTTP caching and LLM-based synthesis.
A web application that builds containers entirely in the browser using client-side code, demonstrating the power of custom container tooling. Users can pick a base image, run a shell script, and export the resulting image as a tar file.
sandboxed is an open-source engine that turns a single Linux machine into a fleet of isolated dev sandboxes with coding agents and live preview URLs, self-hosted and easy to install.
Runtime is a platform that provides sandboxed coding agents with company context, integrations, and guardrails, allowing every team member to automate tasks and ship work using agents. It includes custom environments, specialized agents, observability, and supports various integrations and deployment options.