WebHarbor - We "dock" the real websites into local for web agents! [R]

Reddit r/MachineLearning 05/14/26, 02:48 AM Papers

web-agents benchmarking docker open-source gui-agents research-tool

Summary

WebHarbor packages 15 real websites (Amazon, GitHub, BBC, etc.) as self-contained Flask+SQLite apps in a single Docker image with sub-second reset, designed for reproducible web agent evaluation and training. The project invites community contributions to expand to 100+ sites, with co-authorship opportunities.

Hello! Excited to share our latest community-driven research project: [**WebHarbor: Docking Real Websites for Evolving GUI Agent Environments**](https://aiming-lab.github.io/webharbor.github.io)!

**TL;DR**: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) are packaged as self-contained Flask + SQLite apps in a single Docker image, with a control plane that resets each site to a byte-identical state in under 1 second. All mirrors were built with a human-in-the-loop coding-agent pipeline (e.g., Claude Code or Codex). We support all 643 WebVoyager tasks out of the box.

**Call for contribution**: Our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. There are two tracks:

* Contribute a new mirror site (use the coding-agent pipeline → human verification → open a PR) → co-authorship on the final paper
* Review submitted PRs (5 reviews → co-authorship)

We have also released skills that you (or your coding agent) can use for this work; a new mirror can typically be created within 1 day. See more contribution details in the [Contribute Guide](https://aiming-lab.github.io/webharbor.github.io/#contribute).

**Why WebHarbor:** running web agent benchmarks on the live web is a nightmare: reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months. Plus, you can't reset the live web, which rules out heavy RL training. **Web agents need lightweight, easy-to-reset, task-driven, evolving environments for both evaluation and training!**

**Related Resources:**

|Name|Link|
|:-|:-|
|🏠 WebHarbor Project Page|[WebHarbor](https://aiming-lab.github.io/webharbor.github.io/)|
|🤗 HuggingFace Dataset|[ChilleD/WebHarbor](https://huggingface.co/datasets/ChilleD/WebHarbor)|
|💻 WebHarbor GitHub|[Code Repo](https://github.com/aiming-lab/WebHarbor)|
|📊 Contribution Guide|[Guide Details](https://aiming-lab.github.io/webharbor.github.io/#contribute)|
|📝 Contribution Request Form|[Google Form](https://forms.gle/ngcD1rzAfUEphNmRA)|

Suggestions and discussions are welcome!
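The post does not describe how the control plane achieves byte-identical, sub-second resets. One plausible way to get that property for a Flask + SQLite site is sketched below, under stated assumptions: each site keeps all of its mutable state in a single SQLite file, a pristine "golden" copy of that file ships in the image, and the site's app holds no open connection while the file is swapped. The helpers `reset_site` and `sha256_of` and the file names are hypothetical illustrations, not WebHarbor's actual API.

```python
import hashlib
import os
import shutil
import sqlite3
import tempfile
import time


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reset_site(golden_db: str, live_db: str) -> float:
    """Restore live_db from the golden snapshot and return elapsed seconds.

    Hypothetical helper: assumes the site's app is stopped or has no open
    connection to live_db while the file is replaced.
    """
    start = time.perf_counter()
    # Delete SQLite sidecar files so stale WAL/journal data is not replayed.
    for suffix in ("-wal", "-shm", "-journal"):
        try:
            os.remove(live_db + suffix)
        except FileNotFoundError:
            pass
    # Copy into a temp file in the same directory, then atomically rename it
    # over live_db so readers never see a half-written database.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(live_db) or ".")
    os.close(fd)
    shutil.copyfile(golden_db, tmp)
    os.replace(tmp, live_db)
    return time.perf_counter() - start


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    golden = os.path.join(workdir, "shop_golden.db")  # hypothetical name
    live = os.path.join(workdir, "shop.db")  # hypothetical name

    # Build a tiny golden snapshot standing in for a mirrored site's data.
    conn = sqlite3.connect(golden)
    conn.execute("CREATE TABLE cart (item TEXT, qty INTEGER)")
    conn.commit()
    conn.close()

    # Start an "episode" from the golden state, then let the agent mutate it.
    shutil.copyfile(golden, live)
    conn = sqlite3.connect(live)
    conn.execute("INSERT INTO cart VALUES ('usb-c cable', 2)")
    conn.commit()
    conn.close()
    assert sha256_of(live) != sha256_of(golden)

    elapsed = reset_site(golden, live)
    assert sha256_of(live) == sha256_of(golden)
    print(f"reset in {elapsed * 1000:.2f} ms, byte-identical to golden")
```

Copying a whole file is the simplest way to guarantee byte identity, and for the small databases a mirrored site typically needs it easily fits in a sub-second budget; hashing the result gives a cheap check that an episode really started from the pristine state.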

Similar Articles

Harbor

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Fully in-browser container builds

Self-hosted dev sandboxes with preview URLs (Docker, Go, no K8s)

Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team
