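@GitHub_Daily: When AI agents automate a browser or scrape data, anti-scraping defenses often block them, and a CAPTCHA or human-verification check stalls them outright. The BrowserAct team recently open-sourced a Skill: a browser automation command-line tool designed for AI agents. It provides three layers of anti-blocking, from…
Summary
The BrowserAct team has open-sourced a browser automation command-line tool designed for AI agents. It provides three layers of anti-blocking (fingerprint spoofing, CAPTCHA solving, human takeover), supports parallel multi-browser runs with account isolation, and uses an output format optimized to save tokens.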
Cached: 2026/06/05 07:10
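When AI agents automate a browser or scrape data, anti-scraping defenses often block them, and a CAPTCHA or human-verification check stalls them outright.
The BrowserAct team recently open-sourced a Skill: a browser automation command-line tool designed for AI agents.
It provides three layers of anti-blocking: fingerprint spoofing, automatic CAPTCHA solving, and, when the AI gets stuck, a generated link that lets a human take over at any time. The layers hand off to one another smoothly.
GitHub: http://github.com/browser-act/skills…
When multiple browsers run in parallel, each task's cookies, fingerprint, and proxy are fully isolated, so sites cannot link the different accounts.
The output format is also optimized for large language models and uses several times fewer tokens than traditional HTML or JSON.
It also ships with Skill Forge, which lets the AI explore a site's structure automatically and generate reusable scraping scripts, so later batch runs need no re-exploration.
If you do browser automation with tools like Claude Code or Cursor and keep getting blocked by anti-scraping measures, this project is worth a try.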
browser-act/skills
Source: https://github.com/browser-act/skills
Browser automation CLI built for AI agents. Get past anti-bot walls, hand off to humans across platforms when stuck, run parallel tasks without cross-contamination, and isolate multiple accounts in independent browsers.
Why BrowserAct
The browser an AI agent needs has to reach places standard tools can’t, let a human seamlessly take over when the agent is stuck, keep parallel tasks from cross-contaminating, and be designed for LLM reasoning — not human-written scripts. A browser for agents must get four things right.
1. Break through blocks — three progressive layers
- Environment layer — stealth fingerprint spoofing, TLS rotation, proxy switching. The vast majority of blocks never trigger.
- Execution layer — `solve-captcha` auto-solves CAPTCHAs; `stealth-extract` pulls protected pages in one command.
- Human layer — `remote-assist` generates a live URL; the user takes over from any device, and the agent continues seamlessly when done.
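The three layers can be read as an escalation ladder: try the cheapest layer first and move up only when the page is still blocked. Below is a minimal Python sketch of that control flow. It assumes nothing about BrowserAct's internals; `fetch_with_layers` and the three layer functions are hypothetical stand-ins that return page text, or `None` when blocked.

```python
from typing import Callable, Optional, Sequence, Tuple

# A layer takes a URL and returns page text, or None if it was blocked.
Layer = Callable[[str], Optional[str]]


def fetch_with_layers(url: str, layers: Sequence[Tuple[str, Layer]]) -> Tuple[str, str]:
    """Try each layer in order; return (layer_name, content) from the first success."""
    for name, layer in layers:
        content = layer(url)
        if content is not None:
            return name, content
    raise RuntimeError(f"all layers failed for {url}")


# Hypothetical stand-ins for the three layers described above.
def environment_layer(url: str) -> Optional[str]:
    return None  # pretend the stealth environment alone was not enough


def execution_layer(url: str) -> Optional[str]:
    return None  # pretend the CAPTCHA could not be auto-solved


def human_layer(url: str) -> Optional[str]:
    return f"content of {url} after human takeover"


LADDER = [
    ("environment", environment_layer),
    ("execution", execution_layer),
    ("human", human_layer),
]

if __name__ == "__main__":
    print(fetch_with_layers("https://example.com", LADDER))
```

Ordering the layers from cheapest to most expensive means a human is asked to step in only after the automated options have failed.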
2. Three browser modes — by real-world scenario
| Mode | Scenario | Key trait |
|---|---|---|
| `chrome` | Reuse local Chrome login state | Profile import or CDP attach |
| `stealth` privacy mode | Frictionless batch scraping without login | Fresh fingerprint per session + proxy rotation, zero residue |
| `stealth` fixed identity | Logged-in accounts · multi-browser parallel | Stable fingerprint + stable IP, stable account identity, not flagged as bots |
3. Zero-interference concurrency — every agent in its own lane
- Cross-browser parallel — independent cookies, fingerprints, proxies. Sites cannot correlate them.
- Same-browser multi-session — shared login state, independent execution, tasks don’t block each other.
- Privacy mode — fresh fingerprint and empty profile per session, zero residue when done.
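The two concurrency shapes differ in what is shared. A small Python data-structure sketch may make this concrete. `BrowserIdentity` and `Session` are hypothetical names, not BrowserAct types; only the `desc` field name comes from the README.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BrowserIdentity:
    """Hypothetical per-browser state: nothing here is shared across browsers."""
    desc: str
    proxy: str
    fingerprint: str = field(default_factory=lambda: uuid.uuid4().hex)
    cookies: Dict[str, str] = field(default_factory=dict)


@dataclass
class Session:
    """A session runs inside one browser and shares that browser's login state."""
    name: str
    browser: BrowserIdentity


shop_a = BrowserIdentity(desc="shop account A", proxy="proxy-a.example:8080")
shop_b = BrowserIdentity(desc="shop account B", proxy="proxy-b.example:8080")

# Cross-browser parallel: logging in on A leaves B untouched.
shop_a.cookies["session_id"] = "token-for-a"

# Same-browser multi-session: both sessions see A's login state.
s1 = Session("scrape-orders", shop_a)
s2 = Session("scrape-reviews", shop_a)
```

Because each browser owns its own cookie jar, fingerprint, and proxy, a site has no shared value it could use to correlate the two accounts.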
4. Designed for agent reasoning — not human scripts
- Compact text output — indexed text format, several times more token-efficient than JSON or HTML.
- Indexed interaction — `state` returns an indexed list; `click 3` / `input 2 "..."`. No DOM parsing required.
- Semantic memory — every browser carries a `desc`, matched to tasks by meaning.
- Concurrency-safe — session ownership + explicit naming. Multi-agent operation never conflicts.
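To see why an indexed text view can be cheaper than JSON, here is a Python sketch that renders the same elements both ways. The element data and the `[n] <tag> label` layout are assumptions for illustration; the README does not specify BrowserAct's actual output format. Character count is only a rough proxy for tokens.

```python
import json

# Hypothetical interactive elements, as a DOM walker might collect them.
elements = [
    {"tag": "a", "text": "Sign in", "attrs": {"href": "/login", "class": "nav-link"}},
    {"tag": "input", "text": "", "attrs": {"name": "q", "placeholder": "Search"}},
    {"tag": "button", "text": "Search", "attrs": {"type": "submit", "class": "btn"}},
]


def to_indexed_text(items) -> str:
    """Render one short line per element: [index] <tag> label."""
    lines = []
    for i, el in enumerate(items, start=1):
        label = el["text"] or el["attrs"].get("placeholder", "")
        lines.append(f"[{i}] <{el['tag']}> {label}")
    return "\n".join(lines)


def resolve(items, index: int) -> dict:
    """Map an index from the text view back to the element it names."""
    return items[index - 1]


indexed = to_indexed_text(elements)
as_json = json.dumps(elements, indent=2)

if __name__ == "__main__":
    print(indexed)
    print(len(indexed), "chars indexed vs", len(as_json), "chars JSON")
```

The agent only has to emit an index, and `resolve` maps it back to the element, so the model never needs to parse or reproduce selectors.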
Security: confirmation gating — sensitive operations (browser create / delete, Profile import, proxy changes, security and privacy toggles) require explicit user approval. Prior approvals do not carry over. Enforced at the Skill layer, not a configuration toggle.
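The gating rule (approval required for each sensitive operation, with no carry-over) can be sketched as a wrapper that asks every time. This is an illustrative Python sketch, not BrowserAct's implementation. `run_gated`, the `ask` callback, and the operation labels are hypothetical.

```python
from typing import Callable

# Illustrative labels for the sensitive categories listed above.
SENSITIVE = {"browser create", "browser delete", "profile import", "proxy change"}


def run_gated(operation: str, action: Callable[[], str], ask: Callable[[str], bool]) -> str:
    """Run `action`, asking for approval first when `operation` is sensitive.

    Approval is requested on every call, so an earlier "yes" never carries over.
    """
    if operation in SENSITIVE and not ask(f"Allow '{operation}'?"):
        return "denied"
    return action()


# Simulated user: approves the first request, refuses the second.
answers = iter([True, False])


def ask_user(prompt: str) -> bool:
    return next(answers)


first = run_gated("browser delete", lambda: "deleted", ask_user)
second = run_gated("browser delete", lambda: "deleted", ask_user)
third = run_gated("state", lambda: "listed", ask_user)  # not sensitive, no prompt
```

Keeping no record of past approvals is the point of the design: the second `browser delete` is refused even though the first was allowed.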
And More
- Better headless — Default headless without disrupting users; stealth headless that isn’t detected.
- Cross-platform remote handoff — Any device opens the link to take over, and the agent continues seamlessly.
Install
Tell your AI agent:
Install browser-act. Skill source: https://github.com/browser-act/skills/tree/main/browser-act . Verify it works after installation.
Quick Start
# Extract protected page content (zero config)
browser-act stealth-extract https://example.com
# Full browser automation
browser-act --session my-task browser open <id> https://example.com
browser-act --session my-task state # See clickable elements
browser-act --session my-task click 3 # Click by index
browser-act --session my-task input 2 "hi" # Type into a field
The agent runs get-skills at the start of each session — gets environment state, browser list, and commands in one call:
browser-act get-skills core --skill-version 2.0.2
How agents discover and use BrowserAct →
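An agent harness written in Python might drive the Quick Start commands through `subprocess`. The sketch below only assembles the argument lists for the `state`, `click`, and `input` calls shown above and prints them as a dry run. `build_command` and `run` are hypothetical helper names, and `run` assumes `browser-act` is installed and on `PATH`.

```python
import shlex
import subprocess
from typing import List


def build_command(session: str, *args: str) -> List[str]:
    """Build argv for `browser-act --session <session> <args...>`."""
    return ["browser-act", "--session", session, *args]


def run(session: str, *args: str) -> str:
    """Run the command and return stdout; requires browser-act on PATH."""
    result = subprocess.run(
        build_command(session, *args), capture_output=True, text=True, check=True
    )
    return result.stdout


steps = [
    build_command("my-task", "state"),
    build_command("my-task", "click", "3"),
    build_command("my-task", "input", "2", "hi"),
]

if __name__ == "__main__":
    for argv in steps:
        print(shlex.join(argv))  # dry run: print instead of executing
```

Passing an argument list rather than a shell string avoids quoting problems when the typed text contains spaces or quotes.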
Compatibility
OS: Windows, macOS, Linux
Agents: Claude Code · Cursor · VS Code · OpenCode · OpenClaw · Codex · Gemini CLI — works with any agent that can execute shell commands and load Skills.
Documentation
Full documentation covers anti-blocking, browser modes, sessions and concurrency, headless and remote handoff, agent design, the Skills system, and the complete command reference.
Also From BrowserAct
Skill Forge — Your Personal Scraping Engineer
Need to extract data from the same website repeatedly at scale? Don’t write scrapers by hand. Skill Forge explores a site once, discovers its APIs and data patterns, generates a deploy-ready Skill package, then runs reliably without re-exploration — 500 or 5,000 records through the same stable path.
Any website. Any data. One command to start:
Install browser-act-skill-forge. Skill source: https://github.com/browser-act/skills/tree/main/browser-act-skill-forge . Verify it works after installation.
Then tell your agent what you need:
“Forge a Skill that extracts job listings from LinkedIn — title, company, salary, URL. I’ll run 300 keywords later.”
Solutions Catalog
30+ pre-built Skills already generated by Skill Forge, ready to install and run. Covers Amazon, Google Maps, YouTube, Reddit, WeChat, Zhihu, and more.
Browse the full Solutions Catalog →
Build Your Own
Can’t find what you need above? Generate a custom Skill for any website in minutes — no coding required. Just describe what data you want or what action to perform, and Skill Forge handles the rest.
💖 Support the Project
BrowserAct Skills is free and open source. If it saves you time, please give us a ⭐ Star — it keeps the project alive and helps us ship more skills.
🎁 Bonus: Once you star the repository, you can join our Discord and post in the #claim-500-credits channel to receive 500 free credits!
🤝 Community & Support
Built with ❤️ by the BrowserAct Team