Tag
This article explains how AI agents in 2026 collect data from websites and APIs, and discusses key challenges like rate limits, CAPTCHAs, and IP blocking.
A developer upgrades his AI dating assistant to Fable, detailing a complex architecture of AI agents that scrape social media profiles, perform OSINT enrichment, score matches, and use genetic algorithms for optimization.
This article introduces five open-source tools (Agent-reach, Scrapling, Browser-use, Claude in Chrome, Web-access) that let AI agents such as Claude Code scrape the web and operate browsers. It covers scenarios from lightweight to heavy-duty and includes configuration tips.
Firecrawl launches agent signups, enabling AI agents to instantly claim API keys and pull web data, with integration for Codex, Claude Code, and Grok Build, powered by WorkOS.
A 16-year-old developer open-sourced Obscura, a Rust-based headless browser engine designed for crawlers and AI agent automation. It uses only 30 MB of memory and has already garnered over 14,600 GitHub stars.
Agent Reach is an open-source command-line tool that provides a unified free interface for AI Agents, covering deep search capabilities across 15+ platforms including Twitter, Reddit, and YouTube, with no API fees required. It has already gained 21.7k+ stars.
This research reveals how Bright Data's SDK turns smart TVs and phones into residential proxy nodes for AI web scraping, highlighting privacy risks and the legal supply side of residential proxy networks.
Agent Reach is an open-source Python scaffold that allows AI agents to read multiple platforms such as Twitter, Reddit, YouTube, Bilibili, and Xiaohongshu with zero API fees, solving the problem of agents being unable to access the internet.
This article explains why proxies are essential for AI agents to avoid rate limits, CAPTCHAs, and geo-restrictions when collecting data at scale, and covers common use cases and types of proxies.
A list of six powerful but lesser-known AI developer tools: Instructor for structured JSON output, Octopoda for agent memory, E2B for secure sandboxes, Firecrawl for website-to-markdown conversion, Composio for app integrations, and LiteLLM for multi-model API access.
The BrowserAct team open-sourced a browser automation command-line tool designed specifically for AI agents. It provides three layers of anti-blocking mechanisms (fingerprint spoofing, CAPTCHA cracking, human takeover), supports multi-browser parallelism and account isolation, and optimizes its output format to save tokens.
TinyFish Bigset is an open-source multi-agent system that turns natural language prompts into structured datasets from the live web, with schema inference, autonomous research agents, and scheduled refresh. It runs self-hosted via Docker and is built on TinyFish's search infrastructure.
The article discusses how AI coding assistants make large-scale web scraping accessible to ordinary people, raising ethical concerns about ignoring robots.txt and rate limits, and questions the responsibility of AI providers.
A commentary on the ethical challenges of AI agents ignoring website rules like robots.txt when generating scrapers, and the responsibility of AI providers to implement guardrails without hindering product usability.
A comparison of web search APIs and tools that provide clean Markdown output for grounding local RAG pipelines, evaluating Brave Search, Parallel AI, You.com, Exa, Tavily, Firecrawl, Jina Reader, and SearXNG on signal-to-noise ratio and developer overhead.
This open-source project can scrape web data with zero code and bypass anti-scraping mechanisms, reportedly improving efficiency by tens of times, and has earned 50k+ stars.
An experiment comparing AI agent accuracy and token cost when reading raw HTML versus structured formats; raw HTML uses twice the tokens and yields lower accuracy.
Amnesty International's briefing argues that generative AI systems built on unlawful web scraping violate international human rights law, and calls for their prohibition.
24OpenClaw (Scrapling) is an open-source web scraping tool that claims zero anti-scraping detection, native Cloudflare bypass, and speeds 774x faster than BeautifulSoup, with no need to maintain selectors.
A tool that enables AI agents to automatically find websites and contact information for any company, with no signup required.