@WY_mask: MediaCrawler: Open-source web scraping tool for Xiaohongshu, Douyin, Weibo, Bilibili, Kuaishou. Supports scraping videos, images, comments, likes, reposts, etc. https://github.com/NanmiCoder/MediaCrawler…

X AI KOLs Timeline 06/21/26, 08:45 PM Tools

Summary

MediaCrawler is an open-source, multi-platform data collection tool for self-media content. It scrapes public information from Xiaohongshu, Douyin, Weibo, Bilibili, Kuaishou, and other platforms. It requires no JS reverse engineering; instead, it relies on Playwright browser automation.
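The "no JS reverse engineering" idea can be sketched in Python. Instead of re-implementing a platform's signing algorithm, the scraper hands request details to a page that is already logged in and lets the site's own JavaScript compute the signature parameters. This is a minimal sketch under stated assumptions, not MediaCrawler's code: `window.hypotheticalSign` is a hypothetical global (each platform exposes something different), and the page is assumed to return a flat object of header names to values. The `page` argument only needs the shape of Playwright's async `Page.evaluate(expression, arg)`, so a stub stands in for a real browser here.

```python
import asyncio
import json
from typing import Any, Protocol


class EvaluatesJS(Protocol):
    """Anything shaped like Playwright's async Page.evaluate(expression, arg)."""

    async def evaluate(self, expression: str, arg: Any = None) -> Any: ...


# HYPOTHETICAL: the real signing function name differs per platform.
SIGN_EXPRESSION = "([url, body]) => window.hypotheticalSign(url, body)"


async def signed_headers(page: EvaluatesJS, url: str, body: dict) -> dict[str, str]:
    """Let the logged-in page's own JavaScript compute signature parameters."""
    # Serialize compactly so the signed string matches what will be sent.
    payload = json.dumps(body, separators=(",", ":"))
    # Playwright passes `arg` to the arrow function as its single parameter.
    result = await page.evaluate(SIGN_EXPRESSION, [url, payload])
    # ASSUMPTION: the page returns a flat {header_name: value} object.
    return {str(name): str(value) for name, value in result.items()}


class StubPage:
    """Stand-in for a real browser page, used only for this example."""

    async def evaluate(self, expression: str, arg: Any = None) -> Any:
        url, payload = arg
        return {"x-sign": f"stub:{len(url)}:{len(payload)}"}


if __name__ == "__main__":
    headers = asyncio.run(signed_headers(StubPage(), "/api/search", {"q": "a"}))
    print(headers)  # {'x-sign': 'stub:11:9'}
```

Against a real browser, `page` could come from Playwright's `chromium.connect_over_cdp("http://127.0.0.1:9222")` followed by `browser.contexts[0].pages[0]`, which reuses the Chrome login state that CDP mode relies on.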

MediaCrawler: Open-source web scraping tool for Xiaohongshu, Douyin, Weibo, Bilibili, Kuaishou. Supports scraping videos, images, comments, likes, reposts, etc. https://t.co/dmWKHK3BAf https://t.co/HH39FWXCpg


Cached at: 06/22/26, 03:48 PM

# NanmiCoder/MediaCrawler

Source: https://github.com/NanmiCoder/MediaCrawler

# 🔥 MediaCrawler - Social Media Scraper 🕷️

> Disclaimer:
>
> This repository is intended for learning purposes only ⚠️⚠️⚠️⚠️. For examples of illegal scraping cases, see Crawler Illegal Cases In China.
>
> All content in this repository is provided for learning and reference only. Commercial use is prohibited. No individual or organization may use the content for illegal purposes or to infringe upon others' legitimate rights. The scraping techniques covered are solely for study and research and must not be used for large-scale scraping or other illegal activities on any platform. The repository bears no responsibility for any legal liability arising from its use. By using this repository, you agree to all terms and conditions of this disclaimer.
>
> A more detailed disclaimer appears at the end of this document.

## 📖 Project Introduction

A multi-platform social media data collection tool that scrapes public data from major platforms such as Xiaohongshu (RED), Douyin (TikTok China), Kuaishou, Bilibili, Weibo, Tieba, and Zhihu.

### 🔧 Technical Principles

- Core Technology: Uses the Playwright browser automation framework to log in and persist the login state.
- No JS Reverse Engineering: Runs JS expressions inside a browser context that keeps the login state, so the site's own code produces the signature parameters.
- Advantages: Eliminates the need to reverse complex encryption algorithms, significantly lowering the technical barrier.

## ✨ Features

| Platform | Keyword Search | Scrape by Post ID | Nested Comments | Creator Profile | Login Persistence | IP Proxy Pool | Comment Word Cloud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Xiaohongshu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Douyin | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Kuaishou | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bilibili | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Weibo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tieba | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Zhihu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

MediaCrawlerPro is here! Maintaining open source is hard, so subscriptions and support are welcome.

> Study the architecture of a mature project: the Pro version is about more than scraping, and its code design is worth close study too!

Core advantages of MediaCrawlerPro over the open-source version:

#### 🎯 Core Feature Upgrades

- ✅ Social Media Content Decomposition Agent (New)
- ✅ Resume Interrupted Scraping (Key Feature)
- ✅ Multi-Account + IP Proxy Pool Support (Key Feature)
- ✅ Removed Playwright Dependency (simpler to use)
- ✅ Full Linux Environment Support

#### 🏗️ Architecture Design Improvements

- ✅ Code Refactoring: more readable and maintainable (JS signing logic decoupled)
- ✅ Enterprise-Level Code Quality: suitable for building large scraping projects
- ✅ Excellent Architecture Design: highly scalable, with source code that is valuable to learn from

#### 🎁 Extra Features

- ✅ Desktop Social Media Video Downloader (great for learning full-stack development)
- ✅ Multi-Platform Home Feed Recommendations (HomeFeed)
- ✅ AI Agent Skill Support: one-click install for OpenClaw 🦞 / Claude Code / Cursor, letting agents scrape data automatically
- [ ] Comment Analysis AI Agent (in development) 🚀🚀

See the MediaCrawlerPro project page for more details.

## 🚀 Quick Start

> 💡 If this project helps you, please give it a ⭐ Star!
## 📋 Prerequisites

### 🚀 Install uv (Recommended)

Before proceeding, make sure uv is installed on your machine:

- Installation: see the uv official installation guide.
- Verify: Run `uv --version` in a terminal; if a version number is printed, the installation succeeded.
- Why uv? It is a very fast Python package manager with quick dependency resolution.

### 🟢 Install Node.js

This project depends on Node.js. Download and install it from the official website:

- Download: https://nodejs.org/en/download/
- Version Requirement: >= 16.0.0

### 📦 Install Python Packages

```shell
# Navigate to the project directory
cd MediaCrawler

# Use uv sync to ensure a consistent Python version and dependencies
uv sync
```

### 🌐 Install Browser Driver (Optional)

> If you use the default CDP mode (connecting to an existing Chrome browser), no browser driver installation is needed. This step is only required for standard Playwright mode.

```shell
# Only needed for standard Playwright mode
uv run playwright install
```

### 🌍 Configure Chrome Browser (Recommended)

By default, the project uses CDP mode to connect to your existing Chrome browser, reusing its login state, cookies, extensions, etc. This significantly reduces the risk of triggering platform anti-bot detection. Before using it:

1. Install the latest Chrome browser (version >= 144).
2. Enable remote debugging: in Chrome's address bar, go to chrome://inspect/#remote-debugging and check "Allow remote debugging for this browser instance".
3. When the page shows `Server running at: 127.0.0.1:9222`, Chrome is ready.

> 💡 Tip: After you start the scraper, Chrome shows a confirmation dialog; click "Accept". The program waits 60 seconds for you to confirm.
>
> If you prefer not to use CDP mode, set `ENABLE_CDP_MODE = False` in `config/base_config.py` to switch to standard Playwright mode.
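Before the first run, the version requirements (Node.js >= 16, Chrome >= 144) and the CDP port can be checked programmatically. This is a minimal sketch with hypothetical helper names that are not part of MediaCrawler. It assumes version strings look like `node --version` output (`v18.17.0`) or Chrome's `Google Chrome 144.x.y.z`, and it treats "ready" as simply "something accepts TCP connections on 127.0.0.1:9222", which is weaker than a full CDP handshake.

```python
import re
import socket

# First run of digits in a version string, e.g. "18" in "v18.17.0".
_MAJOR_RE = re.compile(r"\d+")


def major_version(text: str) -> int:
    """Extract the major version number from text such as 'v18.17.0'."""
    match = _MAJOR_RE.search(text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(0))


def meets_minimum(text: str, minimum_major: int) -> bool:
    """True if the reported major version is at least minimum_major."""
    return major_version(text) >= minimum_major


def port_open(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

The version text could come from `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`; Chrome's version is shown at chrome://version. For example, `meets_minimum("v18.17.0", 16)` is `True`, and `port_open()` returns `False` until remote debugging is enabled.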
## 🚀 Run the Scraper

```shell
# See config/base_config.py for configuration details (its comments are in Chinese)

# Scrape posts based on keyword search from config file
uv run main.py --platform xhs --lt qrcode --type search

# Scrape specific posts by post ID list from config file
uv run main.py --platform xhs --lt qrcode --type detail

# Open the corresponding app to scan the QR code for login

# For other platform scraping examples, run:
uv run main.py --help
```

### 🖥️ WebUI Interface

MediaCrawler provides a web-based visual interface, so you can use it without the command line.

#### Start WebUI Service

```shell
# Start the API server (default port 8080)
uv run uvicorn api.main:app --port 8080 --reload

# Or launch it as a module
uv run python -m api.main
```

After starting, visit http://localhost:8080 to open the WebUI.

#### WebUI Features

- Visual configuration of scraper parameters (platform, login method, scrape type, etc.)
- Real-time view of scraper status and logs
- Data preview and export

### Using Python Native venv (Not Recommended)

#### Create and Activate a Python Virtual Environment

> For scraping Douyin and Zhihu, Node.js (>= 16) must be installed beforehand.

```shell
# Go to the project root
cd MediaCrawler

# Create a virtual environment
# requirements.txt is based on Python 3.11; other Python versions
# may cause incompatibilities that you need to resolve manually.
python -m venv venv

# macOS & Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

#### Install Required Libraries

```shell
pip install -r requirements.txt
```

#### Install Playwright Browser Driver

```shell
playwright install
```

#### Run the Scraper (Native Environment)

By default, comment scraping is disabled. To enable it, modify `ENABLE_GET_COMMENTS` in `config/base_config.py`. Other options can also be configured in `config/base_config.py` (its comments are in Chinese).
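The two switches this README mentions, `ENABLE_CDP_MODE` and `ENABLE_GET_COMMENTS`, both live in `config/base_config.py`. A minimal excerpt showing only those two options, set to keep the default CDP mode and turn comment scraping on; the real file contains many more settings, and its exact layout is not reproduced here.

```python
# config/base_config.py (excerpt: only the two options named in this README)

# True (default): attach to your running Chrome over CDP and reuse its login state.
# False: switch to standard Playwright mode (requires `playwright install`).
ENABLE_CDP_MODE = True

# Comment scraping is disabled by default; True turns it on.
ENABLE_GET_COMMENTS = True
```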
```shell
# Scrape posts based on keyword search from config file
python main.py --platform xhs --lt qrcode --type search

# Scrape specific posts by post ID list from config file
python main.py --platform xhs --lt qrcode --type detail

# Open the corresponding app to scan the QR code for login

# For other platform scraping examples, run:
python main.py --help
```

## 💾 Data Storage

MediaCrawler supports multiple storage formats: CSV, JSON, JSONL, Excel, SQLite, and MySQL.

📖 Detailed guide: Data Storage Guide

🚀 MediaCrawlerPro is here 🚀! More features, better architecture! Maintaining open source is hard, so subscriptions and support are welcome! (https://github.com/MediaCrawlerPro)

## 💬 Community Groups

- WeChat Group: Join
- Bilibili: Follow me for AI and scraping tech.

## 💰 Sponsors

TikHub.io provides 900+ highly stable data APIs covering 14+ major platforms (TK, DY, XHS, Y2B, Ins, X, etc.), including public user, content, product, and comment data. It also offers 40M+ cleaned, structured datasets. Use invite code cfzyejV9 when registering and topping up to get an extra $2 credit.

Atlas Cloud is a full-modality AI inference platform that gives developers access to video generation, image generation, and LLM APIs through a unified AI API, calling 300+ curated models without managing multiple vendor integrations. Atlas Cloud's new coding plan offers developers cost-effective API access budgets.

## 🤝 Become a Sponsor

Become a sponsor and showcase your product here for daily exposure! Contact:

- WeChat: relakkes
- Email: [email protected]

## ☕ Buy Me a Coffee

If this project helps you, feel free to donate; every bit of support keeps me going ❤️

## 📚 Other

- FAQ: MediaCrawler Full Documentation
- Scraping Tutorial: CrawlerTutorial Free Tutorial
- News Scraper Project: NewsCrawlerCollection

## ⭐ Star History

If this project helps you, please give it a ⭐ Star so more people can discover MediaCrawler!
## 📚 References

- Xiaohongshu Signature Repo: Cloxl's xhs signature repo
- Xiaohongshu Client: ReaJason's xhs repo
- SMS Forwarding: SmsForwarder reference repo
- Intranet Tunneling Tool: ngrok official docs

# Disclaimer

## 1. Purpose and Nature

This project (hereinafter "the Project") was created as a technical research and learning tool for exploring and studying web data collection techniques. The Project focuses on research into social media data scraping and is intended for technical exchange among learners and researchers.

## 2. Legal Compliance Statement

The Project developer (hereinafter "the Developer") reminds users to strictly comply with all applicable Chinese laws and regulations when downloading, installing, and using the Project, including but not limited to the Cybersecurity Law, the Anti-Espionage Law, and all other relevant national laws and policies. Users bear all legal liability arising from their use of the Project.

## 3. Usage Restrictions

The Project must not be used for any illegal purpose or for commercial activities outside learning and research. It must not be used to illegally intrude into computer systems or to infringe the intellectual property or other legitimate rights of others. Users must ensure that their use of the Project is purely for personal learning and technical research and not for any illegal activity.

## 4. Disclaimer

The Developer has made every effort to ensure the legitimacy and safety of the Project but assumes no liability for any direct or indirect losses arising from users' use of the Project, including but not limited to data loss, device damage, and legal proceedings.

## 5. Intellectual Property

The intellectual property rights of the Project belong to the Developer. The Project is protected by copyright law, international copyright treaties, and other intellectual property laws and treaties. Users may download and use the Project in compliance with this disclaimer and relevant laws.

## 6. Final Interpretation

The Developer reserves the right of final interpretation of this Project and the right to change or update this disclaimer at any time without prior notice.

> WY_mask (@WY_mask):
> This search skill is a must-have for Agents: fully open source and free.
>
> Agents often run into barriers such as missing subtitles on YouTube, no access to Twitter, Xiaohongshu being blocked, paid APIs, and login-required accounts.
>
> Every platform has its own barriers. Installing this can transform your Agent's search capabilities: all tools are open source and all APIs are free.


Similar Articles


@NFTCPS: Finally found out where those repost accounts on X get their content! It's MediaCrawler, a single tool that covers Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu. It can scrape public content, comments, likes, and reposts. The best part is that it doesn't need JS reverse engineering: it uses the browser login state to get signatures directly, …

@GYLQ520: Hey self-media friends, pay attention! If you miss this tool, you'll regret it big time. MultiPost, a free open-source browser plugin on GitHub, lets you push content to over a dozen platforms such as Weibo, Zhihu, and Xiaohongshu with one click, so there is no more copy-pasting one by one. It supports text, images, and videos, plus scheduled posting, automatic web scraping, and more...

@xiaoerzhan: The 'Xiaoer Grab Video' browser extension I use daily is now open source. One click downloads the current page's video locally. It supports 1800+ sites like YouTube, Bilibili, X, TikTok, etc. Why I built this tool: I believe video tutorials are the most effective, because you follow along step by step and the visuals are clear. But then the things I want to learn keep piling up…

@AmberTreelet: Tiance Ge shared yt-dlp for scraping Douyin, YouTube, Bilibili, Twitter. I'll add some universal scraping tools. FxTwitter: Recommended by @0xCheshire for scraping X. get笔记 (Dedao Brain): WeChat Official Accounts, Xiaohongshu, Douyin, Bilibili, X, Podcasts. Google Chrome extension obsidian web clipp…

@axichuhai: Folks, this open-source project is like having a god's-eye view, boosting web scraping efficiency tens of times over. It has topped GitHub trending with 50k+ stars. No more writing code, maintaining selectors, or dealing with anti-scraping measures. Just drop in a URL, zero-code, naturally bypass blocks, no need to maintain selectors...