@CoinSh0t: A Chinese GitHub account just dropped the 10 biggest free open-source projects of 2026. If you’re from Europe or the US…


Summary

A Chinese GitHub account curated the 10 biggest free open-source AI projects of 2026, including tools for browser automation (Midscene.js), voice cloning (GPT-SoVITS), document-based AI agents (MaxKB), database queries (DB-GPT), knowledge assistants (FastGPT), PDF parsing (MinerU), multi-step agents (OpenManus), video generation (Wan 2.2), document grounding (RAGFlow), and AI app platforms (Dify).


Cached at: 06/18/26, 04:07 AM

A Chinese GitHub account just dropped the 10 biggest free open-source projects of 2026.

If you’re from Europe or the US, there’s a 99% chance you’ve never seen any of them.

I translated all 10 into English so you don’t have to:

Midscene.js

Midscene.js lets you automate a browser or a phone by describing what to do in plain English, with no CSS selectors and no XPath.

It is built by the ByteDance team, and because it reads the screen like a human, your scripts survive any UI redesign.

→ http://github.com/web-infra-dev/midscene…

GPT-SoVITS

GPT-SoVITS clones a voice from a five-second sample and then speaks in English, Chinese, Japanese, Korean and Cantonese.

It is one of the most-used open voice cloning tools on the planet, and it runs free on your own machine.

→ http://github.com/RVC-Boss/GPT-SoVITS…

MaxKB

MaxKB turns your documents into an AI agent you can drop onto any website with a single line of script.

It deploys in one click with a local model built in, so your data never leaves your server.

→ http://github.com/1Panel-dev/MaxKB…

DB-GPT

DB-GPT lets anyone in your company ask the database a question in plain English and get back the SQL plus a finished chart.

It runs fully private and connects to MySQL, Postgres, ClickHouse and more.

→ http://github.com/eosphoros-ai/DB-GPT…

FastGPT

FastGPT takes a pile of your documents and turns them into a working AI knowledge assistant with no code.

It runs on a 2GB server and already has hundreds of thousands of users.

→ http://github.com/labring/FastGPT

MinerU

MinerU rips messy PDFs, scans and Office files into clean LLM-ready text without destroying the tables.

It is the document layer serious teams feed into their AI, built by a top Chinese research lab.

→ http://github.com/opendatalab/MinerU…

OpenManus

OpenManus is a free, open-source clone of the Manus agent, built by the MetaGPT team in about three hours.

You point it at your own API keys and it plans and runs multi-step tasks on its own, no invite code required.

→ http://github.com/FoundationAgents/OpenManus…

Wan 2.2

Wan 2.2 is Alibaba’s open video model that generates text-to-video and image-to-video on a single consumer GPU.

Commercial video APIs charge up to a dollar per second, and this does the same job for the cost of electricity.

→ http://github.com/Wan-Video/Wan2.2…

RAGFlow

RAGFlow is the engine people switch to when their AI keeps hallucinating on real documents.

Its parsing handles tables, scans and twenty-plus formats, which is why it sits past 75,000 stars.

→ http://github.com/infiniflow/ragflow…

Dify

Dify is the full platform for building AI apps, with a visual workflow canvas, RAG and agents in one place.

It has quietly crossed 130,000 stars and runs in production at companies you have heard of.

→ http://github.com/langgenius/dify


web-infra-dev/midscene

Source: https://github.com/web-infra-dev/midscene

Midscene.js


Official Website: https://midscenejs.com/


Open-source, vision-driven UI testing — write tests in natural language, automate any platform.


📣 Midscene Skills is here!

Use Midscene Skills to control any platform with OpenClaw


💡 Why Midscene

Most UI automation — including AI tools that read the DOM or the accessibility tree — depends on page structure. That structure is fragile and incomplete: selectors break on every refactor, elements without semantic markup (icon-only buttons, custom controls, <canvas>) are invisible to it, native apps and cross-origin iframes are out of reach, and it cannot tell whether something actually looks right. Midscene works from the screenshot alone, and you describe each step in natural language:

  • Less maintenance — no selectors to chase when the UI changes.
  • Reach every element and surface — if a human can see it, Midscene can target it, even with no semantic annotations, on <canvas>, native apps, and cross-origin iframes.
  • Assert what users actually see — verify colors, highlights, layout, and rendered state, not just whether a DOM node exists.
  • Two ways to test — add Midscene to your Playwright / Vitest suite, or let an AI agent test autonomously via Skills and MCP.
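
The maintenance argument above can be illustrated with a toy model. This is only a sketch: `RenderedElement`, `findByClass`, and `findByAppearance` are hypothetical names, and matching on visible text is a crude stand-in for what a vision model does with a screenshot, not Midscene's actual mechanism.

```typescript
// Toy model of an on-screen element: what the markup says vs. what a user sees.
interface RenderedElement {
  cssClass: string;     // structural detail that refactors tend to change
  visibleLabel: string; // what actually appears on screen
}

const beforeRedesign: RenderedElement[] = [
  { cssClass: "btn-submit", visibleLabel: "Sign in" },
  { cssClass: "btn-link", visibleLabel: "Forgot password?" },
];

// Same screen after a refactor: class names changed, visible text did not.
const afterRedesign: RenderedElement[] = [
  { cssClass: "css-1x9k2", visibleLabel: "Sign in" },
  { cssClass: "css-7qv0a", visibleLabel: "Forgot password?" },
];

// Structure-based lookup, in the spirit of a CSS selector.
function findByClass(
  page: RenderedElement[],
  cssClass: string,
): RenderedElement | undefined {
  return page.find((el) => el.cssClass === cssClass);
}

// Appearance-based lookup (assumption: visible text approximates
// "what a human can see" for the purposes of this illustration).
function findByAppearance(
  page: RenderedElement[],
  description: string,
): RenderedElement | undefined {
  const wanted = description.toLowerCase();
  return page.find((el) => wanted.includes(el.visibleLabel.toLowerCase()));
}
```

In this model, `findByClass(afterRedesign, "btn-submit")` finds nothing once the class names change, while `findByAppearance(afterRedesign, 'the "Sign in" button')` still resolves to the same element.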

Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task.

💡 What you can automate

Midscene works anywhere you can take a screenshot — web browsers, Android, iOS, HarmonyOS, desktop apps, and any custom interface — all through one API. Write automation with the JavaScript SDK or in YAML, hand it to AI agents via Skills and MCP, and look up every method (aiAct, aiQuery, aiAssert, and more) in the API reference.
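
To make the three method names concrete, here is a minimal sketch of how an act, query, assert sequence might read. Only the names `aiAct`, `aiQuery`, and `aiAssert` come from the text above; the `UiAgent` interface, its signatures, the `RecordingAgent` stub, and `searchAndCheck` are assumptions made for illustration, so check the API reference for the real shapes.

```typescript
// Hypothetical interface: only the three method names appear in the README.
// Parameter and return types are assumptions.
interface UiAgent {
  aiAct(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a browser or a model.
class RecordingAgent implements UiAgent {
  readonly log: string[] = [];

  async aiAct(instruction: string): Promise<void> {
    this.log.push(`act: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.log.push(`query: ${dataDemand}`);
    // Canned answer; a real agent would read this from the screen.
    return ["Wireless headphones", "USB-C cable"] as unknown as T;
  }

  async aiAssert(assertion: string): Promise<void> {
    this.log.push(`assert: ${assertion}`);
  }
}

// One step of each kind: perform an action, extract data, verify what is visible.
async function searchAndCheck(agent: UiAgent): Promise<string[]> {
  await agent.aiAct('type "headphones" in the search box and press Enter');
  const titles = await agent.aiQuery<string[]>(
    "string[], the product titles in the result list",
  );
  await agent.aiAssert("the result list shows at least one product");
  return titles;
}
```

Because `searchAndCheck` depends only on the interface, the same steps could in principle be handed a real agent for a browser or a device instead of the stub.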


✨ Driven by Multimodal Models

Midscene is all-in on pure vision for UI actions: element localization is based on screenshots only. It runs on multimodal models with strong UI localization, such as Qwen3.x, Doubao-Seed-2.0, GLM-4.6V, gemini-3.5-flash, and UI-TARS, including open-source options you can self-host. For data extraction and page understanding, you can still opt in to include DOM when needed.

Read more about Model Strategy.


📝 Credits

We would like to thank the following projects:

  • Rsbuild and Rslib for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen-VL for the open-source multimodal model Qwen-VL.
  • scrcpy and yume-chan, which allow us to control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • appium-webdriveragent for driving XCTest from JavaScript.
  • YADB for the yadb tool which improves the performance of text input.
  • libnut-core for the cross-platform native keyboard and mouse control.
  • Puppeteer for browser automation and control.
  • Playwright for browser automation, control, and testing.

📖 Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}


📝 License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a star

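GitHubDaily (@GitHub_Daily): New AI agents appear every day, and quickly finding the tools and frameworks that fit a specific business scenario takes real time and effort.

The open-source project awesome-ai-agents-2026 on GitHub compiles a detailed list of today's mainstream AI agent tools.

It covers more than 340 projects across 20-plus niches, including coding, voice interaction, creative generation, and workflow automation.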
