@Fenng: Tops four charts on HuggingFace and GitHub, stars surpass 10k in just 5 days: Baidu Unlimited OCR becomes one of the fastest-growing open source projects. I've seen many people mentioning Baidu's Unlimited-OCR in my timeline lately. Actually, OCR has always been a traditional strength of Baidu…
Summary
Baidu's open source project Unlimited-OCR tops four charts on HuggingFace and GitHub, with stars exceeding 10k in five days. The model uses a MoE architecture (3B total parameters, 570M activated parameters) and excels at continuous recognition of long documents. Inspired by how humans copy books, it also offers new ideas for long-term memory management in large models.
Cached at: 06/29/26, 06:30 AM
Topping Four Charts on HuggingFace and GitHub and Breaking 10K Stars in 5 Days, Baidu Unlimited OCR Ranks Among the Fastest-Growing Open Source Projects.
A couple of days ago, I noticed many people in my timeline talking about Baidu’s Unlimited-OCR release.
Actually, OCR has long been one of Baidu’s strengths, backed by years of accumulated technical expertise, and PaddleOCR has always had a great reputation.
Unlimited-OCR is not a huge model: it is a MoE with 3B total parameters, of which 570M are activated. But it is particularly strong at continuous recognition across dozens of pages of documents… reportedly inspired by the way humans copy books by hand. This not only makes OCR more usable in long document scenarios, it also offers new technical ideas for long-term memory management in large models. That is great news for many teams with specific technical needs.
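As a quick check on the sizes quoted above, the snippet below works out what share of the model's parameters is activated. The 3B and 570M figures come from the post; the helper name `active_fraction` is only an illustrative assumption.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a MoE model's parameters that are activated."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected total_params > 0 and 0 <= active_params <= total_params")
    return active_params / total_params


TOTAL = 3e9     # total parameters, as quoted in the post
ACTIVE = 570e6  # activated parameters, as quoted in the post

share = active_fraction(TOTAL, ACTIVE)
print(f"{share:.0%} of parameters activated")
```

In other words, roughly a fifth of the weights do the work at any one time, which is why the post describes the model as modest in scale.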
Built on DeepSeek-OCR’s DeepEncoder, it tackles the engineering bottleneck of long document parsing in the DeepSeek-OCR pipeline. The “YY” in the author list has also sparked some speculation: some suspect it is Wei Haoran, the core author of DeepSeek-OCR, but this remains unconfirmed.
Similar Articles
@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...
Baidu has open-sourced the Unlimited OCR model, which uses a Reference Sliding Window Attention (R-SWA) mechanism to parse documents up to 32K context in a single pass, eliminating the need for page-by-page inference.
@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…
Baidu has open-sourced the Unlimited OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass without page splitting, with a constant KV Cache. The model innovatively mimics the attention pattern of humans copying books by hand and shares technical lineage with DeepSeek OCR, sparking discussions about talent mobility.
@geekbb: Baidu's open-source vision-language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…
Baidu has open-sourced the vision-language model Unlimited-OCR, upgraded from DeepSeek-OCR. It supports one-shot parsing of extremely long documents and offers two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).
@rionaifantasy: Unbelievable! How Can a 34.5M Parameter OCR Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal large models. But after seeing PP-OCRv6 released by Baidu Wenxin, I've changed my mind. Because it doesn't follow the path of "continuing to pile on parameters..."
Baidu Wenxin releases PP-OCRv6, offering three model tiers: Tiny, Small, and Medium, supporting over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, with the fastest single-image inference at 97ms, proving that small specialized models can outperform large models on OCR tasks.
@manateelazycat: Did a big shot come from Baidu's AI Whampoa Military Academy? The open-source Unlimited OCR, based on DeepSeek OCR, immediately drops a killer move. According to its published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR and...
The open-source OCR model Unlimited OCR, based on DeepSeek OCR, achieves 93.23 on OmniDocBench v1.5 with only 3B parameters, outperforming DeepSeek OCR, Gemini 2.5, and others.
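Several of the summaries above say that R-SWA keeps the KV cache constant while parsing long documents. The toy sketch below shows only that general idea: a cache whose size stops growing because old entries are evicted. It is not the model's actual mechanism. The split into a fixed "reference" set plus a sliding window of recent entries, the class name `BoundedKVCache`, and all sizes are assumptions made for illustration, since none of the posts describe how R-SWA works internally.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: a fixed reference set plus a sliding window of recent entries.

    Hypothetical illustration only; not the R-SWA implementation.
    """

    def __init__(self, reference, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.reference = list(reference)         # assumed to be kept for the whole run
        self.window = deque(maxlen=window_size)  # oldest entry is dropped automatically

    def append(self, entry):
        self.window.append(entry)

    def visible(self):
        """Entries the next decoding step could attend to."""
        return self.reference + list(self.window)

    def __len__(self):
        return len(self.reference) + len(self.window)


cache = BoundedKVCache(reference=["ref0", "ref1", "ref2", "ref3"], window_size=8)
sizes = []
for step in range(1000):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(max(sizes))  # 4 reference entries + 8 window entries = 12
```

However many steps the loop runs, the cache never holds more than the reference entries plus one window, which is the property the "constant KV Cache" claim refers to. A page-by-page pipeline, by contrast, would start each page with an empty cache.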