@Fenng: Tops four charts on HuggingFace and GitHub and passes 10k stars in just 5 days: Baidu Unlimited OCR has become one of the fastest-growing open source projects. I've seen many people mentioning Baidu's Unlimited-OCR in my timeline lately. In fact, OCR has long been one of Baidu's traditional strengths…

X AI KOLs Following Models

Summary

Baidu's open source project Unlimited-OCR tops four charts on HuggingFace and GitHub, with stars exceeding 10k in five days. The model uses a MoE architecture (3B total parameters, 570M activated parameters) and excels at continuous recognition of long documents. Inspired by how humans copy books, it also offers new ideas for long-term memory management in large models.

Tops four charts on HuggingFace and GitHub and passes 10k stars in just 5 days: Baidu Unlimited OCR has become one of the fastest-growing open source projects. I've seen many people mentioning Baidu's Unlimited-OCR in my timeline lately. In fact, OCR has long been one of Baidu's traditional strengths, backed by solid technical accumulation; PaddleOCR has a very good reputation. This time, the Unlimited-OCR model is not large: it is a MoE with 3B total parameters and 570M activated parameters, yet it is particularly strong at continuously recognizing documents that run to dozens of pages… Reportedly inspired by the way humans copy books, it not only improves OCR usability in long-document scenarios but also offers a new technical approach to long-term memory management in large models. This is great news for teams with these technical needs. Built on DeepSeek-OCR's DeepEncoder, it pushes past the engineering bottleneck of long-document parsing in the DeepSeek-OCR pipeline. The "YY" in the author list has also sparked speculation: some suspect it is Wei Haoran, the core author of DeepSeek-OCR, though this is unconfirmed.

Cached at: 06/29/26, 06:30 AM

Topping four charts on HuggingFace and GitHub and passing 10K stars in 5 days, Baidu Unlimited OCR ranks among the fastest-growing open source projects.

A couple of days ago, I noticed many people in my timeline talking about Baidu’s Unlimited-OCR release.

Actually, OCR is a traditional strength for Baidu, with accumulated technical expertise, and PaddleOCR has always had a great reputation.

This Unlimited-OCR model isn’t huge: it is a MoE with 3B total parameters and 570M activated parameters, yet it’s particularly strong at continuously recognizing documents that run to dozens of pages… It is reportedly inspired by the way humans copy books by hand. This not only improves OCR usability in long-document scenarios but also offers new technical ideas for long-term memory management in large models. That is great news for the many teams with these technical needs.
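To put the two parameter counts in perspective, here is a rough back-of-the-envelope sketch. It uses only the rounded figures quoted above (3B total, 570M activated); the function name is hypothetical, and the exact parameter counts are not given in the post.

```python
# Rounded figures quoted in the post; the exact counts are not stated there.
TOTAL_PARAMS = 3_000_000_000   # "3B total parameters"
ACTIVE_PARAMS = 570_000_000    # "570M activated parameters"


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters that are activated at a time."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.1%} of parameters are activated")  # prints "19.0% of parameters are activated"
```

In other words, by these numbers roughly a fifth of the weights take part in each forward step, which is the usual reason a MoE can be cheaper to run than its total size suggests.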

Built on DeepSeek-OCR’s DeepEncoder, it pushes past the engineering bottleneck of long-document parsing in the DeepSeek-OCR pipeline. The “YY” in the author list has also sparked some speculation: some suspect it’s Wei Haoran, the core author of DeepSeek-OCR, but this remains unconfirmed.

Similar Articles

@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...

X AI KOLs Timeline

Baidu has open-sourced the Unlimited OCR model, which uses a Reference Sliding Window Attention (R-SWA) mechanism to parse documents up to 32K context in a single pass, eliminating the need for page-by-page inference.

@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…

X AI KOLs Timeline

Baidu has open-sourced the Unlimited OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass without page splitting, with a constant KV Cache. The model innovatively mimics the attention pattern of humans copying books by hand and shares technical lineage with DeepSeek OCR, sparking discussions about talent mobility.
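To make the "constant KV Cache" idea concrete, here is a toy sketch of a cache whose size stops growing. It assumes, purely for illustration, that a fixed set of reference entries stays visible while only the most recent generated entries are retained. The class `BoundedCache` and its parameters are hypothetical; this is not the actual R-SWA implementation, whose details are not described here.

```python
from collections import deque


class BoundedCache:
    """Toy stand-in for a KV cache: fixed reference entries plus a sliding window.

    Assumption: this only illustrates why memory can stay constant during
    long generation; it is not the real R-SWA mechanism.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)    # always kept visible
        self.recent = deque(maxlen=window)  # oldest entries are dropped automatically

    def append(self, token):
        self.recent.append(token)

    def visible(self):
        """Entries the next decoding step could attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


cache = BoundedCache(reference=["ref0", "ref1", "ref2"], window=4)
sizes = []
for step in range(10):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(sizes)            # [4, 5, 6, 7, 7, 7, 7, 7, 7, 7]
print(cache.visible())  # ['ref0', 'ref1', 'ref2', 'tok6', 'tok7', 'tok8', 'tok9']
```

The cache grows until the window is full and then stays at 3 + 4 = 7 entries no matter how many more tokens are generated, which mirrors the book-copying analogy: the copyist keeps the source in view and remembers only the last few words written.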

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline

Baidu has open-sourced the visual language model Unlimited-OCR, upgraded from DeepSeek-OCR, supporting one-shot parsing of extremely long documents, offering two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).

@rionaifantasy: Unbelievable! How Can a 34.5M Parameter OCR Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal large models. But after seeing PP-OCRv6 released by Baidu Wenxin, I've changed my mind. Because it doesn't follow the path of "continuing to pile on parameters..."

X AI KOLs Timeline

Baidu Wenxin releases PP-OCRv6, offering three model tiers: Tiny, Small, and Medium, supporting over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, with the fastest single-image inference at 97ms, proving that small specialized models can outperform large models on OCR tasks.