@manateelazycat: Has Baidu, the "Whampoa Military Academy" of AI, landed a top talent? The open-source Unlimited OCR, built on DeepSeek OCR, makes a strong debut. According to its own published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR and...
Summary
Unlimited OCR, an open-source OCR model built on DeepSeek OCR, has only 3B parameters and, according to its own published data, scores 93.23 on OmniDocBench v1.5, outperforming DeepSeek OCR, Gemini 2.5, and others.
Cached at: 06/22/26, 09:52 PM
Has Baidu, the "Whampoa Military Academy" of AI, landed a top talent?
Built on DeepSeek OCR, the open-source Unlimited OCR makes a strong debut.
According to its own published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR, Gemini 2.5, and other competitors.
And it’s only 3B parameters 🤩 https://t.co/P85kk7Cr4E
Similar Articles
@geekbb: Baidu's open-source vision-language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…
Baidu has open-sourced Unlimited-OCR, a vision-language model upgraded from DeepSeek-OCR that supports one-shot parsing of extremely long documents. It offers two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).
@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…
Baidu has open-sourced the Unlimited OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass, without page splitting, while keeping the KV cache size constant. The model mimics the attention pattern of a person copying a book by hand and shares technical lineage with DeepSeek OCR, which has sparked discussion about talent mobility.
@berryxia: https://x.com/berryxia/status/2067078380017828205
The author tested the three tiers of PP-OCRv6 models and provided open-source tools for local deployment. They demonstrated performance comparisons of each model on OmniDocBench and real-world scenarios, emphasizing the advantages of lightweight specialized models for OCR tasks.
@rionaifantasy: Unbelievable! How Can a 34.5M Parameter OCR Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal large models. But after seeing PP-OCRv6 released by Baidu Wenxin, I've changed my mind. Because it doesn't follow the path of "continuing to pile on parameters..."
Baidu Wenxin has released PP-OCRv6 in three model tiers (Tiny, Small, and Medium) with support for over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, and the fastest single-image inference takes 97 ms, suggesting that small specialized models can outperform large models on OCR tasks.
baidu/Unlimited-OCR
Baidu releases Unlimited-OCR, a new model for one-shot long-horizon document parsing that builds on DeepSeek-OCR. It supports single-image and multi-page/PDF parsing via Hugging Face Transformers and SGLang.