@manateelazycat: Did a big shot come from Baidu's AI Whampoa Military Academy? The open-source Unlimited OCR, based on DeepSeek OCR, immediately drops a killer move. According to its published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR and...

X AI KOLs Timeline Models

Summary

The open-source OCR model Unlimited OCR, based on DeepSeek OCR, achieves 93.23 on OmniDocBench v1.5 with only 3B parameters, outperforming DeepSeek OCR, Gemini 2.5, and others.

Did a big shot come from Baidu's AI Whampoa Military Academy? The open-source Unlimited OCR, based on DeepSeek OCR, immediately drops a killer move According to its published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR, Gemini 2.5, and a host of other competitors And it's only 3B in size 🤩 https://t.co/P85kk7Cr4E
Original Article
View Cached Full Text

Cached at: 06/22/26, 09:52 PM

Baidu, the AI Huangpu Military Academy, welcomes a top talent?

Based on DeepSeek OCR, the open-sourced Unlimited OCR is a game-changer from the start.

In its own published data, it scored 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR and Gemini 2.5 among other competitors.

And it’s only 3B parameters 🤩 https://t.co/P85kk7Cr4E

Similar Articles

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline

Baidu has open-sourced the visual language model Unlimited-OCR, upgraded from DeepSeek-OCR, supporting one-shot parsing of extremely long documents, offering two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).

@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…

X AI KOLs Timeline

Baidu has open-sourced the Unlimited OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass without page splitting, with a constant KV Cache. The model innovatively mimics the attention pattern of humans copying books by hand and shares technical lineage with DeepSeek OCR, sparking discussions about talent mobility.

@berryxia: https://x.com/berryxia/status/2067078380017828205

X AI KOLs Timeline

The author tested the three tiers of PP-OCRv6 models and provided open-source tools for local deployment. They demonstrated performance comparisons of each model on OmniDocBench and real-world scenarios, emphasizing the advantages of lightweight specialized models for OCR tasks.

@rionaifantasy: Unbelievable! How Can a 34.5M Parameter OCR Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal large models. But after seeing PP-OCRv6 released by Baidu Wenxin, I've changed my mind. Because it doesn't follow the path of "continuing to pile on parameters..."

X AI KOLs Timeline

Baidu Wenxin releases PP-OCRv6, offering three model tiers: Tiny, Small, and Medium, supporting over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, with the fastest single-image inference at 97ms, proving that small specialized models can outperform large models on OCR tasks.

baidu/Unlimited-OCR

Hugging Face Models Trending

Baidu releases Unlimited-OCR, a new model for one-shot long-horizon document parsing, building on Deepseek-OCR. It supports single image and multi-page/PDF parsing via Hugging Face Transformers and SGLang.