@techNmak: A lightweight VLM that beats the giants at OCR (1.7B parameters, SOTA on OmniDocBench). dots.ocr is a new multilingual…

X AI KOLs Timeline Models

Summary

dots.ocr is a new lightweight, 1.7B-parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.

A lightweight VLM that beats the giants at OCR (1.7B parameters, SOTA on OmniDocBench). dots.ocr is a new multilingual document parser that proves you don't need massive models for perfect document understanding. Current SOTA models are often massive (72B+) or require…

Similar Articles

dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Papers with Code Trending

This paper presents dots.ocr, a unified Vision-Language Model that jointly learns layout detection, text recognition, and relational understanding for multilingual document layout parsing. It achieves state-of-the-art results on OmniDocBench and introduces the XDocParse benchmark spanning 126 languages.

@rionaifantasy: Unbelievable! How Can a 34.5M-Parameter OCR Model Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal models. But after seeing PP-OCRv6, released by Baidu Wenxin, I've changed my mind, because it doesn't follow the path of "continuing to pile on parameters..."

X AI KOLs Timeline

Baidu Wenxin releases PP-OCRv6 in three model tiers (Tiny, Small, and Medium) that support over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, with a fastest single-image inference time of 97ms; the post presents this as evidence that small specialized models can outperform large models on OCR tasks.