@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots. ocr is a new multilingual…
Summary
dots.ocr is a new lightweight 1.7B parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.
Similar Articles
@PaddlePaddle: PP-OCRv6 Tech Deep Dive Ep.1: In the Era of Large Models, Why Does Lightweight OCR Still Have Irreplaceable Value? — PP…
PP-OCRv6 is a lightweight OCR model (34.5M parameters) that challenges large VLMs with its MetaFormer architecture, offering efficient text detection and recognition across multiple deployment scenarios.
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
This paper presents dots.ocr, a unified Vision-Language Model that jointly learns layout detection, text recognition, and relational understanding for multilingual document layout parsing. It achieves state-of-the-art results on OmniDocBench and introduces the XDocParse benchmark spanning 126 languages.
@oliviscusAI: You can now parse any document with one 1.7B parameter model It’s called dots-ocr. One system that handles text, tables…
The article introduces dots-ocr, a 1.7B parameter model capable of parsing text, tables, formulas, and images from documents in over 100 languages without needing separate OCR pipelines.
@rionaifantasy: Unbelievable! How Can a 34.5M Parameter OCR Beat a 235B Large Model? Let me tell you something ridiculous: I used to believe the future of OCR would inevitably be devoured by ever-larger multimodal large models. But after seeing PP-OCRv6 released by Baidu Wenxin, I've changed my mind. Because it doesn't follow the path of "continuing to pile on parameters..."
Baidu Wenxin releases PP-OCRv6, offering three model tiers: Tiny, Small, and Medium, supporting over 50 languages. The Tiny version is only 1.5MB and can run locally in a browser, with the fastest single-image inference at 97ms, proving that small specialized models can outperform large models on OCR tasks.
We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]
A comprehensive benchmark of 18 LLMs on OCR tasks (7k+ calls) reveals that cheaper and older models often match premium accuracy at a fraction of the cost, with full dataset and framework open-sourced.