@oliviscusAI: You can now parse any document with one 1.7B parameter model It’s called dots-ocr. One system that handles text, tables…
Summary
The post introduces dots-ocr, a single 1.7B-parameter model that parses text, tables, formulas, and images in documents across more than 100 languages, with no separate OCR pipeline or task-specific models.
Cached at: 05/13/26, 10:18 AM
You can now parse any document with one 1.7B parameter model 🤯
It’s called dots-ocr. One system that handles text, tables, formulas, images, and PDFs across 100+ languages.
No separate OCR pipeline. No task-specific models. https://t.co/KTK8GrZ9hf
Similar Articles
@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots.ocr is a new multilingual…
dots.ocr is a new lightweight, multilingual vision-language model with 1.7B parameters that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.
Unlimited OCR: One-Shot Long-Horizon Parsing
Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing that builds on DeepSeek-OCR and supports single images, multi-page documents, and PDFs.
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
This paper presents dots.ocr, a unified Vision-Language Model that jointly learns layout detection, text recognition, and relational understanding for multilingual document layout parsing. It achieves state-of-the-art results on OmniDocBench and introduces the XDocParse benchmark spanning 126 languages.
baidu/Unlimited-OCR
Baidu releases Unlimited-OCR, a new model for one-shot long-horizon document parsing, building on DeepSeek-OCR. It supports single-image and multi-page/PDF parsing via Hugging Face Transformers and SGLang.
@BaiduAI_News: We’re open-sourcing Unlimited OCR — built to read long documents in one pass. With 3B total parameters and only 500M ac…
Baidu open-sources Unlimited OCR, a 3B parameter model (500M activated) that reads long documents in a single pass using Reference Sliding Window Attention (R-SWA), achieving state-of-the-art results on OmniDocBench.