PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
Summary
PaddleOCR-VL-1.6 improves document parsing by identifying and refining under-optimized regions via targeted data optimization and progressive post-training, achieving state-of-the-art 96.33% on OmniDocBench v1.6.
Source: https://huggingface.co/papers/2606.03264
Abstract
PaddleOCR-VL-1.6 enhances document parsing performance through targeted data optimization and progressive post-training techniques, achieving state-of-the-art results on OmniDocBench v1.6.
We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
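The abstract does not spell out how weak regions are identified, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each evaluation record from the previous model carries a region tag and a score in [0, 1]; the function names, thresholds, and boost factor are all hypothetical. It flags regions that are sparse, low-scoring, or unstable, then up-weights them for a later data round. Unreliable supervision, the third failure mode the abstract mentions, is not modeled here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_weak_regions(records, min_count=3, score_floor=0.9, max_std=0.15):
    """Flag weak regions from (region, score) pairs. All thresholds are assumptions.

    A region is flagged when it has few samples ("sparse"), a low mean score
    ("low_score"), or a high population standard deviation ("unstable").
    """
    by_region = defaultdict(list)
    for region, score in records:
        by_region[region].append(score)

    weak = {}
    for region, scores in by_region.items():
        reasons = []
        if len(scores) < min_count:
            reasons.append("sparse")
        if mean(scores) < score_floor:
            reasons.append("low_score")
        if pstdev(scores) > max_std:
            reasons.append("unstable")
        if reasons:
            weak[region] = reasons
    return weak


def sampling_weights(regions, weak, boost=3.0):
    """Up-weight flagged regions by a hypothetical boost; weights sum to 1."""
    raw = {r: (boost if r in weak else 1.0) for r in regions}
    total = sum(raw.values())
    return {r: w / total for r, w in raw.items()}
```

Calling `find_weak_regions` on per-region scores and passing the result to `sampling_weights` yields a sampling distribution that concentrates new data on the flagged regions instead of growing the corpus uniformly, which is the contrast the abstract draws.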
Models citing this paper: 2
#### PaddlePaddle/PaddleOCR-VL-1.6
Image-Text-to-Text • 1.0B • Updated about 1 hour ago • 4k • 196
#### PaddlePaddle/PaddleOCR-VL-1.6-GGUF
0.5B • Updated about 1 hour ago • 2.05k • 9
Similar Articles
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
PaddleOCR-VL is a compact 0.9B vision-language model that achieves state-of-the-art performance in multilingual document parsing and element recognition by integrating NaViT-style dynamic resolution with the ERNIE language model.
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
PaddleOCR 3.5 adds a Transformers inference backend, enabling OCR and document parsing models like PP-OCRv5 and PaddleOCR-VL 1.5 to run seamlessly within the Hugging Face ecosystem.
PaddlePaddle/PaddleOCR
PaddleOCR is a powerful, lightweight OCR toolkit that converts PDFs and images into structured data for AI applications, supporting 100+ languages and designed to bridge documents with LLMs.
@AdinaYakup: PP-OCRv6 just released by Baidu @PaddlePaddle tiny 1.5M / small 7.7M / medium 34.5M 48+ languages Supports handwritten/…
Baidu's PaddlePaddle released PP-OCRv6, an OCR model supporting 48+ languages with tiny (1.5M), small (7.7M), and medium (34.5M) sizes, optimized for edge deployment and handwritten/printed/industrial/screen/card text.
🚀 PP-OCRv6 is officially released!
PaddleOCR releases PP-OCRv6, a new OCR model series with sizes from 1.5M to 34.5M parameters, offering improved accuracy and faster inference, supporting 50 languages and new scenarios like PCB and CAD drawings, under Apache 2.0 open source license.