PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Hugging Face Daily Papers 06/02/26, 12:00 AM Papers

Summary

PaddleOCR-VL-1.6 improves document parsing by identifying and refining under-optimized regions via targeted data optimization and progressive post-training, achieving state-of-the-art 96.33% on OmniDocBench v1.6.

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
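The abstract names three symptoms of an under-optimized region (unstable model behavior, sparse data coverage, unreliable supervision) but does not say how such regions are detected. The following is only a minimal illustrative sketch of one way the three symptoms could be flagged from evaluation records of a previous model. It is not the authors' method. The `EvalRecord` fields, the `find_weak_regions` helper, the region names, and all thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class EvalRecord:
    region: str             # hypothetical bucket, e.g. "text", "formula", "seal"
    scores: List[float]     # scores in [0, 1] from repeated decodes of one sample
    label_agreement: float  # hypothetical proxy for label reliability, in [0, 1]


def find_weak_regions(
    records: List[EvalRecord],
    min_samples: int = 3,        # assumed threshold for "sparse coverage"
    max_std: float = 0.15,       # assumed threshold for "unstable behavior"
    min_agreement: float = 0.8,  # assumed threshold for "unreliable supervision"
) -> Dict[str, List[str]]:
    """Group records by region and list the symptoms each region shows."""
    by_region = defaultdict(list)
    for rec in records:
        by_region[rec.region].append(rec)

    weak = {}
    for region, recs in by_region.items():
        reasons = []
        if len(recs) < min_samples:
            reasons.append("sparse_coverage")
        # Average per-sample spread across repeated decodes.
        if mean(pstdev(r.scores) for r in recs) > max_std:
            reasons.append("unstable_behavior")
        if mean(r.label_agreement for r in recs) < min_agreement:
            reasons.append("unreliable_supervision")
        if reasons:
            weak[region] = reasons
    return weak


# Made-up records, for illustration only.
records = (
    [EvalRecord("text", [0.95, 0.96, 0.94], 0.99) for _ in range(3)]
    + [EvalRecord("formula", [0.9, 0.4, 0.7], 0.95) for _ in range(3)]
    + [EvalRecord("seal", [0.8, 0.8], 0.6)]
)
weak = find_weak_regions(records)
print(weak)
```

In a sketch like this, the flagged regions would then be the candidates for targeted data enhancement or label cleaning, rather than enlarging the whole corpus uniformly.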

Original Article


Cached at: 06/03/26, 07:36 AM


Source: https://huggingface.co/papers/2606.03264

Abstract

PaddleOCR-VL-1.6 enhances document parsing performance through targeted data optimization and progressive post-training techniques, achieving state-of-the-art results on OmniDocBench v1.6.

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.


Get this paper in your agent:

hf papers read 2606.03264

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 2

#### PaddlePaddle/PaddleOCR-VL-1.6

Image-Text-to-Text • 1.0B • Updated about 1 hour ago • 4k • 196

#### PaddlePaddle/PaddleOCR-VL-1.6-GGUF

0.5B • Updated about 1 hour ago • 2.05k • 9


Spaces citing this paper: 1


Similar Articles

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddlePaddle/PaddleOCR

@AdinaYakup: PP-OCRv6 just released by Baidu @PaddlePaddle tiny 1.5M / small 7.7M / medium 34.5M 48+ languages Supports handwritten/…

🚀 PP-OCRv6 is officially released!
