PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

Summary

PaddleOCR-VL-1.6 improves document parsing by identifying and refining under-optimized regions through targeted data optimization and progressive post-training, achieving a state-of-the-art score of 96.33% on OmniDocBench v1.6.

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong baseline at 0.9B parameters, its remaining errors are concentrated in under-optimized regions, where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions of the previous model, applies targeted enhancement to those regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, raising performance further through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, is competitive with top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
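
The abstract names three symptoms of an under-optimized region (unstable behavior, sparse coverage, unreliable supervision) but does not say how they are measured. The following is a hypothetical illustration only, not the authors' method. It assumes that evaluation samples are tagged with a region key (for example, an element category such as "table"), that the previous model is decoded several times per sample to expose instability, and that supervision reliability can be reduced to a boolean agreement flag between annotation sources. `Sample`, `find_weak_regions`, and every threshold are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    region: str          # hypothetical grouping key, e.g. an element category such as "table"
    errors: list[float]  # previous model's error on this sample over several sampled decodes (non-empty)
    labels_agree: bool   # hypothetical flag: did independent annotation sources agree?


def find_weak_regions(samples, min_count=3, max_instability=0.05, min_agreement=0.8):
    """Flag regions showing any of the three weaknesses named in the abstract.

    All thresholds are illustrative placeholders, not values from the paper.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[s.region].append(s)

    weak = {}
    for region, group in groups.items():
        reasons = []
        # Sparse coverage: too few samples for this region.
        if len(group) < min_count:
            reasons.append("sparse")
        # Unstable behavior: error varies across repeated decodes of the same input.
        instability = mean(pstdev(s.errors) for s in group)
        if instability > max_instability:
            reasons.append("unstable")
        # Unreliable supervision: annotation sources frequently disagree.
        agreement = mean(1.0 if s.labels_agree else 0.0 for s in group)
        if agreement < min_agreement:
            reasons.append("unreliable")
        if reasons:
            weak[region] = reasons
    return weak


samples = [
    Sample("text", [0.01, 0.01], True),
    Sample("text", [0.02, 0.02], True),
    Sample("text", [0.01, 0.01], True),
    Sample("table", [0.05, 0.45], True),
    Sample("table", [0.20, 0.20], False),
    Sample("table", [0.05, 0.05], True),
    Sample("formula", [0.02, 0.02], True),
]
weak = find_weak_regions(samples)
print(weak)  # {'table': ['unstable', 'unreliable'], 'formula': ['sparse']}
```

In this toy setup, flagged regions would become the targets for data enhancement and label cleaning, while well-covered, stable regions are left alone instead of being padded with more data. The signals and thresholds the paper actually uses may differ.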

Cached at: 06/03/26, 07:36 AM

Source: https://huggingface.co/papers/2606.03264

Models citing this paper (2):
- PaddlePaddle/PaddleOCR-VL-1.6 (Image-Text-to-Text, 1.0B)
- PaddlePaddle/PaddleOCR-VL-1.6-GGUF (0.5B)
