Unlimited OCR Works

Hugging Face Daily Papers 06/22/26, 12:00 AM Papers

ocr reference-sliding-window-attention kv-cache efficiency open-source baidu

Summary

Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.

Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However, the downside is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, who exhibit no such decline in efficiency during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate human parsing working memory. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while maintaining a constant KV cache throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR's encoder with our constant KV cache design, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. More importantly, R-SWA is a general-purpose parsing attention mechanism: beyond OCR, it is equally applicable to tasks such as ASR and translation. Code and model weights are publicly available at http://github.com/baidu/Unlimited-OCR.
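The memory argument in the abstract can be made concrete with a little token accounting. The sketch below is not taken from the Unlimited OCR code. It assumes that R-SWA keeps a fixed set of reference tokens (for example, the encoder's visual tokens) in the cache and retains only the most recent `window` generated tokens, which is one plausible reading of the abstract rather than a confirmed design. The function names and every numeric value are hypothetical.

```python
def cache_tokens_full(num_reference: int, num_generated: int) -> int:
    """Tokens held in the KV cache under ordinary full causal attention."""
    return num_reference + num_generated


def cache_tokens_windowed(num_reference: int, num_generated: int, window: int) -> int:
    """Tokens held when reference tokens stay cached and only the last
    `window` generated tokens are retained (assumed R-SWA-like behaviour)."""
    return num_reference + min(num_generated, window)


def cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size; the leading 2 counts keys and values."""
    return 2 * tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Illustrative numbers only, not the model's real configuration.
    num_reference, window = 1000, 512
    layers, kv_heads, head_dim = 12, 8, 128
    for generated in (1_000, 10_000, 30_000):
        full = cache_tokens_full(num_reference, generated)
        windowed = cache_tokens_windowed(num_reference, generated, window)
        print(generated,
              cache_bytes(full, layers, kv_heads, head_dim),
              cache_bytes(windowed, layers, kv_heads, head_dim))
```

Under these assumptions the full-attention cache grows linearly with the number of generated tokens, while the windowed cache stops growing once `window` tokens have been produced.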

Original Article

Cached at: 06/23/26, 05:40 AM

Source: https://huggingface.co/papers/2606.23050

Abstract

Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption during long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.

Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However, the downside is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, who exhibit no such decline in efficiency during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate human parsing working memory. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while maintaining a constant KV cache throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR’s encoder with our constant KV cache design, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. More importantly, R-SWA is a general-purpose parsing attention mechanism: beyond OCR, it is equally applicable to tasks such as ASR and translation. Code and model weights are publicly available at http://github.com/baidu/Unlimited-OCR.
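The abstract does not spell out how R-SWA decides which positions a token may attend to. As a hedged illustration only, the following sketch builds an attention mask under the assumption that the first `num_reference` positions are reference tokens visible to every later position, and that each position otherwise sees only the most recent `window` positions. The names `can_attend` and `build_mask` and the parameter values are hypothetical, and the actual mechanism in the released code may differ.

```python
def can_attend(query_pos: int, key_pos: int, num_reference: int, window: int) -> bool:
    """Return True if the token at query_pos may attend to the token at key_pos."""
    if key_pos > query_pos:
        return False  # causal: never look at future positions
    if key_pos < num_reference:
        return True   # assumed: reference tokens are always visible
    # Otherwise only the last `window` positions (including itself) are visible.
    return query_pos - key_pos < window


def build_mask(seq_len: int, num_reference: int, window: int) -> list[list[bool]]:
    """Boolean mask of shape [seq_len][seq_len]; rows are queries, columns are keys."""
    return [
        [can_attend(q, k, num_reference, window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Tiny illustrative sizes so the pattern is easy to read.
    for row in build_mask(seq_len=8, num_reference=3, window=2):
        print("".join("x" if visible else "." for visible in row))
```

In this sketch no row ever has more than `num_reference + window` visible positions, which is what would keep the per-step attention cost and the cache size bounded regardless of output length.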

Similar Articles

@AdinaYakup: Unlimited-OCR New OCR from @PaddlePaddle It can parse hundreds of pages in a single pass while maintaining stable speed…

Unlimited OCR: One-Shot Long-Horizon Parsing

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

baidu/Unlimited-OCR

@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...
