Unlimited OCR Works
Summary
Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.
Cached at: 06/23/26, 05:40 AM
Source: https://huggingface.co/papers/2606.23050
Abstract
Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However, the downside is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, who exhibit no such decline in efficiency during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate human parsing working memory. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while maintaining a constant KV cache throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR's encoder with our constant KV cache design, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. More importantly, R-SWA is a general-purpose parsing attention mechanism: beyond OCR, it is equally applicable to tasks such as ASR and translation. Code and model weights are publicly available at http://github.com/baidu/Unlimited-OCR.
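The abstract's central claim, a KV cache that stays constant while the output grows, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that the "reference" part of R-SWA is a fixed set of encoder or prompt tokens kept for the whole decode, and that generated tokens are only kept for the most recent `window` steps. The names `BoundedKVCache`, `full_cache_size`, and `simulate` are hypothetical, and strings stand in for real key/value tensors.

```python
from collections import deque


class BoundedKVCache:
    """Toy KV cache: a fixed reference prefix plus a sliding window of recent tokens.

    Assumption (not taken from the paper): the reference entries are never
    evicted, while generated tokens are kept only for the last `window` steps.
    """

    def __init__(self, reference_tokens, window):
        self.reference = list(reference_tokens)  # kept for the whole decode
        self.recent = deque(maxlen=window)       # oldest entry is dropped automatically

    def append(self, token):
        """Record one newly generated token."""
        self.recent.append(token)

    def visible(self):
        """Entries the next decoding step would attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def full_cache_size(num_reference, step):
    """Size of an ordinary, unbounded KV cache after `step` generated tokens."""
    return num_reference + step


def simulate(num_reference=4, window=3, steps=10):
    """Decode `steps` tokens and record (unbounded size, bounded size) per step."""
    cache = BoundedKVCache([f"ref{i}" for i in range(num_reference)], window)
    sizes = []
    for step in range(1, steps + 1):
        cache.append(f"out{step}")
        sizes.append((full_cache_size(num_reference, step), len(cache)))
    return cache, sizes


if __name__ == "__main__":
    _, sizes = simulate()
    for full, bounded in sizes:
        print(f"unbounded cache: {full:3d} entries   bounded cache: {bounded:3d} entries")
```

Under these assumptions the unbounded cache grows by one entry per generated token, while the bounded cache stops growing once the window is full. Per-step attention cost then depends on the reference size plus the window size rather than on how many tokens have been produced so far.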
Similar Articles
@AdinaYakup: Unlimited-OCR New OCR from @PaddlePaddle It can parse hundreds of pages in a single pass while maintaining stable speed…
PaddlePaddle releases Unlimited-OCR, a new OCR model using Reference Sliding Window Attention (R-SWA) to maintain constant KV cache during decoding, achieving 93% on OmniDocBench and a 6% improvement over previous methods.
Unlimited OCR: One-Shot Long-Horizon Parsing
Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing, building upon Deepseek-OCR with support for single images, multi-page documents, and PDFs.
@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...
Baidu has open-sourced the Unlimited OCR model, which uses a Reference Sliding Window Attention (R-SWA) mechanism to parse documents up to 32K context in a single pass, eliminating the need for page-by-page inference.
@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…
Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.
baidu/Unlimited-OCR
Baidu releases Unlimited-OCR, a new model for one-shot long-horizon document parsing, building on Deepseek-OCR. It supports single image and multi-page/PDF parsing via Hugging Face Transformers and SGLang.