Unlimited OCR Works

Hugging Face Daily Papers

Summary

Unlimited OCR introduces Reference Sliding Window Attention to eliminate growing memory consumption in long-sequence OCR tasks, enabling efficient transcription of multiple pages in a single forward pass.

Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However, the downside is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, who exhibit no such decline in efficiency during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate the working memory humans use when parsing. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while maintaining a constant KV cache throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR's encoder with our constant KV cache design, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. More importantly, R-SWA is a general-purpose parsing attention mechanism: beyond OCR, it is equally applicable to tasks such as ASR and translation. Code and model weights are publicly available at http://github.com/baidu/Unlimited-OCR.
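To make the constant-cache idea concrete, here is a minimal Python sketch. It is not the paper's implementation. The abstract does not spell out how R-SWA selects what stays in the cache, so the sketch assumes that a fixed set of "reference" entries (for example, the encoder's visual tokens) is always kept, while generated tokens live in a fixed-size sliding window. The class name, the function name, and all sizes below are hypothetical.

```python
from collections import deque


class ReferenceWindowCache:
    """Toy KV cache: pinned reference entries plus a bounded window of recent ones.

    Assumption (not from the paper): "reference" entries are never evicted,
    and generated tokens are held in a first-in, first-out window.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)    # always visible, never evicted
        self.recent = deque(maxlen=window)  # oldest generated entry drops out

    def append(self, entry):
        self.recent.append(entry)

    def visible(self):
        """Entries the next decoding step would attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Keys and values (the factor 2) stored for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical sizes, chosen only for illustration.
cache = ReferenceWindowCache(reference=[f"img{i}" for i in range(4)], window=3)
sizes = []
for step in range(10):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(sizes)            # grows for three steps, then stays flat
print(cache.visible())  # four reference entries plus the last three tokens

dims = dict(layers=12, kv_heads=8, head_dim=128)  # hypothetical decoder shape
full = kv_cache_bytes(32 * 1024, **dims)          # cache that grows to 32K entries
bounded = kv_cache_bytes(1024 + 512, **dims)      # 1024 reference + 512 window
print(full // 2**20, "MiB vs", bounded // 2**20, "MiB")
```

Under these made-up dimensions, a cache that grows to 32K entries takes 1536 MiB, while the bounded cache stays at 72 MiB no matter how many tokens are generated. The real figures depend on the actual decoder configuration and window size, which the abstract does not state.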

Cached at: 06/23/26, 05:40 AM

Source: https://huggingface.co/papers/2606.23050

Similar Articles

Unlimited OCR: One-Shot Long-Horizon Parsing

Hacker News Top

Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing, building upon Deepseek-OCR with support for single images, multi-page documents, and PDFs.

@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...

X AI KOLs Timeline

Baidu has open-sourced the Unlimited OCR model, which uses a Reference Sliding Window Attention (R-SWA) mechanism to parse documents within a context of up to 32K in a single pass, eliminating the need for page-by-page inference.

baidu/Unlimited-OCR

Hugging Face Models Trending

Baidu releases Unlimited-OCR, a new model for one-shot long-horizon document parsing, building on Deepseek-OCR. It supports single image and multi-page/PDF parsing via Hugging Face Transformers and SGLang.