@vanstriendaniel: It's raining OCR models again! @Baidu_Inc's Unlimited-OCR is one of the more interesting. You can try it without much e…

X AI KOLs Following 06/23/26, 12:53 PM Models

ocr baidu unlimited-ocr huggingface hf-jobs sglang gpu-serving

Summary

This post shows how to serve Baidu's Unlimited-OCR model as a temporary, OpenAI-compatible endpoint on Hugging Face Jobs, enabling multi-page document parsing with features like table-to-HTML and equation-to-LaTeX extraction.

Cached at: 06/23/26, 03:52 PM

It’s raining OCR models again!

@Baidu_Inc’s Unlimited-OCR is one of the more interesting. You can try it without much effort via a throwaway GPU endpoint on @huggingface Jobs (which recently added port forwarding support) with one command.

It’s OpenAI-compatible, your HF token is the API key, and --timeout makes it self-destruct so you can’t leave a GPU running by accident.

Once it’s warm, it’s quick and @sgl_project batches concurrent requests, so an agent can boot the model, fire a big async batch at it (say, a whole bucket of newspaper scans), then cancel it.

I pointed it at the front page of a 1901 newspaper, “The Commoner” + 6 PDF pages in a single request: tables came back as HTML, equations as LaTeX, figures with captions, reading order preserved across pages.

Docs here: https://huggingface.co/datasets/uv-scripts/ocr/blob/main/serving-unlimited-ocr.md#1-start-the-server…

serving-unlimited-ocr.md · uv-scripts/ocr at main

Source: https://huggingface.co/datasets/uv-scripts/ocr/blob/main/serving-unlimited-ocr.md

Serve Unlimited-OCR as a live endpoint on HF Jobs

The OCR recipes in this folder run as batch jobs (dataset in → dataset out). To call a model interactively, from an agent, or with ad-hoc concurrent requests, you can instead run it as a temporary HTTP endpoint. HF Jobs serving exposes a port on a GPU Job, giving an OpenAI-compatible endpoint that runs until the job is cancelled or its `--timeout` is reached.

This is a worked example for `baidu/Unlimited-OCR` (3B, MIT, based on DeepSeek-OCR; supports multi-page parsing in a single request). The model ships its own SGLang build, so it runs on the stock `lmsysorg/sglang` image with the 12 MB wheel installed at startup; no custom image is required.

1. Start the server

hf jobs run --detach --expose 10000 --flavor h200 -s HF_TOKEN --timeout 30m \
  lmsysorg/sglang:latest -- \
  bash -lc 'pip install --no-deps https://github.com/baidu/Unlimited-OCR/raw/main/wheel/sglang-0.0.0.dev11416+g92e8bb79e-py3-none-any.whl \
    && pip install -q kernels==0.11.7 \
    && python -m sglang.launch_server --model baidu/Unlimited-OCR --served-model-name Unlimited-OCR \
       --attention-backend fa3 --page-size 1 --mem-fraction-static 0.8 --context-length 32768 \
       --enable-custom-logit-processor --disable-overlap-schedule --skip-server-warmup \
       --host 0.0.0.0 --port 10000'

Notes:

- `--` before `bash` is required, or the CLI parses `-lc` as its own flags.
- `--timeout` stops the endpoint (and billing) at the deadline; `hf jobs cancel <id>` stops it earlier.
- `fa3` requires a Hopper GPU (e.g. `h200`). The model is small, so the attention backend, not GPU memory, determines the flavor. Run `hf jobs hardware` for available flavors.
- Follow startup with `hf jobs logs -f <id>`; the server is ready at `Application startup complete` (about 3 minutes from a cold start).

2. Call it (OpenAI client; HF token as the API key)

The exposed port is at `https://<job_id>--10000.hf.jobs`; the OpenAI base URL is that plus `/v1`.
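
For an agent that boots the job and then calls it, it can help to build the base URL from the job id and wait until the server answers before sending work. The sketch below is not from the upstream docs: `base_url` and `wait_until_ready` are hypothetical helpers, and polling `/v1/models` with the HF token as a Bearer token is an assumption about how the exposed port behaves while the server is still starting.

```python
import time
import urllib.error
import urllib.request


def base_url(job_id: str, port: int = 10000) -> str:
    """Build the OpenAI base URL from the pattern https://<job_id>--<port>.hf.jobs/v1."""
    return f"https://{job_id}--{port}.hf.jobs/v1"


def wait_until_ready(url: str, token: str, timeout_s: float = 600, poll_s: float = 10) -> bool:
    """Poll GET <url>/models until it returns 200 or the deadline passes.

    Assumption: the endpoint accepts the HF token as a Bearer token and
    serves the OpenAI-style /models route once startup has finished.
    """
    deadline = time.monotonic() + timeout_s
    req = urllib.request.Request(
        f"{url}/models", headers={"Authorization": f"Bearer {token}"}
    )
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(poll_s)
    return False
```

Typical use would be `wait_until_ready(base_url(job_id), os.environ["HF_TOKEN"])` before creating the client; watching `hf jobs logs -f <id>` for `Application startup complete` remains the documented readiness signal.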

import base64, os
from openai import OpenAI

client = OpenAI(base_url="https://<job_id>--10000.hf.jobs/v1", api_key=os.environ["HF_TOKEN"])
img = base64.b64encode(open("page.jpg", "rb").read()).decode()

r = client.chat.completions.create(
    model="Unlimited-OCR",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "document parsing."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img}"}},
    ]}],
    temperature=0,
    extra_body={"images_config": {"image_mode": "gundam"}},  # "gundam" (crop-tiling) or "base"
)
print(r.choices[0].message.content)

Output is layout-grounded markdown: each block is tagged `<|det|>type [x1,y1,x2,y2]<|/det|> text`, with coordinates normalized to 0–1000. Remove the tags for plain text (`re.sub(r'<\|det\|>.*?<\|/det\|>', '', text)`) or keep them for structure.
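
To keep the structure instead of stripping it, the tags can be parsed into typed blocks with bounding boxes. This is a minimal sketch that assumes the output follows exactly the `<|det|>type [x1,y1,x2,y2]<|/det|> text` shape described above; `Block`, `parse_blocks`, and `to_pixels` are illustrative names, not part of the model's tooling, and the block type names in real output may differ from the ones used in any example.

```python
import re
from dataclasses import dataclass

# Matches one layout tag: <|det|>type [x1,y1,x2,y2]<|/det|>
DET = re.compile(r"<\|det\|>\s*(?P<type>[^\[\]]+?)\s*\[(?P<box>[^\]]*)\]\s*<\|/det\|>")


@dataclass
class Block:
    type: str
    box: tuple  # (x1, y1, x2, y2), normalized to 0-1000
    text: str


def parse_blocks(output: str) -> list:
    """Split model output into blocks; each block's text runs up to the next tag."""
    matches = list(DET.finditer(output))
    blocks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        box = tuple(float(v) for v in m.group("box").split(","))
        blocks.append(Block(m.group("type"), box, output[m.end():end].strip()))
    return blocks


def to_pixels(box: tuple, width: int, height: int) -> tuple:
    """Scale a 0-1000 normalized box to pixel coordinates of the source image."""
    x1, y1, x2, y2 = box
    return (x1 * width / 1000, y1 * height / 1000,
            x2 * width / 1000, y2 * height / 1000)
```

With the image size known, `to_pixels(block.box, width, height)` gives coordinates suitable for cropping or drawing overlays.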

3. Multi-page / PDF

Send multiple page images in one request with the `Multi page parsing.` prompt and `image_mode="base"`:

parts = [{"type": "text", "text": "Multi page parsing."}]
for page_png in page_images:            # e.g. PDF pages rendered with pymupdf at ~150 dpi
    b64 = base64.b64encode(open(page_png, "rb").read()).decode()
    parts.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}})

r = client.chat.completions.create(
    model="Unlimited-OCR",
    messages=[{"role": "user", "content": parts}],
    temperature=0, max_tokens=16384,
    extra_body={"images_config": {"image_mode": "base"}},
)

Pages are separated by `<PAGE>`; tables are returned as HTML and equations as LaTeX, with reading order preserved across pages. The context length is 32k tokens, so split longer documents.
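
A rough sketch of both halves of that advice: splitting a response back into pages, and grouping a long document's page images into several requests. `split_pages` and `chunk_pages` are hypothetical helpers, and the default of 6 pages per request is only an illustrative starting point, not a measured limit; the number that actually fits in 32k tokens depends on page density and `max_tokens`.

```python
def split_pages(output: str, sep: str = "<PAGE>") -> list:
    """Split a multi-page response on the page separator, dropping empty pieces."""
    return [page.strip() for page in output.split(sep) if page.strip()]


def chunk_pages(pages: list, pages_per_request: int = 6) -> list:
    """Group page images into consecutive batches, one batch per request.

    pages_per_request is an assumption to tune against the 32k context.
    """
    if pages_per_request < 1:
        raise ValueError("pages_per_request must be at least 1")
    return [
        pages[i:i + pages_per_request]
        for i in range(0, len(pages), pages_per_request)
    ]
```

Each batch from `chunk_pages(page_images)` can be sent with the request shown above; reading order is then only preserved within a batch, not across batches.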

4. Concurrency

SGLang batches concurrent requests, so a client can send many requests in parallel to one endpoint; the upstream `infer.py` uses a `ThreadPoolExecutor` at `concurrency=8`. For a large corpus, a batch job that runs next to the data (resumable, no network transfer) is usually a better fit than a client-to-endpoint loop.
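
A client-side fan-out along those lines might look like the following. It is a sketch in the spirit of the `ThreadPoolExecutor` approach, not a copy of the upstream `infer.py`: `ocr_many` is a hypothetical helper, and `ocr_one` stands for any function that sends one request (for example, a wrapper around the `client.chat.completions.create` call from step 2) and returns its text.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_many(paths, ocr_one, concurrency: int = 8) -> list:
    """Call ocr_one(path) for every path using a thread pool.

    Results come back in input order. A failed request is returned as the
    exception object instead of being raised, so one bad page does not
    discard the rest of the batch.
    """
    def safe(path):
        try:
            return ocr_one(path)
        except Exception as exc:  # keep going; the caller decides what to retry
            return exc

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(safe, paths))
```

Threads are sufficient here because each call spends its time waiting on the network; the server does the batching.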

5. Stop it

hf jobs cancel <job_id>

Billing is per-minute for the GPU flavor plus a small flat fee for the exposed port; scheduling time is not billed. Run `hf jobs hardware` for current flavors and prices.

Similar Articles

baidu/Unlimited-OCR

Unlimited OCR: One-Shot Long-Horizon Parsing

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

@GoSailGlobal: Current OCR processes multi-page documents page by page. Every time you turn a page, memory is reset. Today, Baidu quietly open-sourced a model on GitHub and HuggingFace called Unlimited OCR, inspired by how humans copy books: - When copying a book, you don't reread hundreds of pages every time you write a word...

@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…