@DailyDoseOfDS_: Fine-tune DeepSeek-OCR on your own language! (100% local) Most vision models treat documents as massive sequences of to…

X AI KOLs Timeline 06/08/26, 09:30 AM Models

deepseek-ocr fine-tuning ocr vision-model open-source document-processing persian-ocr

Summary

DeepSeek-OCR is a 3B vision model using context optical compression for efficient document processing. Fine-tuning it on Persian text using Unsloth achieved an 88.26% improvement in character error rate, all open-source and runnable on a single GPU.

Fine-tune DeepSeek-OCR on your own language! (100% local) Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents. It is a 3B-parameter vision model that achieves 97% precision while using 10x fewer vision tokens than text-based LLMs. In fact, you can easily fine-tune it for your specific use case on a single GPU. We used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate. ↳ Base model: 149% character error rate (CER) ↳ Fine-tuned model: 60% CER (57% more accurate) ↳ Training time: 60 steps on a single GPU Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with. We've shared the complete guide in the next tweet, which includes the code, notebooks, and environment setup ready to run with a single click. Everything is 100% open-source!

Original Article

View Cached Full Text

Cached at: 06/08/26, 03:26 PM

Fine-tune DeepSeek-OCR on your own language!

(100% local)

Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow.

DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents.

It is a 3B-parameter vision model that achieves 97% precision while using 10x fewer vision tokens than text-based LLMs.

In fact, you can easily fine-tune it for your specific use case on a single GPU.

We used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate.

↳ Base model: 149% character error rate (CER) ↳ Fine-tuned model: 60% CER (57% more accurate) ↳ Training time: 60 steps on a single GPU

Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you’re working with.

We’ve shared the complete guide in the next tweet, which includes the code, notebooks, and environment setup ready to run with a single click.

Everything is 100% open-source!

Tech Stack:

@UnslothAI to run and fine-tune the model
@LightningAI environments for hosting and deployment

Find the code and environment setup here:

@DailyDoseOfDS_: Fine-tune DeepSeek-OCR on your own language! (100% local) Most vision models treat documents as massive sequences of to…

Similar Articles

@Saboo_Shubham_: OPEN SOURCE AI is killing it. DeepSeek v4 Flash is a quasi-frontier model with a massive 1M context window. It can LOCA…

@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots. ocr is a new multilingual…

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Building a Fast Multilingual OCR Model with Synthetic Data

I have (even faster) DeepSeek V4 Pro at home

Submit Feedback

Similar Articles

@Saboo_Shubham_: OPEN SOURCE AI is killing it. DeepSeek v4 Flash is a quasi-frontier model with a massive 1M context window. It can LOCA…

@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots. ocr is a new multilingual…

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Building a Fast Multilingual OCR Model with Synthetic Data

I have (even faster) DeepSeek V4 Pro at home