sinhala

Cross-Temporal Sinhala OCR: Page-Level Adaptation and Diachronic Analysis

arXiv cs.CL · 4d ago

This paper introduces sinhala-ocr-lk-acts-1010, the first publicly available real-world page-level dataset for Sinhala OCR, and fine-tunes three vision-language models (DeepSeek-OCR V1, DeepSeek-OCR V2, and LightOnOCR-2-1B) using QLoRA. LightOnOCR-2-1B achieves a character error rate (CER) of 1.05%, outperforming both open-source and commercial OCR models, and its performance stays consistent across degraded documents from different time periods.
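For readers unfamiliar with the metric, CER is conventionally the edit distance between the OCR output and the reference transcription, divided by the reference length. The following is a minimal sketch of that conventional definition, not the paper's evaluation code. It assumes comparison at the Unicode code point level with no text normalization; the summary does not say how the authors tokenize or normalize Sinhala text, and the function names `levenshtein` and `cer` are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (counted per Unicode code point)."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length.
    Assumption: code-point-level comparison, no normalization."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer("abcd", "abxd"))  # 0.25
```

Under this definition, a CER of 1.05% corresponds to roughly one character edit per 95 reference characters. Sinhala uses combining marks and joiners, so a count based on code points can differ from one based on grapheme clusters; which unit the paper uses is not stated in the summary.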
