Tag
This paper benchmarks ten OCR systems on Devanagari script under synthetic degradation and real scans, finding that synthetic renders overstate quality, specialized OCR-VLMs are fragile, and strong English OCR does not predict Indic OCR performance. It releases a benchmark, code, and models.
This paper presents a systematic methodology for converting Hindi WordNet into 1.25 million instruction-response pairs and using them to fine-tune a 12B-parameter language model with LoRA, demonstrating improved pedagogical effectiveness for specialized conversational systems in low-resource languages.
Amazon is testing a Hindi-language version of its generative AI assistant Alexa+ in India, inviting users to join a beta program to refine the experience before a wider launch.