Riazi-8B is an Urdu large language model fine-tuned for mathematical reasoning, achieving improved performance on MGSM-Urdu through continued pre-training and supervised fine-tuning on Urdu Chain-of-Thought data.
UrduMMLU is a new benchmark of 26,431 multiple-choice questions across 26 subjects for evaluating LLMs on Urdu language understanding, sourced from native educational materials. An evaluation of 30 LLMs finds that Gemini-3.5-Flash performs best, while open-source models lag behind and region-specific subjects remain especially difficult.