human-sme

Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth

arXiv cs.CL · yesterday

This paper introduces a cross-evaluation framework for benchmarking LLMs on Arabic cultural and sociolinguistic knowledge, combining ground truth from human subject-matter experts (SMEs) with automated judges. The authors contribute a dataset of prompt-rubric pairs for Egyptian and Iraqi Arabic, evaluate frontier LLMs, and find that cultural reasoning remains a primary failure mode for automated grading.
