Tag
This paper introduces a cross-evaluation framework for benchmarking LLMs on Arabic cultural and sociolinguistic knowledge, using ground truth from human subject-matter experts (SMEs) alongside automated judges. The authors contribute a dataset of prompt-rubric pairs for Egyptian and Iraqi Arabic, evaluate frontier LLMs on it, and find that cultural reasoning remains a primary failure mode for automated grading.
This thesis investigates geographic dialect alignment in place-based social media communities in New Zealand, examining how Reddit communities reflect patterns of language variation similar to those of geographic dialect communities through an analysis of lexical, morphosyntactic, and semantic features.