This paper introduces Afrispeech Semantics, a benchmark for evaluating audio language models on semantic reasoning tasks including entailment, consistency, plausibility, accent drift, and accent restraint across diverse domains and accents.
This paper introduces KoALa-Bench, a Korean-focused benchmark suite for evaluating large audio language models on six tasks, including novel measures of speech faithfulness and Korea-specific cultural content.