Tag
This paper evaluates large language models (LLMs) for automatically annotating narrative macrostructure in spoken Mandarin, finding that the best model achieves near-human reliability while reducing annotation time by 65%, though performance degrades on semantically complex or lexically diverse narratives.
This paper introduces a benchmark for semantic segmentation in low-resource dialectal Arabic and proposes a model that outperforms standard baselines on conversational speech.