corpus-linguistics

#corpus-linguistics

Phonetic and semantic analyses of spoken corpora of Beijing and Taiwan Mandarin indicate that the neutral tone is a lexical tone

arXiv cs.CL · 4d ago

This corpus-based study argues that the neutral tone in Mandarin Chinese is a lexical tone with its own tonal target. It draws on phonetic and semantic analyses of Beijing and Taiwan Mandarin spoken corpora, using generalized additive models and contextualized embeddings.

#corpus-linguistics

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

arXiv cs.CL · 2026-06-15

This paper proposes building a new corpus for linguistics research from Linguistics Olympiad data.

#corpus-linguistics

How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

arXiv cs.CL · 2026-05-25

This paper introduces a register-aware linguistic evaluation framework to assess how human-like large language models (LLMs) are by comparing the distribution of 67 lexico-grammatical features between human and LLM-generated texts using Maximum Mean Discrepancy. Experiments across seven instruction-tuned open-source models and five registers show that no model perfectly matches human baselines, and closeness to human language varies by register rather than model size.
