Speech-Driven End-to-End Language Discrimination towards Chinese Dialects
Summary
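This paper investigates speech-driven features for fine-grained discrimination among Chinese dialects. Its end-to-end model uses a CNN to combine MFCC-based acoustic features with word-level embeddings of dialect words that are predicted by an HMM-DNN speech recognizer and selected with attention. On two benchmark Chinese dialect corpora, the authors report that the approach is effective compared with state-of-the-art methods, in contrast to the traditional text-driven focus, which they describe as leading to poor results.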
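As a rough illustration of the fusion idea described in the summary, the sketch below convolves over MFCC frames, attention-pools a handful of word embeddings, concatenates the two vectors, and applies a softmax classifier. It is not the paper's implementation: the layer sizes, the random weights, the single attention query vector, and all function names (`attention_pool`, `conv_max_pool`, `classify`) are assumptions made for illustration, and the MFCC frames and word embeddings are random stand-ins rather than real features.

```python
import math
import random


def softmax(xs):
    m = max(xs)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(word_embs, query):
    """Weight each word embedding by its similarity to a query vector."""
    weights = softmax([dot(w, query) for w in word_embs])
    dim = len(word_embs[0])
    pooled = [sum(a * w[d] for a, w in zip(weights, word_embs)) for d in range(dim)]
    return pooled, weights


def conv_max_pool(frames, kernels):
    """1-D convolution over time, ReLU, then max-over-time pooling per filter."""
    feats = []
    for kernel in kernels:                    # kernel: `width` coefficient vectors
        width = len(kernel)
        best = 0.0                            # starting at 0 applies the ReLU floor
        for t in range(len(frames) - width + 1):
            act = sum(dot(kernel[k], frames[t + k]) for k in range(width))
            best = max(best, act)
        feats.append(best)
    return feats


def classify(frames, word_embs, params):
    acoustic = conv_max_pool(frames, params["kernels"])
    lexical, weights = attention_pool(word_embs, params["query"])
    fused = acoustic + lexical                # concatenate acoustic and lexical vectors
    logits = [dot(row, fused) + b for row, b in zip(params["W"], params["b"])]
    return softmax(logits), weights


# Hypothetical sizes: 13 MFCCs per frame, 16-dim embeddings, 8 filters, 4 dialects.
N_COEF, EMB_DIM, N_FILT, WIDTH, N_CLASSES = 13, 16, 8, 5, 4
rng = random.Random(0)


def rand_vec(n, scale=1.0):
    return [rng.gauss(0.0, scale) for _ in range(n)]


params = {
    "kernels": [[rand_vec(N_COEF, 0.1) for _ in range(WIDTH)] for _ in range(N_FILT)],
    "query": rand_vec(EMB_DIM),
    "W": [rand_vec(N_FILT + EMB_DIM, 0.1) for _ in range(N_CLASSES)],
    "b": [0.0] * N_CLASSES,
}
frames = [rand_vec(N_COEF) for _ in range(120)]    # stand-in for 120 MFCC frames
word_embs = [rand_vec(EMB_DIM) for _ in range(6)]  # stand-in for 6 recognized words

probs, weights = classify(frames, word_embs, params)
print(len(probs), len(weights))
```

In a trained system the attention weights would indicate which recognized words contribute most to the dialect decision; here the weights are untrained, so only the shapes and the normalization are meaningful.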
Source: [https://arxiv.org/abs/2606.18584](https://arxiv.org/abs/2606.18584)

[View PDF](https://arxiv.org/pdf/2606.18584)

> Abstract: Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this paper, we explore the effectiveness of speech-driven features towards language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features towards CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt attention to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embedding and the MFCC-based features. Evaluation of two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to the state-of-the-art methods.

## Submission history

From: Fan Xu

**[v1]** Wed, 17 Jun 2026 01:23:58 UTC (1,045 KB)
Similar Articles
Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation
The paper proposes a novel framework (CDDTLDA) that uses transfer learning and data augmentation to improve Chinese dialect discrimination under low-resource conditions, achieving state-of-the-art results on two benchmark corpora.
Dolphin-CN-Dialect: Where Chinese Dialects Matter
Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.
Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach
This paper proposes a cross-linguistic transfer learning approach for detecting Alzheimer's Disease from speech across multiple languages, achieving F1 scores of 82% and supporting real-time screening applications.
Side-by-side Comparison Amplifies Dialect Bias in Language Models
This research paper finds that language models exhibit increased dialect bias when comparing Standard American English and African-American Vernacular English side-by-side, even after safety fine-tuning. Counterfactual fairness fine-tuning can reduce some biases in isolation but not consistently in contrastive settings.
Exploring the Capability Boundaries of LLMs in Mastering Chinese Chouxiang Language
This paper introduces Mouse, a specialized benchmark for evaluating LLMs on Chinese Chouxiang Language tasks across six NLP domains, revealing that current state-of-the-art models have significant limitations with this subcultural internet language despite performing well on contextual understanding tasks.