Speech-Driven End-to-End Language Discrimination towards Chinese Dialects

arXiv cs.CL 06/18/26, 04:00 AM Papers

speech-driven end-to-end language-discrimination chinese-dialects mfcc cnn hmm-dnn

Summary

This paper investigates speech-driven features for fine-grained discrimination among Chinese dialects. Its end-to-end model uses a CNN to combine MFCC-based features with word-level embeddings, and the authors report that it is effective on two benchmark Chinese dialect corpora compared with state-of-the-art methods.

arXiv:2606.18584v1 Announce Type: new Abstract: Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this paper, we explore the effectiveness of speech-driven features towards language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features towards CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt attention to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embedding and the MFCC-based features. Evaluation of two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to the state-of-the-art methods.


Cached at: 06/18/26, 05:45 AM

# Speech-Driven End-to-End Language Discrimination towards Chinese Dialects
Source: [https://arxiv.org/abs/2606.18584](https://arxiv.org/abs/2606.18584)
[View PDF](https://arxiv.org/pdf/2606.18584)

> Abstract: Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this paper, we explore the effectiveness of speech-driven features towards language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features towards CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt attention to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embedding and the MFCC-based features. Evaluation of two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to the state-of-the-art methods.
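
The abstract mentions using attention to pick out discriminative words and then combining word-level embeddings with MFCC-based features. The sketch below illustrates that general idea only; it is not the authors' model. It assumes plain dot-product attention with a hypothetical `query` vector, uses made-up toy embeddings and a made-up `mfcc_summary` vector, and substitutes simple concatenation for the CNN-based fusion described in the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(word_vectors, query):
    """Dot-product attention: return (weights, weighted sum of word vectors).

    `query` is a hypothetical vector; in a trained model it would be learned.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return weights, pooled

def fuse(pooled_words, acoustic_features):
    """Stand-in fusion by concatenation (the paper combines them via a CNN)."""
    return list(pooled_words) + list(acoustic_features)

# Toy inputs, invented purely for illustration.
words = ["w1", "w2", "w3"]
word_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]
mfcc_summary = [0.5, -0.2, 0.1]  # e.g. per-utterance mean of MFCC frames

weights, pooled = attention_pool(word_vectors, query)
top_word = max(zip(weights, words))[1]  # word with the largest attention weight
fused = fuse(pooled, mfcc_summary)
```

Here the attention weights rank the words by how strongly they align with the query, and `fused` is a single vector that a downstream classifier could consume. A real system would learn the query and the fusion layers from data rather than fixing them by hand.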

## Submission history

From: Fan Xu \[[view email](https://arxiv.org/show-email/7f279a82/2606.18584)\] **\[v1\]** Wed, 17 Jun 2026 01:23:58 UTC (1,045 KB)


Similar Articles

Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation

Dolphin-CN-Dialect: Where Chinese Dialects Matter

Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach

Side-by-side Comparison Amplifies Dialect Bias in Language Models

Exploring the Capability Boundaries of LLMs in Mastering Chinese Chouxiang Language
