Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation
Summary
The paper proposes a framework (CDDTLDA) that uses transfer learning and data augmentation to improve Chinese dialect discrimination under low-resource conditions, and reports state-of-the-art results on two benchmark corpora.
Cached at: 06/18/26, 05:45 AM
# Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation

Source: [https://arxiv.org/abs/2606.18597](https://arxiv.org/abs/2606.18597) ([View PDF](https://arxiv.org/pdf/2606.18597))

> Abstract: Chinese dialects discrimination is a challenging natural language processing task due to scarce annotation resource. In this article, we develop a novel Chinese dialects discrimination framework with transfer learning and data augmentation (CDDTLDA) in order to overcome the shortage of resources. To be more specific, we first use a relatively larger Chinese dialects corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise disturbance) to augment the target-side low-resource Chinese dialects, and fine-tune another target ASR model based on the previous source-side ASR model. Meanwhile, the potential common semantic features between source-side and target-side ASR models can be captured by using self-attention mechanism. Finally, we extract the hidden semantic representation in the target ASR model to conduct Chinese dialects discrimination. Our extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialects corpora.

## Submission history

From: Fan Xu

**[v1]** Wed, 17 Jun 2026 01:46:41 UTC (993 KB)
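The abstract names speed, pitch, and noise disturbance as the augmentation applied to the target-side dialect audio, but it gives no parameters. Below is a minimal NumPy sketch of two of those perturbations, written under stated assumptions rather than taken from the paper: speed perturbation by linear-interpolation resampling (which shifts tempo and pitch together, as sox-style speed perturbation does) and additive white Gaussian noise at a chosen signal-to-noise ratio. The function names, the 0.9/1.1 speed factors, and the 20 dB SNR are hypothetical. Pitch perturbation is left out because a faithful pitch shift also needs a time-stretch step; a library routine such as `librosa.effects.pitch_shift` can supply one.

```python
import numpy as np


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Play a 1-D waveform `factor` times faster (factor > 1 shortens it).

    As with sox-style speed perturbation, tempo and pitch change together.
    Linear interpolation is an assumption made for brevity; a band-limited
    resampler would avoid aliasing in a real pipeline.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(wave) / factor))
    # Fractional positions in the original signal read by each output sample;
    # positions past the last sample are clamped by np.interp.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(wave)), wave)


def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled so the result has exactly `snr_db` SNR."""
    signal_power = np.mean(wave ** 2)
    noise = rng.standard_normal(len(wave))
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Rescale the sampled noise so its empirical power hits the target.
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return wave + noise


def augment(wave, rng, speeds=(0.9, 1.1), snrs_db=(20.0,)):
    """Return the clean clip plus speed- and noise-perturbed copies.

    The speed factors and SNR values are hypothetical defaults, not the
    paper's settings. Pitch shifting is omitted from this sketch.
    """
    variants = [wave]
    variants += [speed_perturb(wave, f) for f in speeds]
    variants += [add_noise(wave, s, rng) for s in snrs_db]
    return variants


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16_000
    t = np.arange(sr) / sr
    # A 1 s synthetic tone stands in for a low-resource dialect recording.
    wave = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    print([len(v) for v in augment(wave, rng)])  # [16000, 17778, 14545, 16000]
```

Each variant keeps the original clip's dialect label. In a setup like the one the abstract describes, the augmented copies would presumably just enlarge the target-side set used to fine-tune the target ASR model.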
Similar Articles
Speech-Driven End-to-End Language Discrimination towards Chinese Dialects
This paper investigates speech-driven features for fine-grained discrimination among Chinese dialects, using an end-to-end model that combines MFCC-based features with word-level embeddings via a CNN, outperforming text-driven methods.
Convex Low-resource Accent-Robust Language Detection in Speech Recognition
This paper introduces CLD, a lightweight convex optimization-based language detection head for ASR that achieves 97-98% accuracy with under 100 training samples while reducing compute costs by 13x, addressing accent and dialect robustness across 5 languages and 24 sub-dialects.
Multilingual Detection of Alzheimer's Disease from Speech: A Cross-Linguistic Transfer Learning Approach
This paper proposes a cross-linguistic transfer learning approach for detecting Alzheimer's Disease from speech across multiple languages, achieving F1 scores of 82% and supporting real-time screening applications.
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation
This study evaluates bilingual fine-tuning with language identification tokens for improving ASR in low-resource languages across nine diverse language pairs, finding that high LID accuracy is beneficial and that providing the LID token at inference can boost performance when LID accuracy is low.
Dolphin-CN-Dialect: Where Chinese Dialects Matter
Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.