Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation

arXiv cs.CL Papers

Summary

The paper proposes a novel framework (CDDTLDA) that uses transfer learning and data augmentation to improve Chinese dialect discrimination under low-resource conditions, achieving state-of-the-art results on two benchmark corpora.

arXiv:2606.18597v1 Announce Type: new

Abstract: Chinese dialect discrimination is a challenging natural language processing task due to scarce annotated resources. In this article, we develop a novel Chinese dialect discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this resource shortage. Specifically, we first use a relatively large Chinese dialect corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise perturbation) to augment the target-side low-resource Chinese dialects, and fine-tune a target-side ASR model initialized from the source-side ASR model. Meanwhile, potential common semantic features between the source-side and target-side ASR models can be captured by a self-attention mechanism. Finally, we extract the hidden semantic representations in the target ASR model to perform Chinese dialect discrimination. Extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
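The abstract names speed, pitch, and noise perturbation for the target-side audio but gives no parameters. Below is a minimal plain-Python sketch, assuming 16 kHz mono audio held as a list of floats, speed factors of 0.9/1.0/1.1 (a common choice in ASR recipes, not confirmed by the paper), and white Gaussian noise at a hypothetical 20 dB signal-to-noise ratio. Pitch shifting without changing duration needs a time-stretching step (for example a phase vocoder) and is left out; a library routine such as `librosa.effects.pitch_shift` would normally cover it. All function names here are illustrative.

```python
import math
import random
from typing import List, Sequence


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Resample `signal` by `factor` with linear interpolation.

    factor > 1 plays faster (shorter output), factor < 1 slower (longer output).
    Like resampling-based speed perturbation, this shifts tempo and pitch together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(signal) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position in the input
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append((1 - frac) * signal[left] + frac * signal[right])
    return out


def add_noise(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so that signal power / noise power equals snr_db."""
    if not signal:
        return []
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment(signal: Sequence[float], rng: random.Random,
            speeds=(0.9, 1.0, 1.1), snr_db: float = 20.0) -> List[List[float]]:
    """Return speed-perturbed copies (1.0 keeps the original) plus one noisy copy.

    The default speeds and SNR are hypothetical, not taken from the paper.
    """
    variants = [speed_perturb(signal, s) for s in speeds]
    variants.append(add_noise(signal, snr_db, rng))
    return variants


# Toy input: 0.1 s of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
out = augment(tone, random.Random(0))
print([len(v) for v in out])
```

Each training utterance thus yields several variants, multiplying the effective size of the low-resource target set before fine-tuning.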

# Low-resource Language Discrimination Towards Chinese Dialects with Transfer learning and Data Augmentation
Source: [https://arxiv.org/abs/2606.18597](https://arxiv.org/abs/2606.18597)
[View PDF](https://arxiv.org/pdf/2606.18597)

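The abstract's final step, turning hidden representations from the fine-tuned target ASR model into a dialect label, is not specified further. One minimal way to illustrate it, assuming frame-level encoder states have already been extracted as lists of floats: pool them into one utterance vector with scaled dot-product attention against a query vector (learned in a real model, fixed here), then assign the dialect whose training centroid is nearest. The nearest-centroid classifier and every name below (`attention_pool`, `centroids`, `classify`, the `dialect_a`/`dialect_b` labels) are hypothetical stand-ins, not the CDDTLDA architecture.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Vector], query: Vector) -> Vector:
    """Collapse T frame vectors into one utterance vector.

    Frames are scored by a scaled dot product with `query`, the scores are
    softmax-normalised, and the frames are averaged with those weights.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def centroids(labelled: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the pooled training vectors of each dialect."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in labelled.items()
    }


def classify(vec: Vector, cents: Dict[str, Vector]) -> str:
    """Return the dialect whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda label: math.dist(vec, cents[label]))


# Toy 2-D "hidden states"; a real ASR encoder would emit hundreds of dimensions.
query = [1.0, 1.0]
train = {
    "dialect_a": [attention_pool([[1.0, 0.0], [0.9, 0.1]], query)],
    "dialect_b": [attention_pool([[0.0, 1.0], [0.1, 0.9]], query)],
}
cents = centroids(train)
pred = classify(attention_pool([[0.8, 0.2], [1.0, 0.1]], query), cents)
print(pred)
```

Attention pooling lets the frames that score highest against the query dominate the utterance vector, whereas plain averaging treats every frame equally; a trained classification layer could replace the centroid rule without changing the pooling step.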

## Submission history

From: Fan Xu [[view email](https://arxiv.org/show-email/21108fd4/2606.18597)]

**[v1]** Wed, 17 Jun 2026 01:46:41 UTC (993 KB)

Similar Articles

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

Hugging Face Daily Papers

This paper introduces CLD, a lightweight convex optimization-based language detection head for ASR that achieves 97-98% accuracy with under 100 training samples while reducing compute costs by 13x, addressing accent and dialect robustness across 5 languages and 24 sub-dialects.

Dolphin-CN-Dialect: Where Chinese Dialects Matter

arXiv cs.CL

Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.