Convex Low-resource Accent-Robust Language Detection in Speech Recognition

Hugging Face Daily Papers 05/22/26, 12:00 AM Papers

speech-recognition low-resource accent-robust convex-optimization language-detection asr

Summary

This paper introduces CLD, a lightweight language detection head for ASR based on convex optimization. It achieves 97-98% accuracy with fewer than 100 training samples while reducing compute costs by 13x, and targets accent and dialect robustness across 5 languages and 24 sub-dialects.

Globalization and multiculturalism continue to produce increasingly diverse speech varieties. Yet current spoken dialogue systems frequently fail on under-represented dialects and accents, often misidentifying the input language and causing cascading failures in downstream dialogue tasks. Addressing this dialectal variance under low-resource constraints remains an open challenge, as standard fine-tuning is computationally expensive and prone to overfitting on high-dimensional speech data. We propose Convex Language Detection (CLD), a novel framework that integrates theoretically grounded convex optimization techniques into the spoken dialogue systems pipeline. Our method is efficiently implemented via multi-GPU Alternating Direction Method of Multipliers (ADMM) in JAX, providing global optimality guarantees and fast, polynomial-time training. Theoretically, we prove that our convex objective induces certified margin stability and provide guarantees against feature perturbations. Empirically, we demonstrate sample efficiency and robustness to input dialectal variation, achieving 97-98% accuracy in challenging low-resource regimes. Our open-source package is available at https://pypi.org/project/jaxcld/
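As a rough illustration of the ADMM pattern the abstract refers to, here is a minimal single-machine sketch in plain Python. It is not the CLD objective, which is not stated on this page, and it is not the jaxcld API. As a stand-in it solves a toy convex problem, min 0.5*||x - a||^2 + lam*||z||_1 subject to x = z, chosen only because each ADMM step has a closed form. The function names and the parameters lam, rho, and iters are hypothetical.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |.|: shrink v toward zero by kappa."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(a, lam=1.0, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*||x - a||^2 + lam*||z||_1 s.t. x = z.

    Toy stand-in problem (an assumption), not the objective used by CLD.
    """
    n = len(a)
    z = [0.0] * n  # copy of x that carries the L1 term
    u = [0.0] * n  # scaled dual variable for the constraint x = z
    for _ in range(iters):
        # x-update: minimize the smooth quadratic term plus the penalty
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximal step on the L1 term
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the constraint violation x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


print(admm_l1_denoise([3.0, 0.5, -2.0]))
```

For this toy problem the exact minimizer is the soft-thresholded input, so the printed values should be close to 2.0, 0.0, and -1.0. Because the problem is convex, the iterates approach the global minimizer from any starting point, which is the kind of property the abstract attributes to the convex formulation.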

Cached at: 05/29/26, 11:04 PM

Source: https://huggingface.co/papers/2605.23235

🎵 Meet Convex Language Detection (CLD)!

Automatic Speech Recognition (ASR) frequently fails on accents and dialects, but collecting more data to retrain a larger model is slow and expensive. CLD addresses this not by grid-searching hyperparameters or collecting massive datasets, but through the elegant geometry of convex optimization.

🌐🎙️ Instead of relying on unpredictable large-scale neural networks that struggle with accent variance, CLD introduces a lightweight, pluggable detection head that yields mathematically certified margin stability.

We benchmarked CLD across 5 languages, 24 unique sub-dialects (including highly challenging regimes like Singaporean English and regional Mandarin), and foundation models such as Whisper and MMS-1B. The results: even with fewer than 100 training samples, CLD reaches 97–98% accuracy, reduces cross-lingual decoding failures, and cuts compute costs by 13x.

The shift is structural: current multilingual ASR models are heavily skewed toward standard, high-resource speech datasets, leaving millions of speakers worldwide facing cascading errors. By recasting language identification as a convex program solved via parallelized ADMM in JAX, we don't just guess a decision boundary; we compute a verifiable radius of label invariance with guarantees. We see this as a highly scalable, theoretically backed plug-and-play module that aims to bring equity, speed, and reliability to global speech systems.
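To make "radius of label invariance" concrete, here is a minimal sketch for the simplest case, a linear multiclass scorer s_k(x) = w_k·x + b_k. This is the standard textbook margin bound, offered as an assumption about the flavor of the guarantee; the certificate actually proved in the paper may differ. For predicted class c, no L2 feature perturbation smaller than min over k≠c of (s_c - s_k) / ||w_c - w_k|| can change the label. The function name and the toy weights are hypothetical.

```python
import math


def certified_radius(weights, biases, x):
    """Return (predicted_class, L2 radius) for a linear multiclass scorer.

    Textbook margin bound for linear scores (an assumption here), not
    necessarily the certificate derived in the CLD paper.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    pred = max(range(len(scores)), key=scores.__getitem__)
    radius = math.inf
    for k, w in enumerate(weights):
        if k == pred:
            continue
        gap = scores[pred] - scores[k]  # score margin over class k
        diff_norm = math.sqrt(
            sum((a - b) ** 2 for a, b in zip(weights[pred], w)))
        if diff_norm == 0.0:
            # identical weight vectors: the gap does not depend on x
            continue
        radius = min(radius, gap / diff_norm)
    return pred, radius


# Hypothetical 3-class head on 2-dimensional features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
pred, r = certified_radius(W, b, [3.0, 1.0])
print(pred, r)
```

In this toy setup the scores are 3, 1, and -4, so class 0 is predicted. The closest competitor is class 1, with margin 2 and weight difference of norm sqrt(2), giving a radius of about 1.414. Any feature shift with a smaller L2 norm leaves the predicted label unchanged.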

🛠️ Open-Source Code: https://github.com/pilancilab/CLD

📦 JAX Package: pip install jaxcld (https://pypi.org/project/jaxcld/)

📄 Full Paper: https://arxiv.org/abs/2605.23235

Similar Articles

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

Linear Semantic Segmentation for Low-Resource Spoken Dialects

CRoCoDiL: Continuous and Robust Conditioned Diffusion for Language

LaSR: Context-Aware Speech Recognition via Latent Reasoning

Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation
