Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
Summary
Mega-ASR proposes scaling up real-world acoustic simulation to improve automatic speech recognition in challenging, wild conditions, aiming to narrow the performance gap between lab and real-world settings.
Cached at: 05/21/26, 10:10 AM
Source: https://huggingface.co/papers/2605.19833
Similar Articles
@AdinaYakup: Mega-ASR https://huggingface.co/zhifeixie/Mega-ASR… 1.7B Apache 2.0 Built for Noise/Reverb/Clipping/Overlapping speaker…
Mega-ASR is a 1.7B parameter robust ASR model under Apache 2.0, designed for noisy, reverberant, and overlapping speech, with an audio quality router to handle clean vs degraded audio.
@XieZhifei14110: Stop using Whisper for ASR ! open sourcing Mega-ASR — the first full-scenario SOTA industrial-grade ASR model, built fo…
Open sourcing Mega-ASR, a full-scenario SOTA industrial-grade ASR model designed for challenging audio conditions like far-field and noise, outperforming existing open and closed models by 10-30% on real-world benchmarks.
@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...
NTU, NUS, and Shanghai AI Lab jointly released Mega-ASR, a fully open-source ASR model built on Qwen3-ASR. Using the Voices-in-the-Wild-2M dataset and progressive acoustic-to-semantic optimization, it achieves up to 30% relative Word Error Rate (WER) reduction in real-world noisy environments. With only 1.7B parameters, it enables efficient inference on consumer-grade hardware.
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
Introduces Vividh-ASR, a complexity-tiered benchmark for Hindi and Malayalam ASR, identifies studio-bias in fine-tuning, and proposes R-MFT to improve spontaneous speech performance efficiently.
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
This paper evaluates nine ASR models (Whisper, Parakeet, Wav2Vec2) on Dutch child speech datasets JASMIN and DART, finding that fine-tuned Whisper-medium achieves the best performance (WER 5.54% on JASMIN, 70.37% on DART). It also proposes a selection method to automatically identify correctly pronounced utterances with high precision, reducing the need for manual verification.
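The summaries above quote word error rate (WER) figures and relative WER reductions. As a minimal sketch of how such numbers are typically computed, the Python below implements the standard word-level edit-distance definition of WER and a relative-reduction helper. The function names and sample values are illustrative assumptions and are not taken from any of the papers listed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative values only, not results from the listed papers.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(relative_wer_reduction(0.20, 0.14))
```

Under this definition, a 30% relative reduction means the new WER is 0.7 times the baseline WER, not 30 percentage points lower. WER can also exceed 100% when the hypothesis contains many insertions.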