automatic-speech-recognition

Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

arXiv cs.CL · 5d ago

This paper evaluates demographic and accent biases in phoneme-based ASR systems, specifically WhisperIPA and ZIPA, using phoneme error rate (PER) and a new Soft PER metric. It finds persistent disparities across languages and demographic groups.

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Hugging Face Blog · 6d ago

ServiceNow AI releases a benchmark and dataset for evaluating automatic speech recognition (ASR) on code-switched speech in enterprise HR and IT scenarios. It covers four language pairs: Spanish-English, French-English, Canadian French-English, and German-English. The results show that current frontier ASR models still struggle with code-switching, which leads to higher error rates.

Hearing the Unspoken: Language Model Priors for Acoustic Adversarial Attacks

arXiv cs.LG · 2026-06-08

This paper introduces the Semantic Gambit attack, which uses LLM predictions to generate real-time adversarial perturbations for automatic speech recognition systems, achieving a three-fold increase in word error rate over prior methods.

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Hugging Face Daily Papers · 2026-05-28

This paper introduces Agentic ASR, an interactive speech recognition framework that uses semantic correction and reasoning-based editing to reduce semantic errors through multi-turn refinement. It also proposes a new sentence-level semantic error rate metric and an interactive simulation system for benchmarking.

SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR

arXiv cs.CL · 2026-05-21

SCRIBE is a diagnostic evaluation framework for automatic speech recognition that provides categorical error decomposition for Indic languages. The release includes benchmarks and open-weight rich transcription models for Hindi, Malayalam, and Kannada.

FormalASR: End-to-End Spoken Chinese to Formal Text

arXiv cs.CL · 2026-05-20

FormalASR presents two compact end-to-end models that transcribe spoken Chinese directly into formal written text. The models achieve significant error reduction and remove the need for a separate LLM post-processing stage, which enables lightweight on-device deployment.

nvidia/nemotron-3.5-asr-streaming-0.6b

Hugging Face Models Trending · 2026-05-15

NVIDIA releases Nemotron 3.5 ASR, a 600M-parameter multilingual streaming speech recognition model that supports 40 language-locales and uses a Cache-Aware FastConformer-RNNT architecture for low-latency transcription. The model supports configurable chunk sizes and is ready for commercial use under the OpenMDW-1.1 license.
