Tag
This paper investigates the impact of data scale versus latency on cross-lingual transfer for streaming ASR, finding that multilingual initialization benefits are data-limited, not latency-limited, and diminish as target-language data increases.
This paper proposes a non-autoregressive scoring method for punctuation restoration in streaming ASR that preserves the input transcript and outperforms prompt-based and fine-tuned baselines under a limited lookahead budget.
This paper presents a routing-based approach to real-time multilingual ASR that uses smaller monolingual models with a rollback mechanism to handle language switches; it achieves ~13% WER on inter-utterance code-switching, and the system is open-sourced.