I can't believe text normalization is so underdiscussed in streaming text-to-speech [D]
Summary
The author highlights under-discussed text normalization issues in streaming TTS and shares a vendor benchmark that evaluates 1,000+ sentences across 31 categories, such as dates, URLs, and acronyms.
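The post is only summarized here, so what follows is a hedged sketch of why streaming makes normalization harder. It is not the vendor's benchmark or method. An upstream model emits text in arbitrary chunks, so a date or URL can be split across two chunks. If each chunk is normalized on its own, that token gets verbalized wrongly. The sketch rests on a few illustrative assumptions. It uses three rules: ISO dates, `http(s)` URLs, and all-caps acronyms spelled letter by letter. The names `normalize` and `StreamingNormalizer` are hypothetical. Real normalizers need far more categories and context, for example acronyms like NASA that are read as words.

```python
import re

# Hypothetical, deliberately tiny rule set; real normalizers cover many more categories.
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
ORD = {"one": "first", "two": "second", "three": "third", "five": "fifth",
       "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}
MONTHS = ("January February March April May June July August September "
          "October November December").split()

DATE = re.compile(r"\b(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")  # ISO only
URL = re.compile(r"https?://((?:[\w-]+\.)+[\w-]+)((?:/[\w-]+)*)/?")
ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")  # assumption: always spelled out


def two_digits(n: int) -> str:
    """Spell out 0..99, e.g. 24 -> 'twenty-four'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal(n: int) -> str:
    """Ordinal for 1..99, e.g. 25 -> 'twenty-fifth'."""
    head, _, last = two_digits(n).rpartition("-")
    if last in ORD:
        last = ORD[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return f"{head}-{last}" if head else last


def year(y: int) -> str:
    """Read a four-digit year in a common US English style (heuristic)."""
    hi, lo = divmod(y, 100)
    if 2000 <= y <= 2009:
        return "two thousand" + (" " + ONES[lo] if lo else "")
    if lo == 0:
        return two_digits(hi) + " hundred"
    if lo < 10:
        return two_digits(hi) + " oh " + ONES[lo]
    return two_digits(hi) + " " + two_digits(lo)


def say_date(m: re.Match) -> str:
    y, mo, d = map(int, m.groups())
    return f"{MONTHS[mo - 1]} {ordinal(d)}, {year(y)}"


def say_url(m: re.Match) -> str:
    host, path = m.groups()
    return host.replace(".", " dot ") + path.replace("/", " slash ")


def normalize(text: str) -> str:
    """Apply the rules in a fixed order: dates, then URLs, then acronyms."""
    text = DATE.sub(say_date, text)
    text = URL.sub(say_url, text)
    return ACRONYM.sub(lambda m: " ".join(m.group()), text)


class StreamingNormalizer:
    """Normalize only complete words; hold the trailing partial word back.

    Assumption: a word is complete once whitespace follows it.
    """

    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, chunk: str) -> str:
        self.buffer += chunk
        cut = max(self.buffer.rfind(" "), self.buffer.rfind("\n"))
        if cut < 0:
            return ""  # no complete word yet
        ready, self.buffer = self.buffer[: cut + 1], self.buffer[cut + 1:]
        return normalize(ready)

    def flush(self) -> str:
        ready, self.buffer = self.buffer, ""
        return normalize(ready)


chunks = ["The TTS demo ships 2024-", "12-25; docs at https://exam", "ple.com/docs now."]
stream = StreamingNormalizer()
print("".join(stream.feed(c) for c in chunks) + stream.flush())
# The T T S demo ships December twenty-fifth, twenty twenty-four; docs at example dot com slash docs now.
```

Normalizing the same chunks one at a time with `normalize` would leave "2024-" and "https://exam" unverbalized. That is the kind of chunk-boundary error a streaming benchmark needs to catch. Holding back the trailing partial word costs roughly one word of latency per chunk. A production system might instead buffer up to a clause or sentence boundary, which handles cases like "$5 million" at the price of more delay.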
Similar Articles
Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting)
A revamped TTS benchmark introduces objective standards and live blind voting to produce Elo ratings for 46+ models, with participation open to the community.
TTS Benchmark Comparison (all known TTS up until May 2026)
A user-created benchmark comparing local TTS tools, with results for Windows and Mac; Linux testing is pending. It includes an HTML results page and a GitHub repository.
dots.tts Technical Report
dots.tts presents a 2B-parameter continuous autoregressive TTS model trained on multilingual data. It reports state-of-the-art performance on benchmarks such as Seed-TTS-Eval and supports low-latency streaming via CFG-aware MeanFlow distillation. The model, code, and checkpoints are released under the Apache 2.0 license.
BlasBench: An Open Benchmark for Irish Speech Recognition
BlasBench introduces an open evaluation benchmark for Irish speech recognition with Irish-aware text normalization that preserves linguistic features like fadas, lenition, and eclipsis. The paper benchmarks 12 ASR systems across four architecture families, revealing significant generalization gaps and showing that existing multilingual systems struggle with Irish due to inadequate normalization.
@HarshalsinghCN: I built an open-source Hinglish TTS that beats every model on the market. I had zero research background. last week I w…
A developer describes building an open-source Hinglish text-to-speech system that, by their account, outperforms existing models. The gains come from fixing upstream inference bugs and adding a lightweight preprocessing wrapper, with no training or GPU resources required.