I can't believe text normalization is so underdiscussed in streaming text-to-speech [D]

Reddit r/MachineLearning 04/22/26, 12:35 PM Tools

Summary

The author points out how under-discussed text normalization errors are in streaming TTS and shares a vendor benchmark that evaluates 1000+ sentences across 31 categories, including dates, URLs, and acronyms.

Kinda surprises me how little discussion there is about mistakes in streaming TTS models. People look for natural readers, high voice quality, and expressive speech, and most models don't look dumb there. Where they fail is basic stuff like prices, dates, URLs, promo codes, and phone numbers.

So I went looking for info and found a benchmark that compares commercial real-time streaming TTS models on how they pronounce dates, URLs, acronyms, etc. They check 1000+ sentences across 31 categories and then use Gemini to judge the results: [https://async-vocie-ai-text-to-speech-normalization-benchmark.static.hf.space/index.html](https://async-vocie-ai-text-to-speech-normalization-benchmark.static.hf.space/index.html). Looks valid to me. Obviously this is a vendor benchmark, so I'm not taking it at face value, but the focus feels on point. This has been one of the biggest challenges for us in production. I'm curious how you all deal with it in practice.
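One common way to handle this in practice is a rule-based normalization pass that rewrites numbers and symbols into words before the text ever reaches the TTS API. The sketch below is a minimal illustration of that idea, not what the benchmark or any vendor does. Everything in it is a hypothetical assumption: the function names (`normalize_for_tts`, `int_to_words`), English-only output, US-style `NNN-NNN-NNNN` phone numbers, dollar prices, and a short hardcoded list of URL endings.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical patterns; real inputs need far more coverage than this.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")           # e.g. 555-123-4567
PRICE_RE = re.compile(r"\$(\d+(?:,\d{3})*)(?:\.(\d{2}))?")  # e.g. $1,250.05
URL_RE = re.compile(
    r"(?:https?://)?(?:[\w-]+\.)+(?:com|org|net|io|ai|dev)\b(?:/[\w./-]*)?",
    re.IGNORECASE,
)


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer below one million (no 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return int_to_words(thousands) + " thousand" + (" " + int_to_words(rest) if rest else "")
    raise ValueError("number too large for this sketch")


def _phone(m: re.Match) -> str:
    # Read phone numbers digit by digit, pausing (comma) between groups.
    return ", ".join(" ".join(ONES[int(d)] for d in g) for g in m.group(0).split("-"))


def _price(m: re.Match) -> str:
    dollars = int(m.group(1).replace(",", ""))
    words = int_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
    if m.group(2) and int(m.group(2)):
        cents = int(m.group(2))
        words += " and " + int_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words


def _url(m: re.Match) -> str:
    url, trailing = m.group(0), ""
    # Sentence-final punctuation captured by the path pattern is not part of the URL.
    while url and url[-1] in ".,":
        trailing, url = url[-1] + trailing, url[:-1]
    url = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)  # drop the scheme
    spoken = url.replace(".", " dot ").replace("/", " slash ").replace("-", " dash ")
    return " ".join(spoken.split()) + trailing


def normalize_for_tts(text: str) -> str:
    # Order matters: digits in phones and prices are rewritten before URL matching.
    text = PHONE_RE.sub(_phone, text)
    text = PRICE_RE.sub(_price, text)
    return URL_RE.sub(_url, text)


print(normalize_for_tts("Call 555-123-4567 or visit example.com/help. Pro plan is $1,250.05."))
```

For that input the sketch prints `Call five five five, one two three, four five six seven or visit example dot com slash help. Pro plan is one thousand two hundred fifty dollars and five cents.` In a streaming setup there is an extra wrinkle: a chunk boundary can split a token like `$1,250.05` in half, so you would likely need to buffer input until such a token is complete before normalizing it.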


Similar Articles

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting)

Reddit r/LocalLLaMA

A revamped TTS benchmark introduces objective standards and live blind voting to create an ELO rating for 46+ models, with participation open to the community.

TTS Benchmark Comparison (all known TTS up until May 2026)

Reddit r/LocalLLaMA

A user-created benchmark for comparing local TTS tools, with results for Windows and Mac, and Linux testing pending. Includes an HTML results page and GitHub repository.

dots.tts Technical Report

Hugging Face Daily Papers

dots.tts presents a 2B-parameter continuous autoregressive TTS model trained on multilingual data, achieving state-of-the-art performance on benchmarks like Seed-TTS-Eval with low-latency streaming via CFG-aware MeanFlow distillation. The model, code, and checkpoints are released under Apache 2.0.

BlasBench: An Open Benchmark for Irish Speech Recognition

arXiv cs.CL

BlasBench introduces an open evaluation benchmark for Irish speech recognition with Irish-aware text normalization that preserves linguistic features like fadas, lenition, and eclipsis. The paper benchmarks 12 ASR systems across four architecture families, revealing significant generalization gaps and showing that existing multilingual systems struggle with Irish due to inadequate normalization.

@HarshalsinghCN: I built an open-source Hinglish TTS that beats every model on the market. I had zero research background. last week I w…

X AI KOLs Timeline

A developer documents building an open-source Hinglish text-to-speech system that outperforms existing models by fixing upstream inference bugs and adding a lightweight preprocessing wrapper, achieving high quality without training or GPU resources.