@FeitengLi: A 99M-parameter TTS model runs on CPU faster than a 2B model runs on an A100. Supertone's newly open-sourced supertonic-3 uses ONNX Runtime and runs fully locally: in the browser, on a phone, and even on a Raspberry Pi.

X AI KOLs Timeline 05/15/26, 01:29 PM Models

tts text-to-speech open-source on-device onnx-runtime cpu-inference lightweight

Summary

Supertone released Supertonic 3, an open-source TTS model with 99M parameters that runs faster on CPU than a 2B model does on an A100. It supports 31 languages and uses ONNX Runtime for fully local inference.

A 99M-parameter TTS model runs on CPU faster than a 2B model runs on an A100. Supertone's newly open-sourced supertonic-3 uses ONNX Runtime and runs fully locally: in the browser, on a phone, and even on a Raspberry Pi. https://t.co/brEESjEY0t

Original Article

Cached at: 05/15/26, 11:08 PM

A 99M-parameter TTS model runs on CPU faster than a 2B model runs on an A100. Supertone's newly open-sourced supertonic-3 uses ONNX Runtime and runs fully locally: in a browser, on a phone, and even on a Raspberry Pi. https://t.co/brEESjEY0t

# Supertone/supertonic-3 · Hugging Face

Source: https://huggingface.co/Supertone/supertonic-3

## Supertonic 3 | Lightning Fast, On-Device, Accurate TTS

Supertonic 3 Preview (https://huggingface.co/Supertone/supertonic-3/blob/main/img/Supertonic3_HeroImage.png)

Demo (https://huggingface.co/spaces/Supertone/supertonic-3) · Code (https://github.com/supertone-inc/supertonic) · Python SDK (https://pypi.org/project/supertonic/)

Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis. Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and reduces repeat/skip failures.

## Quick Start

Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.

```bash
pip install supertonic
```

```python
from supertonic import TTS

# auto_download=True fetches the model assets from Hugging Face on first run
tts = TTS(auto_download=True)

# Load the preset voice style "M1"
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."

# Synthesize English speech; returns the waveform and its duration in seconds
wav, duration = tts.synthesize(text, voice_style=style, lang="en")

# Write the waveform to a WAV file
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```

## What’s New in Supertonic 3

- 31 languages: expanded from the 5-language Supertonic 2 release.
- More stable reading: fewer repeat and skip failures, especially on short and long utterances.
- Higher speaker similarity: improved similarity across the shared-language set compared with Supertonic 2.
- Expression tags: supports simple tags such as , , and ``.

## Performance Highlights

Supertonic 3 is designed for practical on-device inference: compact enough to run locally while staying competitive with much larger open TTS systems.

### Reading Accuracy

*Figure: Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2*

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER (character error rate); the others use WER (word error rate).

### Supertonic 2 to Supertonic 3

*Figure: Supertonic 2 and Supertonic 3 comparison*

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.

### Runtime Footprint

*Figure: Supertonic CPU runtime compared with GPU baselines*

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.

### Model Size

*Figure: Model size comparison*

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B- to 2B-class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
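The Reading Accuracy section reports WER for most languages and CER for the asterisked ones. To illustrate what those two metrics count, here is a minimal sketch that derives both from a Levenshtein edit distance (substitutions + deletions + insertions, divided by the reference length). It assumes whitespace tokenization for WER and no text normalization; the function names are hypothetical, and this is not Supertone's evaluation pipeline, which in a TTS benchmark would typically also involve running an ASR model over the generated audio first.

```python
"""WER/CER sketch. Assumptions: whitespace word split for WER, per-character
split for CER, no case or punctuation normalization. Hypothetical helpers,
not Supertone's benchmark code."""

from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] is the distance between the first i-1 ref tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One deleted word out of six reference words
    print(f"WER: {wer('the cat sat on the mat', 'the cat sat on mat'):.3f}")
    # One substituted character out of five (CER suits unsegmented scripts)
    print(f"CER: {cer('こんにちは', 'こんにちわ'):.3f}")
```

CER is the usual choice for scripts such as Japanese that do not separate words with spaces, which is why a whitespace-based WER would be misleading there.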
## Supported Languages

| Code | Language | Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|---|---|
| en | English | ko | Korean | ja | Japanese | ar | Arabic |
| bg | Bulgarian | cs | Czech | da | Danish | de | German |
| el | Greek | es | Spanish | et | Estonian | fi | Finnish |
| fr | French | hi | Hindi | hr | Croatian | hu | Hungarian |
| id | Indonesian | it | Italian | lt | Lithuanian | lv | Latvian |
| nl | Dutch | pl | Polish | pt | Portuguese | ro | Romanian |
| ru | Russian | sk | Slovak | sl | Slovenian | sv | Swedish |
| tr | Turkish | uk | Ukrainian | vi | Vietnamese | | |

## License

This project’s sample code is released under the MIT License. See the GitHub repository (https://github.com/supertone-inc/supertonic) for details.

The accompanying model is released under the OpenRAIL-M License. See the LICENSE (https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE) file in this repository for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the PyTorch license (https://docs.pytorch.org/FBGEMM/general/License.html) for details.

Copyright (c) 2026 Supertone Inc.

Similar Articles

@GoJun315: Open-source TTS that runs locally and beats ElevenLabs. Supertonic, a speech synthesis model that runs entirely on-device, no internet required, zero API costs. - Only 99M parameters, 167x faster than real-time on M4 Pro, runs on Raspberry Pi - Supports 31 languages, covering…

@AlphaSignalAI: A 66M parameter model just beat ElevenLabs on a Raspberry Pi. Text-to-speech has lived in the cloud for years. Every sp…

@akshay_pachaar: this TTS model generates speech 167x faster than you can hear it. Supertonic is an on-device TTS engine that runs via O…

@FeitengLi: Hy-MT2 - a new open-source multilingual translation model that matches top-tier large models in capability, supports translation between 33 languages, and offers flexible instruction capabilities. It achieves 2-bit quantization under 500MB, making it well-suited for on-device deployment. https://modelsc…