@FeitengLi: A 99M-parameter TTS runs on CPU faster than a 2B model on an A100. Supertone's newly open-sourced Supertonic 3 runs with ONNX Runtime, fully local: in the browser, on a phone, and even on a Raspberry Pi.

X AI KOLs Timeline Models

Summary

Supertone released Supertonic 3, an open-source TTS model with 99M parameters that runs faster on CPU than a 2B model does on an A100. It supports 31 languages and uses ONNX Runtime for fully local inference.


Cached at: 05/15/26, 11:08 PM

# Supertone/supertonic-3 · Hugging Face

Source: https://huggingface.co/Supertone/supertonic-3

## Supertonic 3 | Lightning Fast, On-Device, Accurate TTS

Supertonic 3 Preview (https://huggingface.co/Supertone/supertonic-3/blob/main/img/Supertonic3_HeroImage.png)

Demo (https://huggingface.co/spaces/Supertone/supertonic-3) · Code (https://github.com/supertone-inc/supertonic) · Python SDK (https://pypi.org/project/supertonic/)

Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis. Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and reduces repeat/skip failures.

## Quick Start

Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.

```shell
pip install supertonic
```

```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```

## What's New in Supertonic 3

- 31 languages: expanded from the 5-language Supertonic 2 release.
- More stable reading: fewer repeat and skip failures, especially on short and long utterances.
- Higher speaker similarity: improved similarity across the shared-language set compared with Supertonic 2.
- Expression tags: supports simple expression tags.

## Performance Highlights

Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.

### Reading Accuracy

(Figure: Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2)

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.

### Supertonic 2 to Supertonic 3

(Figure: Supertonic 2 and Supertonic 3 comparison)

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.

### Runtime Footprint

(Figure: Supertonic CPU runtime compared with GPU baselines)

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.

### Model Size

(Figure: Model size comparison)

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.

## Supported Languages

| Code | Language | Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|------|----------|
| en | English | ko | Korean | ja | Japanese | ar | Arabic |
| bg | Bulgarian | cs | Czech | da | Danish | de | German |
| el | Greek | es | Spanish | et | Estonian | fi | Finnish |
| fr | French | hi | Hindi | hr | Croatian | hu | Hungarian |
| id | Indonesian | it | Italian | lt | Lithuanian | lv | Latvian |
| nl | Dutch | pl | Polish | pt | Portuguese | ro | Romanian |
| ru | Russian | sk | Slovak | sl | Slovenian | sv | Swedish |
| tr | Turkish | uk | Ukrainian | vi | Vietnamese | | |

## License

This project's sample code is released under the MIT License. See the GitHub repository (https://github.com/supertone-inc/supertonic) for details. The accompanying model is released under the OpenRAIL-M License. See the LICENSE file (https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE) in this repository for details. This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the PyTorch license (https://docs.pytorch.org/FBGEMM/general/License.html) for details.

Copyright (c) 2026 Supertone Inc.
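For reference, the 31 language codes from the model card's supported-language table can be collected into a small lookup. The `check_lang` helper below is illustrative only and is not part of the Supertonic SDK; a caller might use it to validate a code before passing it as the `lang` argument to `tts.synthesize`:

```python
# Language codes and names as listed on the Supertonic 3 model card.
SUPERTONIC3_LANGS = {
    "en": "English", "ko": "Korean", "ja": "Japanese", "ar": "Arabic",
    "bg": "Bulgarian", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian",
    "id": "Indonesian", "it": "Italian", "lt": "Lithuanian", "lv": "Latvian",
    "nl": "Dutch", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovak", "sl": "Slovenian", "sv": "Swedish",
    "tr": "Turkish", "uk": "Ukrainian", "vi": "Vietnamese",
}

def check_lang(code: str) -> str:
    """Hypothetical helper: fail fast on unsupported language codes
    before handing them to a synthesis call."""
    if code not in SUPERTONIC3_LANGS:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code

print(len(SUPERTONIC3_LANGS))  # 31
```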

Similar Articles

@GoJun315: Open-source TTS that runs locally and beats ElevenLabs. Supertonic, a speech synthesis model that runs entirely on-device, no internet required, zero API costs. - Only 99M parameters, 167x faster than real-time on M4 Pro, runs on Raspberry Pi - Supports 31 languages, covering…
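The "167x faster than real-time" figure is a speedup over real time: audio seconds produced per second of wall-clock synthesis. A minimal sketch of that arithmetic, with illustrative numbers rather than measured benchmarks:

```python
def speedup_over_realtime(audio_seconds: float, synth_seconds: float) -> float:
    """Seconds of audio produced per second of compute; 1.0 means real time."""
    return audio_seconds / synth_seconds

# Illustrative: 10 s of audio synthesized in about 0.06 s of wall-clock time
print(speedup_over_realtime(10.0, 0.06))  # about 167x
```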

X AI KOLs Timeline

Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.
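The accuracy claim for numbers and phone numbers is about verbatim reading. As a hypothetical illustration (this helper is not part of any Supertonic SDK), a caller who wants digits spoken one by one could pre-process the text before synthesis:

```python
# Hypothetical pre-processing helper; not part of the supertonic SDK.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def spell_digits(text: str) -> str:
    """Expand each digit to a word so a TTS engine reads it out digit by digit."""
    return " ".join(DIGIT_WORDS[ch] for ch in text if ch.isdigit())

print(spell_digits("555-0142"))  # five five five zero one four two
```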

Supertone/supertonic-3

Hugging Face Models Trending

Supertonic 3 is a lightweight, open-weight text-to-speech model designed for fast on-device inference, expanding support to 31 languages with improved stability and expression tags.

OpenMOSS-Team/MOSS-TTS-Nano-100M

Hugging Face Models Trending

MOSS-TTS-Nano is an open-source multilingual speech generation model with only 0.1B parameters, designed for real-time TTS that runs directly on CPU without a GPU. Released by the OpenMOSS team and MOSI.AI, it enables simple local deployment for web serving and product integration.

@Honcia13: Open-source TTS is going wild! New ammunition for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 2 billion parameters trained on 2 million hours of multilingual data, with 48 kHz studio-quality sound. The wildest part: no tokenizer at all. It performs diffusion autoregression directly in continuous latent space, maximizing detail retention!

X AI KOLs Timeline

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It drops the tokenizer entirely, performing diffusion-based autoregressive generation directly in continuous latent space, and offers 48 kHz studio-quality audio along with strong voice cloning and voice design capabilities.
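To make "diffusion autoregression in continuous latent space" concrete, here is a toy numerical sketch. It is illustrative only and does not reflect VoxCPM2's actual architecture or training: the idea shown is that each autoregressive step starts from noise and iteratively denoises toward a continuous latent vector conditioned on the previous one, with no discrete token codebook in between.

```python
import random

random.seed(0)
DIM = 16  # toy latent dimensionality

def denoise_step(x, cond, step_frac):
    # Stand-in for a learned denoiser: nudge the noisy sample toward the
    # conditioning latent. A real model would predict this update.
    return [xi + step_frac * (ci - xi) for xi, ci in zip(x, cond)]

def next_latent(prev, steps=8):
    """One autoregressive step: produce the next continuous latent by
    iterative denoising conditioned on the previous latent."""
    x = [random.gauss(0.0, 1.0) for _ in range(DIM)]  # start from noise
    for _ in range(steps):
        x = denoise_step(x, prev, 1.0 / steps)
    return x

# Generate a short sequence of continuous latents autoregressively.
seq = [[random.gauss(0.0, 1.0) for _ in range(DIM)]]
for _ in range(4):
    seq.append(next_latent(seq[-1]))
print(len(seq), len(seq[-1]))  # 5 16
```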