@FeitengLi: A 99M-parameter TTS running on CPU is faster than a 2B model running on an A100. Supertone's newly open-sourced supertonic-3 runs on ONNX Runtime, fully local: it runs in the browser, on phones, and even on a Raspberry Pi.
Summary
Supertone released Supertonic 3, an open-source TTS model with 99M parameters that runs faster on CPU than a 2B model on A100, supporting 31 languages and ONNX Runtime for fully local inference.
Cached: 2026/05/15 23:08
https://t.co/brEESjEY0t
Supertone/supertonic-3 · Hugging Face
Source: https://huggingface.co/Supertone/supertonic-3
Supertonic 3 | Lightning Fast, On-Device, Accurate TTS
Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis.
Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and reduces repeat/skip failures.
Quick Start
Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.
pip install supertonic

from supertonic import TTS

# Downloads the model assets from Hugging Face on first run
tts = TTS(auto_download=True)

# Pick a bundled voice preset by name
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."

# Returns the waveform and the audio duration in seconds
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
What’s New in Supertonic 3
- 31 languages: expanded from the 5-language Supertonic 2 release.
- More stable reading: fewer repeat and skip failures, especially on short and long utterances.
- Higher speaker similarity: improved similarity across the shared-language set compared with Supertonic 2.
- Expression tags: supports simple tags such as <laugh>, <breath>, and <sigh>.
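The excerpt does not document the tag syntax; as a minimal sketch, assuming the tags are written inline in the text passed to synthesize(), a small helper can build tagged prompts (the helper and the inline placement are assumptions, not documented API):

```python
# Sketch: embedding expression tags in the input text.
# ASSUMPTION: tags such as <laugh>, <breath>, and <sigh> are written inline
# in the string passed to tts.synthesize(); exact placement rules are not
# specified in this excerpt.

SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

def with_tag(text: str, tag: str) -> str:
    """Prepend an expression tag to a sentence, validating the tag name."""
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"unknown expression tag: {tag}")
    return f"<{tag}> {text}"

print(with_tag("That was unexpected.", "laugh"))  # <laugh> That was unexpected.
# The tagged string would then be passed to tts.synthesize(...) as usual.
```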
Performance Highlights
Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.
Reading Accuracy

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
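As a refresher on the metric used here: WER is the word-level edit distance between a transcript of the synthesized audio and the reference text, normalized by reference length (CER is the same over characters). A minimal, generic implementation, unrelated to any Supertonic code:

```python
# Minimal word error rate: Levenshtein distance between word sequences,
# normalized by the reference length. Split into characters instead of
# words to get CER.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution in three words
```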
Supertonic 2 to Supertonic 3

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.
Runtime Footprint

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
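Speed claims like these are usually expressed as a real-time factor: seconds of audio produced per second of compute. A quick way to measure it with the quick-start API (the timing harness is ours; `tts`, `style`, and `synthesize` are as in Quick Start):

```python
import time

def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / wall_seconds

# Example harness (assumes the quick-start objects from above):
#   t0 = time.perf_counter()
#   wav, duration = tts.synthesize(text, voice_style=style, lang="en")
#   rtf = real_time_factor(duration, time.perf_counter() - t0)
#   print(f"{rtf:.0f}x faster than real time")

# E.g. 10 s of audio synthesized in 0.06 s of CPU time is ~167x real time:
print(f"{real_time_factor(10.0, 0.06):.0f}x")  # 167x
```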
Model Size

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
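Back-of-the-envelope, the 99M-parameter figure translates directly into download and memory footprint; the bytes-per-weight precision is our assumption, since the README excerpt does not state how the ONNX assets are quantized:

```python
# Approximate asset size for a 99M-parameter model at common precisions.
PARAMS = 99_000_000

def size_mb(params: int, bytes_per_param: int) -> float:
    """Raw weight storage in megabytes at a given precision."""
    return params * bytes_per_param / 1e6

print(f"fp32: ~{size_mb(PARAMS, 4):.0f} MB")  # ~396 MB
print(f"fp16: ~{size_mb(PARAMS, 2):.0f} MB")  # ~198 MB
print(f"int8: ~{size_mb(PARAMS, 1):.0f} MB")  # ~99 MB
```

Even at full fp32 precision this is well under a gigabyte, which is why download size and startup time stay manageable on phones and single-board computers.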
Supported Languages
| Code | Language   | Code | Language   | Code | Language   | Code | Language  |
|------|------------|------|------------|------|------------|------|-----------|
| en   | English    | ko   | Korean     | ja   | Japanese   | ar   | Arabic    |
| bg   | Bulgarian  | cs   | Czech      | da   | Danish     | de   | German    |
| el   | Greek      | es   | Spanish    | et   | Estonian   | fi   | Finnish   |
| fr   | French     | hi   | Hindi      | hr   | Croatian   | hu   | Hungarian |
| id   | Indonesian | it   | Italian    | lt   | Lithuanian | lv   | Latvian   |
| nl   | Dutch      | pl   | Polish     | pt   | Portuguese | ro   | Romanian  |
| ru   | Russian    | sk   | Slovak     | sl   | Slovenian  | sv   | Swedish   |
| tr   | Turkish    | uk   | Ukrainian  | vi   | Vietnamese |      |           |
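For programmatic use, the language table can be kept as a code-to-name map and used to validate the `lang` argument before synthesis; the `LANGUAGES` dict mirrors the table above, while the validation helper is our own sketch:

```python
# All 31 language codes from the Supported Languages table.
LANGUAGES = {
    "en": "English", "ko": "Korean", "ja": "Japanese", "ar": "Arabic",
    "bg": "Bulgarian", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian",
    "id": "Indonesian", "it": "Italian", "lt": "Lithuanian", "lv": "Latvian",
    "nl": "Dutch", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovak", "sl": "Slovenian", "sv": "Swedish",
    "tr": "Turkish", "uk": "Ukrainian", "vi": "Vietnamese",
}

def check_lang(code: str) -> str:
    """Validate a language code before passing it to synthesize(lang=...)."""
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return code

print(len(LANGUAGES))  # 31
```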
License
This project’s sample code is released under the MIT License. See the GitHub repository for details.
The accompanying model is released under the OpenRAIL-M License. See the LICENSE file in this repository for details.
This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the PyTorch license for details.
Copyright (c) 2026 Supertone Inc.
Similar Articles
@GoJun315: A locally-run open-source TTS that takes on ElevenLabs. Supertonic is a speech synthesis model that runs entirely on-device, with no network access and zero API cost. - Only 99M parameters, 167x faster than real time on an M4 Pro, and it even runs on a Raspberry Pi - Supports 31 languages, covering…
Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.
@akshay_pachaar: This TTS model generates speech 167x faster than the ear can listen to it. Supertonic is an on-device TTS engine with cross-platform inference via ONNX…
Supertonic is a new open-source TTS engine that runs on-device via ONNX, supports 31 languages, beats ElevenLabs on speed, and runs even on a Raspberry Pi with no GPU.
Supertone/supertonic-3
Supertonic 3 is a lightweight open-weight text-to-speech model designed for fast on-device inference, expanding language support to 31 languages and improving stability and expression-tag support.
OpenMOSS-Team/MOSS-TTS-Nano-100M
MOSS-TTS-Nano is an open-source multilingual speech generation model with only 0.1B parameters, designed for real-time TTS and able to run directly on CPU without a GPU. Released by the OpenMOSS team and MOSI.AI, it supports simple local deployment for web services and product integration.
@Honcia13: Open-source TTS competition has gone wild! Do scam compounds have a new weapon? Tsinghua's OpenBMB just released VoxCPM2: 20B parameters + 2M hours of multilingual training data, with 48kHz studio-grade audio quality! Most striking of all: no tokenizer whatsoever, doing diffusion autoregression directly in a continuous latent space…
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 20 billion parameters that performs tokenizer-free diffusion autoregressive generation in a continuous latent space, delivering 48kHz studio-grade quality and strong voice cloning and voice design capabilities.
