@FeitengLi: A 99M-parameter TTS running on CPU is faster than a 2B model running on an A100. Supertone's newly open-sourced supertonic-3 runs on ONNX Runtime, fully local: it runs in the browser, on phones, and even on a Raspberry Pi.
Summary
Supertone released Supertonic 3, an open-source TTS model with 99M parameters that runs faster on CPU than a 2B model on A100, supporting 31 languages and ONNX Runtime for fully local inference.
Cached: 2026/05/15 23:08
https://t.co/brEESjEY0t
Supertone/supertonic-3 · Hugging Face
Source: https://huggingface.co/Supertone/supertonic-3
Supertonic 3 | Lightning Fast, On-Device, Accurate TTS
Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis.
Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and reduces repeat/skip failures.
Quick Start
Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.
pip install supertonic

from supertonic import TTS

# Downloads the model assets from Hugging Face on first run
tts = TTS(auto_download=True)

# Pick a bundled voice preset by name
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."

# Returns the waveform and the audio duration in seconds
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
What’s New in Supertonic 3
- 31 languages: expanded from the 5-language Supertonic 2 release.
- More stable reading: fewer repeat and skip failures, especially on short and long utterances.
- Higher speaker similarity: improved similarity across the shared-language set compared with Supertonic 2.
- Expression tags: supports simple tags such as <laugh>, <breath>, and <sigh>.
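The excerpt does not document the tag syntax; as a minimal sketch, assuming the tags are written inline in the text passed to synthesize(), a small helper can build tagged prompts (the helper and the inline placement are assumptions, not documented API):

```python
# Sketch: embedding expression tags in the input text.
# ASSUMPTION: tags such as <laugh>, <breath>, and <sigh> are written inline
# in the string passed to tts.synthesize(); exact placement rules are not
# specified in this excerpt.

SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

def with_tag(text: str, tag: str) -> str:
    """Prepend an expression tag to a sentence, validating the tag name."""
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"unknown expression tag: {tag}")
    return f"<{tag}> {text}"

print(with_tag("That was unexpected.", "laugh"))  # <laugh> That was unexpected.
# The tagged string would then be passed to tts.synthesize(...) as usual.
```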
Performance Highlights
Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.
Reading Accuracy

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
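As a refresher on the metric used here: WER is the word-level edit distance between a transcript of the synthesized audio and the reference text, normalized by reference length (CER is the same over characters). A minimal, generic implementation, unrelated to any Supertonic code:

```python
# Minimal word error rate: Levenshtein distance between word sequences,
# normalized by the reference length. Split into characters instead of
# words to get CER.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution in three words
```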
Supertonic 2 to Supertonic 3

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.
Runtime Footprint

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
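Speed claims like these are usually expressed as a real-time factor: seconds of audio produced per second of compute. A quick way to measure it with the quick-start API (the timing harness is ours; `tts`, `style`, and `synthesize` are as in Quick Start):

```python
import time

def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / wall_seconds

# Example harness (assumes the quick-start objects from above):
#   t0 = time.perf_counter()
#   wav, duration = tts.synthesize(text, voice_style=style, lang="en")
#   rtf = real_time_factor(duration, time.perf_counter() - t0)
#   print(f"{rtf:.0f}x faster than real time")

# E.g. 10 s of audio synthesized in 0.06 s of CPU time is ~167x real time:
print(f"{real_time_factor(10.0, 0.06):.0f}x")  # 167x
```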
Model Size

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
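Back-of-the-envelope, the 99M-parameter figure translates directly into download and memory footprint; the bytes-per-weight precision is our assumption, since the README excerpt does not state how the ONNX assets are quantized:

```python
# Approximate asset size for a 99M-parameter model at common precisions.
PARAMS = 99_000_000

def size_mb(params: int, bytes_per_param: int) -> float:
    """Raw weight storage in megabytes at a given precision."""
    return params * bytes_per_param / 1e6

print(f"fp32: ~{size_mb(PARAMS, 4):.0f} MB")  # ~396 MB
print(f"fp16: ~{size_mb(PARAMS, 2):.0f} MB")  # ~198 MB
print(f"int8: ~{size_mb(PARAMS, 1):.0f} MB")  # ~99 MB
```

Even at full fp32 precision this is well under a gigabyte, which is why download size and startup time stay manageable on phones and single-board computers.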
Supported Languages
| Code | Language   | Code | Language   | Code | Language   | Code | Language  |
|------|------------|------|------------|------|------------|------|-----------|
| en   | English    | ko   | Korean     | ja   | Japanese   | ar   | Arabic    |
| bg   | Bulgarian  | cs   | Czech      | da   | Danish     | de   | German    |
| el   | Greek      | es   | Spanish    | et   | Estonian   | fi   | Finnish   |
| fr   | French     | hi   | Hindi      | hr   | Croatian   | hu   | Hungarian |
| id   | Indonesian | it   | Italian    | lt   | Lithuanian | lv   | Latvian   |
| nl   | Dutch      | pl   | Polish     | pt   | Portuguese | ro   | Romanian  |
| ru   | Russian    | sk   | Slovak     | sl   | Slovenian  | sv   | Swedish   |
| tr   | Turkish    | uk   | Ukrainian  | vi   | Vietnamese |      |           |
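For programmatic use, the language table can be kept as a code-to-name map and used to validate the `lang` argument before synthesis; the `LANGUAGES` dict mirrors the table above, while the validation helper is our own sketch:

```python
# All 31 language codes from the Supported Languages table.
LANGUAGES = {
    "en": "English", "ko": "Korean", "ja": "Japanese", "ar": "Arabic",
    "bg": "Bulgarian", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian",
    "id": "Indonesian", "it": "Italian", "lt": "Lithuanian", "lv": "Latvian",
    "nl": "Dutch", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovak", "sl": "Slovenian", "sv": "Swedish",
    "tr": "Turkish", "uk": "Ukrainian", "vi": "Vietnamese",
}

def check_lang(code: str) -> str:
    """Validate a language code before passing it to synthesize(lang=...)."""
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return code

print(len(LANGUAGES))  # 31
```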
License
This project’s sample code is released under the MIT License. See the GitHub repository for details.
The accompanying model is released under the OpenRAIL-M License. See the LICENSE file in this repository for details.
This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the PyTorch license for details.
Copyright (c) 2026 Supertone Inc.
Similar Articles
@GoJun315: A locally-run open-source TTS that takes on ElevenLabs. Supertonic is a speech synthesis model that runs entirely on-device, with no network access and zero API cost. - Only 99M parameters, 167x faster than real time on an M4 Pro, and it even runs on a Raspberry Pi - Supports 31 languages, covering…
Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.
@akshay_pachaar: This TTS model generates speech 167x faster than the ear can listen to it. Supertonic is an on-device TTS engine with cross-platform inference via ONNX…
Supertonic is a new open-source TTS engine that runs on-device via ONNX, supports 31 languages, beats ElevenLabs on speed, and runs even on a Raspberry Pi with no GPU.
Supertone/supertonic-3
Supertonic 3 is a lightweight open-weight text-to-speech model designed for fast on-device inference, expanding language support to 31 languages and improving stability and expression-tag support.
OpenMOSS-Team/MOSS-TTS-Nano-100M
MOSS-TTS-Nano is an open-source multilingual speech generation model with only 0.1B parameters, designed for real-time TTS and able to run directly on CPU without a GPU. Released by the OpenMOSS team and MOSI.AI, it supports simple local deployment for web services and product integration.
@Honcia13: Open-source TTS competition has gone wild! Do scam compounds have a new weapon? Tsinghua's OpenBMB just released VoxCPM2: 20B parameters + 2M hours of multilingual training data, with 48kHz studio-grade audio quality! Most striking of all: no tokenizer whatsoever, doing diffusion autoregression directly in a continuous latent space…
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 20 billion parameters that performs tokenizer-free diffusion autoregressive generation in a continuous latent space, delivering 48kHz studio-grade quality and strong voice cloning and voice design capabilities.
