@FeitengLi: A 99M-parameter TTS running on CPU is faster than a 2B model running on an A100. Supertone's newly open-sourced supertonic-3 runs on ONNX Runtime, fully local: it runs in the browser, on a phone, even on a Raspberry Pi.

X AI KOLs Timeline

Summary

Supertone released Supertonic 3, an open-source TTS model with 99M parameters that runs faster on CPU than a 2B model on A100, supporting 31 languages and ONNX Runtime for fully local inference.


Cached: 2026/05/15 23:08

A 99M-parameter TTS running on CPU is faster than a 2B model running on an A100.

Supertone's newly open-sourced supertonic-3 runs on ONNX Runtime, fully local: it runs in the browser, on a phone, even on a Raspberry Pi.

https://t.co/brEESjEY0t


Supertone/supertonic-3 · Hugging Face

Source: https://huggingface.co/Supertone/supertonic-3

Supertonic 3 | Lightning Fast, On-Device, Accurate TTS

Supertonic 3 Preview

Demo · Code · Python SDK

Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis.

Supertonic 3 expands the open-weight release from 5 to 31 languages, improves reading stability, and reduces repeat/skip failures.

Quick Start

Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.

pip install supertonic
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")

tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
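The SDK's save_audio handles writing the file; for context, a waveform returned as a float array can be written with Python's stdlib wave module alone. A minimal sketch, assuming a mono float waveform in [-1, 1] at a hypothetical 44.1 kHz rate (neither is the SDK's documented contract):

```python
import struct
import wave

def write_wav(path, samples, sample_rate=44100):
    """Write a mono float waveform (values in [-1, 1]) as 16-bit PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit samples
        f.setframerate(sample_rate)
        # Clamp each float to [-1, 1] and scale to a signed 16-bit integer.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(pcm)

# Example: write 0.1 s of silence.
write_wav("out.wav", [0.0] * 4410)
```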

What’s New in Supertonic 3

  • 31 languages: expanded from the 5-language Supertonic 2 release.
  • More stable reading: fewer repeat and skip failures, especially on short and long utterances.
  • Higher speaker similarity: improved similarity across the shared-language set compared with Supertonic 2.
  • Expression tags: supports simple tags such as <laugh>, <breath>, and <sigh>.
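The card does not document how these tags are processed; purely as an illustration, inline tags like these can be separated from the surrounding text with a regular expression. The tag names come from the list above; the splitting logic is hypothetical, not Supertonic's:

```python
import re

TAGS = {"laugh", "breath", "sigh"}  # tag names from the model card
TAG_RE = re.compile(r"<(%s)>" % "|".join(sorted(TAGS)))

def split_tags(text):
    """Split text into ('text', chunk) and ('tag', name) events, in order."""
    parts = []
    pos = 0
    for m in TAG_RE.finditer(text):
        if m.start() > pos:
            parts.append(("text", text[pos:m.start()]))
        parts.append(("tag", m.group(1)))
        pos = m.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts

print(split_tags("Well, <sigh> that was a long day. <laugh>"))
```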

Performance Highlights

Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.

Reading Accuracy

Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2

Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
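Both WER and CER are edit-distance metrics: the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length, counted over words (WER) or characters (CER). A minimal sketch of the standard computation, not tied to any particular benchmark here:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

CER is the same computation applied to character sequences instead of word lists.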

Supertonic 2 to Supertonic 3

Supertonic 2 and Supertonic 3 comparison

Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.

Runtime Footprint

Supertonic CPU runtime compared with GPU baselines

Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
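The usual way to quantify "fast on CPU" is the real-time factor: synthesis wall-clock time divided by the duration of the audio produced. RTF below 1 means faster than real time, and 1/RTF is the "x faster than real time" figure quoted in benchmarks. A minimal sketch with made-up numbers, not measured ones:

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF < 1 means the model synthesizes faster than playback."""
    return synthesis_seconds / audio_seconds

# Hypothetical example: 0.06 s of compute to synthesize 10 s of speech.
rtf = real_time_factor(0.06, 10.0)
print(f"RTF = {rtf:.4f} ({1 / rtf:.0f}x faster than real time)")
```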

Model Size

Model size comparison

At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
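The link between parameter count and on-disk size is simple arithmetic: bytes ≈ parameters × bytes per weight. A sketch over a few generic precisions; the actual precision of the published ONNX assets is not stated here:

```python
def model_size_mb(params, bytes_per_weight):
    """Approximate serialized weight size in megabytes."""
    return params * bytes_per_weight / 1e6

params = 99_000_000  # ~99M parameters, per the model card
for name, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: ~{model_size_mb(params, width):.0f} MB")
```

Even at full fp32 precision this lands around 400 MB, versus multiple gigabytes for a 2B-parameter model, which is the download-size advantage the card is pointing at.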

Supported Languages

en English · ko Korean · ja Japanese · ar Arabic
bg Bulgarian · cs Czech · da Danish · de German
el Greek · es Spanish · et Estonian · fi Finnish
fr French · hi Hindi · hr Croatian · hu Hungarian
id Indonesian · it Italian · lt Lithuanian · lv Latvian
nl Dutch · pl Polish · pt Portuguese · ro Romanian
ru Russian · sk Slovak · sl Slovenian · sv Swedish
tr Turkish · uk Ukrainian · vi Vietnamese
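Rebuilt as a code-to-language mapping, the table above is convenient for validating the lang argument before calling synthesis. A sketch only; the SDK may well do its own validation:

```python
SUPPORTED_LANGUAGES = {
    "en": "English", "ko": "Korean", "ja": "Japanese", "ar": "Arabic",
    "bg": "Bulgarian", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian",
    "id": "Indonesian", "it": "Italian", "lt": "Lithuanian", "lv": "Latvian",
    "nl": "Dutch", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovak", "sl": "Slovenian", "sv": "Swedish",
    "tr": "Turkish", "uk": "Ukrainian", "vi": "Vietnamese",
}

def check_lang(code):
    """Return the language name for a code, or raise for unsupported codes."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return SUPPORTED_LANGUAGES[code]

print(len(SUPPORTED_LANGUAGES), check_lang("ko"))
```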

License

This project’s sample code is released under the MIT License. See the GitHub repository for details.

The accompanying model is released under the OpenRAIL-M License. See the LICENSE file in this repository for details.

This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the PyTorch license for details.

Copyright (c) 2026 Supertone Inc.

Similar Articles

@GoJun315: An open-source TTS that runs locally and takes out ElevenLabs. Supertonic, a speech-synthesis model that runs entirely on-device: no network, zero API cost. - Only 99M parameters, 167x faster than real time on an M4 Pro, runs even on a Raspberry Pi - Supports 31 languages, covering…

X AI KOLs Timeline

Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.

Supertone/supertonic-3

Hugging Face Models Trending

Supertonic 3 is a lightweight open-weight text-to-speech model designed for fast on-device inference; it expands language support to 31 languages and improves stability and expression-tag support.

OpenMOSS-Team/MOSS-TTS-Nano-100M

Hugging Face Models Trending

MOSS-TTS-Nano is an open-source multilingual speech-generation model with only 0.1B parameters, designed for real-time TTS and able to run directly on CPU without a GPU. Released by the OpenMOSS team and MOSI.AI, it supports simple local deployment for web services and product integration.

@Honcia13: Open-source TTS competition is going crazy! A new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 20B parameters + 2M hours of multilingual training data, with 48kHz studio-grade audio quality! The wildest part: no tokenizer at all; it does diffusion autoregression directly in a continuous latent space, fine…

X AI KOLs Timeline

Tsinghua University's OpenBMB released VoxCPM2, an open-source multilingual TTS model with 20B parameters that performs tokenizer-free diffusion-autoregressive generation in a continuous latent space, delivering 48kHz studio-grade audio quality with strong voice cloning and voice design capabilities.