@ZyphraAI: Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the m…

X AI KOLs Following 06/12/26, 06:18 PM Models

real-time-tts voice-cloning open-source text-to-speech audio-generation apache-2.0

Summary

Zyphra releases ZONOS2, an open-source real-time TTS model with high-fidelity voice cloning, under Apache 2.0, available on Zyphra Cloud on AMD.

Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD. 🧵 https://t.co/WvI7PXS80M

Original Article

Cached at: 06/15/26, 12:52 AM

Today we’re releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning.

ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD.

Real-time TTS has always forced a tradeoff between quality and speed.

We achieve both with ZONOS2, the first sparse MoE TTS model released open-source, with 8B total params, 900M active.
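To make the total versus active split concrete: a sparse MoE layer sends each token to only a few experts, so most of the weights sit idle on any given step. The sketch below is a toy top-k router in plain Python. The expert count, the value of k, and the router scores are invented for illustration; the thread does not state ZONOS2's actual routing configuration. Only the 8B and 900M figures come from the announcement.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    # Indices of the k largest logits (hypothetical router output for one token).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Figures from the announcement: 8B total parameters, 900M active.
TOTAL_PARAMS = 8_000_000_000
ACTIVE_PARAMS = 900_000_000
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Toy example: 8 experts, route to the top 2 (both numbers are assumptions).
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2)
print(sorted(weights))           # experts 1 and 4 are selected
print(f"{active_fraction:.2%}")  # 11.25%
```

Roughly one ninth of the weights are used per step, which is how a model of this size can plausibly keep real-time latency.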

ZONOS2 is fast, inference efficient, and super expressive.

ZONOS2 excels at voice cloning, making it the most natural-sounding open-source TTS model out there.

It captures far more of what makes a voice distinctive, so clones sound convincing across a wide range of speakers. Voice cloning is zero-shot, needing no fine-tuning.

ZONOS2 predicts Descript Audio Codec (DAC) tokens for studio-quality 44.1 kHz audio.

DAC tokens maximize quality but are harder to model than the tokens of lower-fidelity autoencoders. We close that gap with model + data scale, so fidelity doesn’t cost stability.

For text, we do not use a phonemizer; instead, ZONOS2 reads raw UTF-8 bytes. This gives us:

→ broader coverage, especially lower-resource languages
→ big gains on Chinese, Korean, Japanese
→ native code-switching mid-sentence
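As a rough illustration of what reading raw UTF-8 bytes means, the sketch below turns a code-switched sentence into byte values using only Python's built-in `str.encode`. The helper names are hypothetical, and how ZONOS2 embeds or batches these bytes is not described in the thread. The point is only that every script maps into the same 256-value vocabulary, with no language-specific phonemizer step.

```python
def text_to_byte_ids(text):
    """Encode text as a list of UTF-8 byte values (integers 0-255)."""
    return list(text.encode("utf-8"))

def byte_ids_to_text(ids):
    """Invert the encoding; raises if the bytes are not valid UTF-8."""
    return bytes(ids).decode("utf-8")

# A code-switched sentence: English, Chinese, and Korean in one string.
sentence = "Hello 世界, 안녕!"
ids = text_to_byte_ids(sentence)

print(len(sentence), len(ids))  # character count vs. byte count
print(ids[:5])                   # ASCII letters take one byte each
assert byte_ids_to_text(ids) == sentence  # the mapping is lossless
```

The tradeoff is sequence length: CJK characters take three bytes each in UTF-8, so the model sees longer inputs than it would with a character-level or phoneme-level vocabulary.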

Training data scaled from ~200K hours to 6M+ hours (~707 years of audio).
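A quick check of the hours-to-years conversion, assuming a year of 365.25 days (8,766 hours). Under that assumption, exactly 6M hours comes to about 684 years, and ~707 years corresponds to roughly 6.2M hours, which is consistent with the "6M+" wording.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years

def hours_to_years(hours):
    return hours / HOURS_PER_YEAR

def years_to_hours(years):
    return years * HOURS_PER_YEAR

print(round(hours_to_years(200_000), 1))    # the earlier ~200K-hour corpus
print(round(hours_to_years(6_000_000), 1))  # lower bound implied by "6M+"
print(round(years_to_hours(707)))           # hours implied by "~707 years"
```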

Staged data filtering ramps transcript-agreement strictness across pretraining → midtraining → annealing. This leads to fewer hallucinations, mispronunciations, and repetitions.
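One way to picture the staged filter is sketched below. It assumes that transcript agreement is measured as word error rate (WER) between the dataset transcript and an ASR transcription of the audio, and that each stage allows a lower maximum WER than the one before. The thread does not name the metric or the thresholds, so the values and field names here are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical thresholds: the maximum WER allowed tightens stage by stage.
MAX_WER_BY_STAGE = {"pretraining": 0.5, "midtraining": 0.2, "annealing": 0.05}

def filter_for_stage(samples, stage):
    """Keep samples whose transcript and ASR text agree closely enough."""
    limit = MAX_WER_BY_STAGE[stage]
    return [s for s in samples
            if word_error_rate(s["transcript"], s["asr_text"]) <= limit]

samples = [
    {"id": "a", "transcript": "the quick brown fox",
     "asr_text": "the quick brown fox"},
    {"id": "b", "transcript": "the quick brown fox jumps over",
     "asr_text": "the quick brown fox jumped over"},
    {"id": "c", "transcript": "the quick brown fox",
     "asr_text": "a quick round fox"},
    {"id": "d", "transcript": "hello world",
     "asr_text": "completely different text here"},
]

kept = {stage: [s["id"] for s in filter_for_stage(samples, stage)]
        for stage in MAX_WER_BY_STAGE}
print(kept)
```

With these toy samples the early stage keeps noisy but usable pairs, while the final stage keeps only exact matches, which mirrors the idea of spending the last training steps on the cleanest data.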

We’re also releasing ZTTS1-Eval, a new TTS benchmark.

Existing evals lean on outdated ASR and read speech. ZTTS1-Eval spans clean + in-the-wild sets across up to 17 languages, modern judges (Qwen3-ASR, ReDimNet, MSR-UTMOS), and prosody metrics.

ZONOS2 is open-weights under Apache 2.0, and free on Zyphra Cloud for a limited time.

Try it on Zyphra Cloud: http://cloud.zyphra.com
Blog: http://zyphra.com/our-work/zonos2
Weights: http://huggingface.co/Zyphra/ZONOS2
Inference code: http://github.com/Zyphra/ZONOS2
Eval code: http://github.com/Zyphra/ZTTS1-Eval…

@ZyphraAI is an open superintelligence research and product company based in San Francisco, CA, on a mission to build human-aligned AI that helps individuals and organizations reach their fullest potential.

Apply to join us!

Similar Articles

Zyphra/ZONOS2

@Gorden_Sun: ZONOS2: Open-source MoE TTS model. 8B total parameters, 0.9B activated parameters. Supports multilingual, voice cloning, Chinese, and Chinese results are good. Model:

Tested out VoxCPM2 (Open-Source TTS) locally. The "Ultimate Cloning" mode capturing breathing/accents is getting insane.

@Prince_Canuma: mlx-audio v0.4.3 is here A massive release across models, server, and DX → 6 new TTS models: Higgs Audio v2 (voice clon…

@AdinaYakup: dots.tts New TTS from Xiaohongshu (RedNote) 2B - Apache 2.0 Fully continuous architecture (no codec tokens) 48kHz synth…
