Fish Audio S2 is an open-source text-to-speech system with multi-speaker support, multi-turn generation, and instruction-following control, backed by a production-ready, low-latency inference engine.
VibeVoice is a new model from Microsoft that synthesizes long-form, multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer. It is described as combining high fidelity with a high compression rate, and it supports up to 90 minutes of multi-speaker audio.