next-token-diffusion

VibeVoice Technical Report

Papers with Code Trending ↗ · 2025-08-26 Cached

VibeVoice is a model from Microsoft that synthesizes long-form, multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer. The report claims the tokenizer preserves audio fidelity while compressing speech far more aggressively than earlier tokenizers, which lets the model generate up to 90 minutes of audio with multiple speakers.
