dots.tts is a 2B-parameter continuous autoregressive TTS model trained on multilingual data. It achieves state-of-the-art performance on benchmarks such as Seed-TTS-Eval and supports low-latency streaming via CFG-aware MeanFlow distillation. The model, code, and checkpoints are released under Apache 2.0.
ElevenLabs launched Dubbing v2, an AI dubbing model that preserves the original speaker's emotion, tone, and performance across 90+ languages by conditioning on the original audio directly, offering broadcast-quality dubbing at a fraction of the cost.
Swanbench-Speech is a comprehensive benchmark for evaluating long-form speech generation across diverse scenarios, using multi-dimensional metrics covering acoustics, semantics, and expressiveness, revealing limitations of current models.
This paper introduces InterRS, a method for real-time speech generation that interleaves reasoning steps into the natural pauses in speech, improving performance on math and logic benchmarks while keeping responses fluent and immediate.
Scenema AI releases Scenema Audio, an open-source diffusion-based model for zero-shot expressive voice cloning and speech generation, separating emotional performance from voice identity to allow any voice to perform any emotion.
VITA-QinYu is an expressive end-to-end spoken language model capable of role-playing and singing. Trained on 15.8K hours of data, it outperforms peer models in expressiveness and conversational accuracy.
Scenema Audio is a zero-shot expressive voice cloning and speech generation model that produces speech with emotional arcs, pacing, and breath control from text prompts. Built on an audio diffusion transformer, it supports multilingual generation, voice cloning from 10-20 seconds of reference audio, and scene-aware audio with ambient effects.
MOSS-TTS-Nano is an open-source multilingual speech generation model with only 0.1B parameters, designed for real-time TTS that runs directly on a CPU, with no GPU required. Released by the OpenMOSS team and MOSI.AI, it enables simple local deployment for web serving and product integration.