Scenema AI releases Scenema Audio, an open-source diffusion-based model for zero-shot expressive voice cloning and speech generation, separating emotional performance from voice identity to allow any voice to perform any emotion.
VITA-QinYu is an expressive end-to-end spoken language model capable of role-playing and singing. Trained on 15.8K hours of data, it outperforms peer models in expressiveness and conversational accuracy.
Scenema Audio is a zero-shot expressive voice cloning and speech generation model that produces speech with emotional arcs, pacing, and breath control from text prompts. Built on an audio diffusion transformer, it supports multilingual generation, voice cloning from 10-20 seconds of reference audio, and scene-aware audio with ambient effects.
MOSS-TTS-Nano is an open-source multilingual speech generation model with only 0.1B parameters, designed for real-time TTS running directly on CPU with no GPU required. Released by the OpenMOSS team and MOSI.AI, it enables simple local deployment for web serving and product integration.