Thinking Machines announced TML-Interaction-Small, a 276B-parameter mixture-of-experts (MoE) model designed for real-time, always-on interaction, with sub-0.4s latency and integrated multimodal processing.
A robot capable of mimicking human speech, highlighting advances in robotic voice synthesis and human-robot interaction.
HiCoDiT is a novel Hierarchical Codec Diffusion Transformer for video-to-speech generation. It leverages the hierarchical structure of discrete speech tokens from residual-vector-quantization (RVQ) based codecs, using coarse-to-fine conditioning with dual-scale normalization to achieve strong audio-visual alignment.
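The "coarse-to-fine" hierarchy comes from RVQ itself: each codebook quantizes the residual left by the previous one, so the first code carries the coarse structure and later codes refine it. A minimal sketch of that idea (codebook count, sizes, and dimensions are illustrative, not HiCoDiT's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, codebook_size, n_levels = 8, 16, 3
# One codebook per level; entries are random here purely for illustration.
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(n_levels)]

def rvq_encode(frame, codebooks):
    """Quantize one frame level by level: each codebook encodes the
    residual left over by the previous level (coarse-to-fine)."""
    residual = frame.copy()
    codes = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from each level."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

frame = rng.normal(size=dim)
codes = rvq_encode(frame, codebooks)   # one discrete token per level
recon = rvq_decode(codes, codebooks)
print(codes, float(np.linalg.norm(frame - recon)))
```

A generator conditioned coarse-to-fine can attend to the level-0 codes first and treat deeper levels as refinements, which is the structure the summary above describes.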
Qwen3.5-Omni is a hundreds-of-billions-parameter multimodal model with advanced audio-visual understanding and generation capabilities, featuring novel Audio-Visual Vibe Coding and achieving state-of-the-art results across 215 benchmarks while matching Gemini-3.1 Pro.
Google DeepMind upgraded its speech synthesis model to sound more natural across 70+ languages and now applies SynthID watermarking to all outputs.
The Qwen3-TTS technical report introduces a series of advanced multilingual text-to-speech models with voice cloning and controllable generation, featuring a dual-track LM architecture and specialized tokenizers for low-latency streaming.
This paper introduces Continuous Audio Language Models (CALM), which generate audio using continuous frames instead of discrete tokens to improve fidelity and reduce computational cost in speech and music generation.
VibeVoice is a new model from Microsoft that synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer. It achieves superior fidelity and compression, supporting up to 90 minutes of audio with multiple speakers.
Google announces Gemini 2.5's advanced native audio capabilities, enabling real-time conversational AI with natural speech generation, style control, and multimodal understanding across 24+ languages.