Qwen3-tts.cpp + Compose Desktop GUI
Summary
The developer improved qwen3-tts.cpp to run at 5x real-time speed on an RTX 5080 and built a cross-platform desktop GUI with Kotlin Compose Multiplatform that supports voice cloning, streaming, and speaker embedding management.
Similar Articles
Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried
A developer shows how to run Qwen3 TTS locally in real time with streaming, quantization, word-level alignment, and custom voice fine-tuning, forming an expressive open-source TTS pipeline.
Qwen3-TTS Technical Report
The Qwen3-TTS technical report introduces a series of advanced multilingual text-to-speech models with voice cloning and controllable generation, featuring a dual-track LM architecture and specialized tokenizers for low-latency streaming.
Qwen3.6 35Ba3 has changed my workflows and even how I use my computer
A user describes how Qwen3.6 35B, combined with the 'pi' tool, has transformed their computer workflows, allowing natural language control of the OS and automated task execution. They successfully built a landing page from voice messages entirely locally, demonstrating the model's practical utility.
audio.cpp: 12 audio models (Qwen3-TTS, PocketTTS, VeVo2 etc) in 1 C++/ggml runtime — TTS up to 5x faster than Python on CUDA
audio.cpp is a C++/ggml runtime that integrates 12 audio models including Qwen3-TTS, PocketTTS, and VeVo2, achieving TTS up to 5x faster than Python on CUDA.
Qwen3.7-Plus: Multimodal Agent Intelligence (36 minute read)
Qwen3.7-Plus is a multimodal agent model that unifies vision and language for seamless GUI and CLI interactions, now available via Alibaba Cloud Model Studio.