mlx-audio v0.4.3 adds six new TTS models, including Higgs Audio v2 and OmniVoice (646+ languages). The release also brings server improvements such as concurrent requests and continuous batching, roughly 3x faster Voxtral Realtime at 4-bit, and slimmer dependencies for Apple Silicon.
A developer built a JARVIS-style personal assistant called CYBER with wake word activation, local voice cloning via XTTS v2, vision mode, and LLM-generated system commands, all running locally without cloud dependencies.
Shanghai Jiao Tong University has open-sourced F5-TTS, a speech generation model trained on 100,000 hours of data. It supports Chinese-English bilingual synthesis and zero-shot voice cloning, and it permits commercial use.
A personal reflection on the rapid evolution of AI over the past three years, from early ChatGPT and GPT-4 quotas to BabyAGI, DALL·E, and voice cloning.
OmniVoice is a massively multilingual zero-shot text-to-speech model supporting over 600 languages, built on a diffusion language model architecture with fast inference and voice cloning capabilities.