voice-ai

#voice-ai

Voice AI top blockbuster deals of the month

Reddit r/artificial · 11h ago

May saw over $1.8 billion in voice AI funding, led by Sierra's $925M and Hark's $700M rounds, while ElevenLabs launched new models for music generation and dubbing with enhanced control. The newsletter also highlights healthcare deals and India's growing voice market.


Real-Time Voice AI Hears but Does Not Listen (arXiv:2606.26083)

Reddit r/artificial · 21h ago

This paper evaluates four leading real-time voice AI systems (GPT Realtime 2, Gemini 3.1 Flash Live, Qwen3.5 Omni Plus, Omni Flash) and finds that they consistently act on words rather than vocal tone, ignoring distress, fear, or sarcasm even when they can perceive it. The paper calls this the 'emotional intelligence gap' of voice AI.


@ycombinator: Tune in:

X AI KOLs Following · yesterday

Coval is a simulation and observability platform for voice agents that helps enterprises scale voice applications safely. Founder Brooke Hopkins discussed the potential of voice as a natural interface for AI, as well as the architectural similarities between voice AI and autonomous driving.


@bnicholehopkins: I’m excited to announce @covaldev raised a $28M Series A, led by @NorwestVP with participation from @Base10Partners, @t…

X AI KOLs Following · yesterday

Coval, a startup focused on simulation and evaluation for voice AI agents, has raised a $28M Series A led by Norwest Venture Partners.


@FeitengLi: Led by Fable 5 (just half a day), Codex relay development took a week. #EdgeSpeak is now live. Friends who shared, contact me to receive an invite code https://edgespeak.com/zh

X AI KOLs Timeline · 5d ago

EdgeSpeak, a desktop voice transcription tool, is now live and features the local Lattice-2 voice model. It supports offline audio and video transcription, handles multiple languages and accents, and provides a local API for developers to integrate.


Building voice AI agents that take turns like humans — the gotchas nobody warns you about

Reddit r/AI_Agents · 5d ago

This article shares hard-won lessons from building real-time voice AI agents, highlighting the importance of proper turn-taking, VAD handling, billing awareness, and avoiding echo loops.


@AndrewYNg: New course: Add voice to your AI agents and applications, built with @VocalBridge (disclosure: an AI Fund portfolio com…

X AI KOLs Following · 2026-06-18

Andrew Ng announces a new course on adding voice to AI agents using VocalBridge, taught by its CEO. The course covers three integration patterns and evaluation techniques for building reliable and low-latency voice applications.


Vapi vs Elevenlabs comparison cheatsheet

Reddit r/ArtificialInteligence · 2026-06-18

A cheatsheet comparing the voice AI features of Vapi and ElevenLabs and the differences between them.


How to build a Voice AI that does math and calculates accurate quotes

Reddit r/AI_Agents · 2026-06-17

A guide on building a Voice AI capable of performing mathematical calculations and generating accurate quotes.


Open to Suggestions: White label AI Voice Agents

Reddit r/AI_Agents · 2026-06-17

Announcement of white label AI voice agents, enabling businesses to deploy customizable voice AI solutions under their own brand.


Tyto by ai-coustics

Product Hunt · 2026-06-16

Tyto by ai-coustics is a tool that provides audio insights to predict voice AI performance.


I've been building voice agents for 3 years. Here are the prompting habits that actually make them sound human.

Reddit r/AI_Agents · 2026-06-15

The article shares key prompting habits for making voice AI agents sound more human, including reading prompts aloud, explicitly using filler words, showing examples instead of telling, handling special characters, and allowing the agent to say it doesn't know.


Infinite Music Glitch on my Arduino with Magenta Realtime 2

Reddit r/LocalLLaMA · 2026-06-11

A developer built a local voice-controlled music system using an ESP32 microcontroller, a MacBook, Magenta Realtime 2 for real-time music generation, MLX Whisper for transcription, and a Qwen model for tool calling, enabling conversational control over music elements like genre and instruments.


Hush

Product Hunt · 2026-06-09

Hush is an open-source tool for noise suppression designed for voice AI agents, improving audio clarity in real-time interactions.


@Sumanth_077: Hands on AI Engineering! I open-sourced a collection of 50+ hands-on AI engineering tutorials. It features step-by-step…

X AI KOLs Timeline · 2026-06-06

A collection of 50+ hands-on AI engineering tutorials covering AI agents, RAG, MCP, OCR, voice AI, and more, open-sourced with 1k+ GitHub stars.


@svpino: I've built two voice pipelines for two different companies. They both look like this: Audio → STT → Clean transcript → …

X AI KOLs Following · 2026-06-05

Santiago highlights the limitation of traditional STT pipelines that lose tone and emotion, then introduces Velma, a voice-native AI model from Modulate that analyzes raw audio to capture intent, emotion, and other acoustic signals, available via API at 10x cheaper than LLM-based approaches.


Latency matters more than model selection when building AI tutoring systems

Reddit r/AI_Agents · 2026-06-04

A practitioner argues that speech start latency, not model selection, is the critical factor in AI tutoring systems. The post recommends a target of under 1 second for speech start and highlights streaming TTS as the highest-leverage optimization. It outlines a full pipeline from ASR through TTS and avatar sync, identifying where latency compounds most.


@ElevenLabsDevs: Call your Hermes Agent

X AI KOLs Following · 2026-06-04

ElevenLabs introduces the ability to call your Hermes Agent, enabling voice-based interaction with AI agents through their platform.


@uniswap12: Microsoft open-sourced a voice AI that can transcribe 60 minutes of long audio in one go, handling 4 people speaking simultaneously. VibeVoice, open-sourced by Microsoft, 24.8k stars, I only found out about it today. For converting recordings to text, I've been using Whisper, but it often times out on long meeting recordings and struggles with multi-speaker recognition...

X AI KOLs Timeline · 2026-06-04

Microsoft open-sourced the VibeVoice speech AI framework, which supports single-pass transcription of 60-minute audio with multi-speaker diarization and timestamp labeling, and also provides multi-role TTS synthesis. It is based on Qwen2.5 and includes a lightweight 0.5B real-time version. It has received 24.8k stars on GitHub.


@svpino: Humans have an average of 200-250 ms of latency when speaking to each other. This voice model is even faster: only 110 …

X AI KOLs Following · 2026-06-03

An open-weights, 8B-parameter voice model achieves a latency of only 110 ms, faster than the 200-250 ms average latency of human conversation. It can be run locally and is freely available via a GitHub repository.
