6 months running a production voice agent for service businesses. The latency math is way harder than the demos suggest.
Summary
After six months running a production voice AI agent for service businesses, the author reports that real-world latency is bimodal (median ~800ms, p95 ~2.4s) and that it is the p95 that determines user perception. VAD misfires, function-calling degradation with long prompts, and TTS quality matter more than the choice of LLM, and multilingual support adds significant cost.
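To make the median-versus-p95 gap concrete, here is a minimal sketch that computes both with the nearest-rank method. The latency samples are invented for illustration (chosen only to echo the ~800ms and ~2.4s figures above) and are not the author's data; the `percentile` helper is likewise an assumption, not something taken from the post.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-turn latencies in ms: 18 fast turns and 2 slow ones (a bimodal shape).
latencies_ms = [780, 790, 800, 810, 820] * 3 + [760, 840, 850] + [2400, 2600]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"median={p50}ms p95={p95}ms")
```

With only two slow turns out of twenty, the median stays near the fast mode while the p95 lands in the slow mode, which is why a dashboard that reports only the median can look healthy while callers still notice the long pauses.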
Similar Articles
Our voice agent's p99 was 280ms. Competitor's was 450ms. Users said ours felt slower. We measured why.
A voice agent team found that, despite lower end-to-end latency (280ms vs. the competitor's 450ms), users perceived their agent as slower because of poor barge-in (interrupt) latency (380ms vs. 60ms). They identified three fixes (memory pinning, VAD threshold tuning, and smaller TTS chunks) that improved the barge-in rate at 100ms from 41% to 89%, after which users perceived the agent as faster.
@svpino: Humans have an average of 200-250 ms of latency when speaking to each other. This voice model is even faster: only 110 …
An open-weights 8B-parameter voice model achieves a latency of only 110ms, faster than the 200-250ms average latency of human conversation. It can be run locally and is freely available via a GitHub repository.
Latency matters more than model selection when building AI tutoring systems
A practitioner argues that speech start latency—not model selection—is the critical factor in AI tutoring systems, recommending targets under 1 second for speech start and highlighting streaming TTS as the highest-leverage optimization. The post outlines a full pipeline from ASR through TTS and avatar sync, identifying where latency compounds most.
I tested 5 AI voice agent platforms in 2026 on real calls — here’s my honest ranking
A personal ranking of five AI voice agent platforms (LuMay, Vapi, Retell AI, Pipecat, LiveKit Agents) based on production reliability, latency, voice quality, and scalability after 60+ hours of testing.
What’s your current / best AI voice agents stack in 2026?
A community discussion asking what people are using for AI voice agents in production, focusing on latency, interruption handling, and reliability, with mentions of LuMay Voice Agent, Vapi, Retell, and Twilio.