@garrytan: Everyone's bottleneck in voice AI is the same: retrieval. The agent thinks, network round-trips to a vector DB, and the…

X AI KOLs Following Tools

Summary

Garry Tan argues that retrieval is the key bottleneck in voice AI and highlights Moss, an open-source tool that runs vector search in under 10 ms with no network hop, ahead of a hackathon at the YC office on June 6-7.


Cached at: 05/31/26, 04:53 PM

Everyone’s bottleneck in voice AI is the same: retrieval. The agent thinks, network round-trips to a vector DB, and the magic dies.

Moss runs search at sub-10ms (no hop). Open source. This is the layer voice agents were missing. Build on it June 6-7 at the YC office.
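To make the "no hop" idea concrete, here is a minimal sketch of search that runs inside the agent's own process, so a lookup costs a function call rather than a network round-trip. This is not Moss's API or implementation; the class `InProcessIndex`, its methods, and the toy vectors are all hypothetical, and a brute-force cosine scan stands in for whatever index structure a real system would use.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class InProcessIndex:
    """Hypothetical in-memory index: vectors live in the caller's process."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, doc_id, vec):
        # Store the normalized vector once so search only needs dot products.
        self._ids.append(doc_id)
        self._vectors.append(normalize(vec))

    def search(self, query, k=3):
        # Brute-force scan: score every stored vector, return the top k.
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id)
            for doc_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy data (made up for illustration): three documents with 3-d embeddings.
index = InProcessIndex()
index.add("refund_policy", [1.0, 0.0, 0.0])
index.add("opening_hours", [0.0, 1.0, 0.0])
index.add("pricing", [0.9, 0.1, 0.0])

top_hits = index.search([1.0, 0.0, 0.0], k=2)
for doc_id, score in top_hits:
    print(f"{doc_id}: {score:.3f}")
```

A linear scan like this only stays fast for small collections; the point of the sketch is where the search runs, not how it is indexed.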

Pete Koomen (@koomen): Come build agents that can finally hold a fluid conversation at the 24-Hour Conversational AI Hackathon, hosted by @usemoss at the YC Office, June 6-7. First place wins an interview with a YC partner:

Similar Articles

@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...

X AI KOLs Timeline

NTU, NUS, and Shanghai AI Lab jointly released Mega-ASR, a fully open-source ASR model built on Qwen3-ASR. Using the Voices-in-the-Wild-2M dataset and progressive acoustic-to-semantic optimization, it achieves up to 30% relative Word Error Rate (WER) reduction in real-world noisy environments. With only 1.7B parameters, it enables efficient inference on consumer-grade hardware.

How OpenAI delivers low-latency voice AI at scale

OpenAI Blog

OpenAI details its rearchitected WebRTC stack designed to deliver low-latency voice AI at scale for over 900 million users. The post explains how new split-relay and transceiver architectures optimize media routing and connection setup for real-time interactions like ChatGPT voice.

@garrytan: https://x.com/garrytan/status/2053127519872614419

X AI KOLs Timeline

Garry Tan describes using a personal AI agent system, termed 'Book Mirror', to deeply integrate reading material with his life context via Meta-Meta-Prompting. He shares insights on building real AI systems as an operating system rather than just a chat interface.

6 months running a production voice agent for service businesses. The latency math is way harder than the demos suggest.

Reddit r/ArtificialInteligence

After six months running a voice AI agent for service businesses, the author reports that real-world latency is bimodal (median ~800 ms, p95 ~2.4 s) and that the p95 is what determines user perception. VAD misfires, function-call degradation with long prompts, and TTS quality matter more than the choice of LLM, and multilingual support adds significant cost.
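A small worked example shows why a bimodal distribution makes the median misleading. The sample below is synthetic, chosen only to echo the ~800 ms and ~2.4 s figures quoted above; it is not the author's data, and the 90/10 split and the helper `percentile` are assumptions for illustration.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100, computed with integers to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic bimodal latencies in milliseconds: most turns are fast,
# a minority hit a slow path (assumed 90/10 split).
latencies_ms = [800] * 90 + [2400] * 10

median_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
mean_ms = statistics.mean(latencies_ms)

print(f"median={median_ms} ms, p95={p95_ms} ms, mean={mean_ms} ms")
```

In this made-up sample the median sits entirely in the fast mode, while one turn in ten lands in the slow mode that the p95 exposes.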