@garrytan: Everyone's bottleneck in voice AI is the same: retrieval. The agent thinks, network round-trips to a vector DB, and the…

X AI KOLs Following 05/29/26, 01:34 PM Tools

voice-ai retrieval vector-db open-source real-time hackathon moss

Summary

Garry Tan argues that retrieval is the key bottleneck in voice AI and introduces Moss, an open-source tool that runs vector search in under 10 ms with no network hop. He also announces a hackathon at the YC office on June 6-7.

Original Article

Cached at: 05/31/26, 04:53 PM

Everyone’s bottleneck in voice AI is the same: retrieval. The agent thinks, network round-trips to a vector DB, and the magic dies.

Moss runs search at sub-10ms (no hop). Open source. This is the layer voice agents were missing. Build on it June 6-7 at the YC office.
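To make the "no hop" idea concrete, here is a minimal sketch of in-process vector search: the index lives in the agent's own memory, so a lookup is a function call rather than a network round-trip. This is not Moss's API or implementation. The `search` and `cosine` helpers, the `INDEX` contents, and the three-dimensional vectors are all hypothetical, chosen only for illustration.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, k=2):
    """Brute-force scan: return the k (score, doc_id) pairs closest to query."""
    scored = [(cosine(vec, query), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Hypothetical toy index with hand-made 3-dimensional "embeddings".
INDEX = {
    "refund-policy": [0.9, 0.1, 0.0],
    "opening-hours": [0.1, 0.9, 0.1],
    "shipping-times": [0.2, 0.2, 0.9],
}

# Time a single lookup; no socket, serialization, or remote service is involved.
start = time.perf_counter()
hits = search(INDEX, [0.8, 0.2, 0.1], k=2)
elapsed_ms = (time.perf_counter() - start) * 1000

print([doc_id for _, doc_id in hits], f"{elapsed_ms:.3f} ms")
```

A brute-force scan like this only stays fast for small indexes. The point of the sketch is the placement of the index next to the agent, not the search algorithm itself.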

Pete Koomen (@koomen): Come build agents that can finally hold a fluid conversation at the 24-Hour Conversational AI Hackathon, hosted by @usemoss at the YC Office, June 6-7. First place wins an interview with a YC partner.

Similar Articles

@MaxForAI: If you are working on voice agents, you should try this project. A team from NTU, NUS, and Shanghai AI Lab released: Mega-ASR. This fully open-source ASR is built on Qwen3-ASR, aiming to break the long-standing bottleneck of ASR performance in noisy, reverberant, or other impaired real-world environments...

X AI KOLs Timeline

NTU, NUS, and Shanghai AI Lab jointly released Mega-ASR, a fully open-source ASR model built on Qwen3-ASR. Using the Voices-in-the-Wild-2M dataset and progressive acoustic-to-semantic optimization, it achieves up to 30% relative Word Error Rate (WER) reduction in real-world noisy environments. With only 1.7B parameters, it enables efficient inference on consumer-grade hardware.

How OpenAI delivers low-latency voice AI at scale

OpenAI Blog

OpenAI details its rearchitected WebRTC stack designed to deliver low-latency voice AI at scale for over 900 million users. The post explains how new split-relay and transceiver architectures optimize media routing and connection setup for real-time interactions like ChatGPT voice.

@garrytan: GBrain is SOTA retrieval for agents and free for your use. MIT license. It currently is designed for OpenClaw and Herme…

X AI KOLs Following

GBrain is a state-of-the-art retrieval tool for AI agents, released under MIT license, featuring hybrid search, self-wiring knowledge graphs, and temporal question answering, built by Y Combinator's CEO for his own agents.

@garrytan: https://x.com/garrytan/status/2053127519872614419

X AI KOLs Timeline

Garry Tan describes using a personal AI agent system, termed 'Book Mirror', to deeply integrate reading material with his life context via Meta-Meta-Prompting. He shares insights on building real AI systems as an operating system rather than just a chat interface.

6 months running a production voice agent for service businesses. The latency math is way harder than the demos suggest.

Reddit r/ArtificialInteligence

After 6 months running a voice AI agent for service businesses, the author reports that real-world latency is bimodal (median ~800ms, p95 ~2.4s) and that the p95, not the median, determines user perception. Issues like VAD misfires, function-call degradation with long prompts, and TTS quality matter more than LLM choice, and multilingual support adds significant cost.
