latency

#latency

Best STT API for voice agents? I’d test latency before accuracy

Reddit r/AI_Agents · 7h ago

The author argues that for live voice agents, STT latency and real-time behavior are more critical than raw transcription accuracy, and proposes a different evaluation scorecard.

#latency

p99 0ms* autocomplete for 240 million domain names

Lobsters Hottest · 3d ago

The article explains how the author reached a p99 perceived latency of zero milliseconds for autocomplete over 240 million domain names by prefetching suggestions on keyDown and caching them, backed by a fast API built on Tranco and CZDS data.
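A minimal sketch of the prefetch-on-keyDown idea, assuming a tiny in-memory sorted list stands in for the real Tranco/CZDS-backed API. The names `suggest`, `on_key_down`, and `on_input` are hypothetical: the keyDown handler requests suggestions for the text the field will hold once the key is applied, so the later input event usually finds them already cached.

```python
import bisect
from functools import lru_cache

# Hypothetical stand-in for the domain index; the article's API is built on Tranco and CZDS data.
DOMAINS = sorted(["example.com", "exam.net", "examine.org", "google.com", "github.com"])


@lru_cache(maxsize=10_000)
def suggest(prefix: str, limit: int = 5) -> tuple[str, ...]:
    """Return up to `limit` domains starting with `prefix`, via binary search on the sorted list."""
    start = bisect.bisect_left(DOMAINS, prefix)
    matches = []
    for name in DOMAINS[start:]:
        if not name.startswith(prefix) or len(matches) == limit:
            break
        matches.append(name)
    return tuple(matches)


def on_key_down(current_text: str, key: str) -> None:
    """Warm the cache for the text the field will contain once this key lands."""
    if len(key) == 1:  # printable characters only; skip keys such as "Shift" or "Enter"
        suggest(current_text + key)


def on_input(text: str) -> tuple[str, ...]:
    """Called after the character is inserted; usually a cache hit thanks to on_key_down."""
    return suggest(text)
```

The sketch ignores backspace, paste, and network round trips, all of which a production client would have to handle.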

#latency

Meet Alice. Alice is impatient

Lobsters Hottest · 5d ago

This blog post explains the inspection paradox in system latency and recovery time measurement, showing why customers experience longer average waits than service metrics suggest. It includes an interactive simulation and emphasizes the importance of understanding the tail of the distribution.
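A small sketch of the inspection paradox, not the post's own simulation: if the gaps between events (for example, between recoveries) have lengths X, an arrival at a uniformly random time lands in a gap with probability proportional to its length, so the gap it experiences averages E[X²]/E[X] rather than E[X]. The function names are illustrative.

```python
import bisect
import itertools
import random


def mean_gap(gaps: list[float]) -> float:
    """Average gap as the service sees it: every interval counts once."""
    return sum(gaps) / len(gaps)


def length_biased_mean(gaps: list[float]) -> float:
    """Average gap as a random arrival sees it: E[X^2] / E[X]."""
    return sum(g * g for g in gaps) / sum(gaps)


def simulate_arrivals(gaps: list[float], n: int = 100_000, seed: int = 0) -> float:
    """Drop n arrivals uniformly on the timeline and average the gap each one lands in."""
    rng = random.Random(seed)
    ends = list(itertools.accumulate(gaps))  # end time of each gap
    total = ends[-1]
    seen = 0.0
    for _ in range(n):
        t = rng.uniform(0, total)
        i = min(bisect.bisect_right(ends, t), len(gaps) - 1)  # guard against t == total
        seen += gaps[i]
    return seen / n
```

With gaps of 1 and 9 time units the service-side mean is 5, but a random arrival sees (1² + 9²) / (1 + 9) = 8.2 on average, because it is nine times as likely to land in the long gap.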

#latency

@liquidai: Storing too many tools in your context window increases latency and can lead to wrong tool selection. In this demo, we …

X AI KOLs Following · 5d ago

Liquid AI demonstrates using LFM2.5-ColBERT-350M as a filter to select only the five most relevant tools from 151 options, reducing latency and improving tool selection accuracy.
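A minimal sketch of tool pre-filtering with ColBERT-style late interaction (MaxSim): the query and each tool description are encoded as one vector per token, and a tool's score is the sum, over query tokens, of the best dot product with any of that tool's tokens. The embeddings below are tiny hand-written vectors; in the demo they would come from LFM2.5-ColBERT-350M, whose API is not shown here, and `top_k_tools` is a hypothetical helper. Only the top `k` tool schemas (five in the demo) would then go into the context window.

```python
import heapq

Vector = list[float]


def maxsim(query: list[Vector], doc: list[Vector]) -> float:
    """Late-interaction score: for each query token, keep its best-matching doc token."""
    return sum(
        max(sum(q_i * d_i for q_i, d_i in zip(q, d)) for d in doc)
        for q in query
    )


def top_k_tools(query: list[Vector], tools: dict[str, list[Vector]], k: int = 5) -> list[str]:
    """Rank every tool by MaxSim against the query and keep the k best names."""
    scored = ((maxsim(query, vecs), name) for name, vecs in tools.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Toy per-token embeddings (hypothetical, 3-dimensional for readability).
TOOLS = {
    "get_weather": [[1.0, 0.0, 0.0], [0.0, 0.2, 0.0]],
    "create_event": [[0.0, 1.0, 0.0]],
    "send_email": [[0.0, 0.0, 1.0]],
    "find_route": [[0.7, 0.7, 0.0]],
}
```

Scoring every tool is cheap next to an LLM call, which is why a small retriever can sit in front of the model without adding much latency.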

#latency

Surprising economics of load-balanced systems

Hacker News Top · 5d ago

A blog post analyzes the M/M/c queueing model and shows that increasing the number of servers in a load-balanced system improves latency at constant per-server load, a beneficial and somewhat counterintuitive result for cloud economics.
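The M/M/c result can be checked with the standard Erlang C formula. The sketch below holds per-server utilization ρ fixed, scales the server count c, and computes the mean time a request waits in the queue, W_q = C(c, a) / (cμ - λ), where a = λ/μ and C(c, a) is the Erlang C probability of having to wait. Function names are illustrative, and μ = 1 means times are in units of the mean service time.

```python
def erlang_c(c: int, a: float) -> float:
    """Probability that an arriving request must queue in an M/M/c system with offered load a."""
    rho = a / c
    if not 0 <= rho < 1:
        raise ValueError("need 0 <= a/c < 1 for a stable queue")
    term = 1.0  # a^k / k!, built up iteratively to avoid huge factorials
    head = 0.0  # sum over k = 0..c-1 of a^k / k!
    for k in range(c):
        head += term
        term *= a / (k + 1)
    tail = term / (1 - rho)  # (a^c / c!) / (1 - rho)
    return tail / (head + tail)


def mean_queue_wait(c: int, rho: float, mu: float = 1.0) -> float:
    """Mean time spent waiting in queue (excluding service) at per-server utilization rho."""
    lam = rho * c * mu
    return erlang_c(c, lam / mu) / (c * mu - lam)


for c in (1, 2, 4, 8, 16):
    print(c, round(mean_queue_wait(c, rho=0.8), 3))
```

At ρ = 0.8 a single server gives W_q = 4 service times, while two servers at the same per-server load give 16/9, and the wait keeps shrinking as c grows.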

#latency

@kazukifujii: Sakura Internet's Michishita-san's article comprehensively summarizes LLM Inference and comes highly recommended. It fe…

X AI KOLs Timeline · 2026-06-18

This article summarizes a presentation by Junda Chen on disaggregated inference for LLMs, explaining why goodput (throughput meeting latency SLOs) matters more than raw throughput, and how separating prefill and decode phases improves performance. It also highlights the influence on NVIDIA Dynamo.
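A simplified sketch of the goodput idea: count only the requests that meet both a time-to-first-token (TTFT) SLO and a time-per-output-token (TPOT) SLO. The thresholds, the `Request` fields, and the per-window rate are assumptions for illustration; the presentation's own definition may differ (for example, the highest request rate at which a target fraction of requests meets the SLOs).

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float  # time to first token (prefill-dominated)
    tpot_s: float  # average time per output token (decode-dominated)


def throughput(requests: list[Request], window_s: float) -> float:
    """Completed requests per second, regardless of latency."""
    return len(requests) / window_s


def goodput(requests: list[Request], window_s: float, ttft_slo_s: float, tpot_slo_s: float) -> float:
    """Completed requests per second that met both latency SLOs."""
    ok = sum(1 for r in requests if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s)
    return ok / window_s


batch = [Request(0.2, 0.03), Request(0.4, 0.02), Request(1.5, 0.02), Request(0.3, 0.08)]
print(throughput(batch, 2.0), goodput(batch, 2.0, ttft_slo_s=0.5, tpot_slo_s=0.05))  # 2.0 1.0
```

Here one request misses the TTFT budget and another misses the TPOT budget, so goodput is half the raw throughput even though every request completed.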

#latency

Your voice agent probably isn't slow because of the LLM.

Reddit r/AI_Agents · 2026-06-17

A developer debunks the common belief that LLM latency is the primary cause of slow voice agents, explaining that delays often stem from earlier stages like audio capture, VAD, and STT. They recommend logging specific latency metrics and testing various STT/TTS providers and orchestration frameworks to diagnose issues.
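One way to act on the advice to log specific latency metrics is to timestamp each pipeline event in a turn and report the gap between consecutive events. The event names and the five-stage pipeline below are hypothetical; a real agent would record them from its audio, VAD, STT, LLM, and TTS callbacks.

```python
STAGES = ["speech_end", "vad_end", "stt_final", "llm_first_token", "tts_first_audio"]


def stage_breakdown(ts: dict[str, float]) -> dict[str, float]:
    """Milliseconds between consecutive pipeline events for one conversational turn.

    `ts` maps each event name in STAGES to a monotonic timestamp in seconds.
    """
    out = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        out[f"{prev}->{cur}"] = round((ts[cur] - ts[prev]) * 1000, 1)
    out["total"] = round((ts[STAGES[-1]] - ts[STAGES[0]]) * 1000, 1)
    return out


def slowest_stage(breakdown: dict[str, float]) -> str:
    """Name of the largest per-stage gap, ignoring the total."""
    return max((k for k in breakdown if k != "total"), key=breakdown.__getitem__)


turn = {"speech_end": 0.0, "vad_end": 0.5, "stt_final": 0.75,
        "llm_first_token": 1.0, "tts_first_audio": 1.25}
print(stage_breakdown(turn))
print(slowest_stage(stage_breakdown(turn)))  # speech_end->vad_end
```

In this illustrative turn, 500 ms of the 1,250 ms total passes before the VAD even declares end of speech, the kind of pre-LLM delay the post points to.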

#latency

A Practical Evaluation Method for Long-Form Simultaneous Speech-to-Speech Translation

arXiv cs.CL · 2026-06-16

This paper proposes a practical evaluation method for long-form simultaneous speech-to-speech translation that uses ASR, forced alignment, and sentence embedding alignment to compute latency and quality metrics on continuous speech, overcoming limitations of prior approaches.

#latency

A Guide to AI Inference Engineering (17 minute read)

TLDR AI · 2026-06-16

This guide explains the discipline of AI inference engineering, covering the split between prefill and decoding phases, the shift from closed to open models, and optimization techniques for latency, throughput, and cost.
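A back-of-the-envelope model of why the prefill/decode split matters for latency: prefill processes the whole prompt and yields the first token (time to first token, TTFT), then decode emits each remaining token in its own step (time per output token, TPOT). The functions are an illustrative approximation that ignores queueing and network time.

```python
def request_latency_s(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """End-to-end generation time: one prefill-bound first token, then (n - 1) decode steps."""
    if output_tokens < 1:
        raise ValueError("need at least one output token")
    return ttft_s + (output_tokens - 1) * tpot_s


def decode_tokens_per_s(tpot_s: float) -> float:
    """Per-request streaming speed implied by the decode step time."""
    return 1.0 / tpot_s


# 0.5 s to first token, 20 ms per later token, 101 tokens: roughly 2.5 s in total.
print(request_latency_s(0.5, 0.02, 101), decode_tokens_per_s(0.02))
```

Long outputs make the decode term dominate, while long prompts inflate TTFT; that asymmetry is one reason the two phases are tuned separately.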

#latency

@modal: https://x.com/modal/status/2066636221921521892

X AI KOLs Following · 2026-06-15

Modal announced several major product updates including VM Sandboxes with real Linux kernel support, lower-latency regional routing, domain allowlisting for Sandboxes, RBAC, named images, and SDK updates.

#latency

@sdianahu: 1/ fast AI inference is about to replay the history lesson from search engines on why low latency is so important

X AI KOLs Following · 2026-06-14

Diana Hu argues that fast AI inference is about to repeat the lesson search engines learned about why low latency matters.

#latency

How can I schedule work on a thread pool with low latency?

The Old New Thing (Raymond Chen) · 2026-06-12

This article from The Old New Thing explains that Windows thread pools are optimized for throughput, not latency, and provides solutions for low-latency scheduling, such as creating a custom thread pool or using a dedicated worker thread, with code examples in C++ and C#.
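The dedicated-worker option can be sketched as one long-lived thread that blocks on a queue and runs each item as soon as it arrives. This is a Python illustration of the pattern under that assumption, not the Win32 thread pool API or the article's C++/C# code, and the class name is hypothetical.

```python
import queue
import threading
import traceback
from typing import Any, Callable


class DedicatedWorker:
    """A single long-lived thread reserved for latency-sensitive work; callables run in FIFO order."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn: Callable[..., Any], *args: Any) -> None:
        self._queue.put((fn, args))

    def close(self) -> None:
        """Finish queued work, then stop the thread."""
        self._queue.put(None)  # sentinel
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()  # blocks without spinning until work arrives
            if item is None:
                return
            fn, args = item
            try:
                fn(*args)
            except Exception:
                traceback.print_exc()  # keep the worker alive after a failing task
```

The trade-off is the one the article's framing implies: the thread sits idle between items, spending a little capacity to avoid waiting behind unrelated pool work.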

#latency

@barrowjoseph: https://x.com/barrowjoseph/status/2065423284343050314

X AI KOLs Timeline · 2026-06-12

A blog post revisits the concept of 'Slow Search' in the context of agentic retrieval, arguing that per-query latency can be traded for better retrieval quality to reduce overall task time and cost for AI agents.

#latency

Why hasn't any mainstream game integrated LLMs into NPCs yet?

Reddit r/LocalLLaMA · 2026-06-12

The poster asks why no mainstream game has integrated large language models into NPCs yet, and whether latency or a lack of interest from game studios is the main obstacle.

#latency

What’s the Biggest Problem With AI Voice Agents Right Now?

Reddit r/AI_Agents · 2026-06-12

The poster discusses key challenges facing AI voice agents in real-world customer interactions, such as accent handling, latency, and integration, and invites businesses to share their experiences.

#latency

F1 teams spend millions on their simulators—what makes them different?

Ars Technica · 2026-06-11

Formula 1 teams invest millions in driver-in-the-loop simulators with ultra-low latency and high fidelity to replicate real car behavior, enabling drivers to train and develop setups.

#latency

@Modular: .@hippocraticai runs 400B+ parameter models for real-time patient conversations, tens of thousands per day. When they b…

X AI KOLs Following · 2026-06-11

Hippocratic AI partners with Modular to run large language model inference on the MAX framework, achieving sub-500 ms TTFT (time to first token), ~30% faster P99 latency, and ~22% faster mean latency at scale on NVIDIA B300 GPUs, with portability to AMD.
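Because the results are quoted as both P99 and mean latency, a small helper that computes the two from raw samples shows why they can move differently. It uses the nearest-rank percentile definition, which is only one of several conventions; the summary does not say which one was used, and the sample values are made up for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict[str, float]:
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
    }


# 98 fast responses and two slow ones: the median stays at 10 ms while the P99 jumps to 500 ms.
latencies_ms = [10.0] * 98 + [500.0, 1000.0]
print(summarize(latencies_ms))
```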

#latency

@GergelyOrosz: The postmortem from Coinbase's 10-hour outage is out and... damn They run global trading from a single region because o…

X AI KOLs Following · 2026-06-11

Coinbase's 10-hour outage postmortem reveals they run global trading from a single region without automated failover, raising concerns about their infrastructure reliability.

#latency

The Price of Anarchy in Disaggregated Inference

Hugging Face Daily Papers · 2026-06-11

This paper presents a game-theoretic analysis of disaggregated inference architectures that separate prefill and decode phases across GPU pools, characterizing how GPU saturation affects performance. The authors propose an adaptive controller that detects saturation transitions and adjusts routing parameters, reducing the Price of Anarchy significantly in experiments on NVIDIA B200 clusters.

#latency

Linux latency measurements and compositor tuning

Lobsters Hottest · 2026-06-10

A detailed investigation of Linux latency in gaming using a Teensy-based LDAT tool, measuring click-to-photon latency with various settings on Nvidia GPUs under KDE Wayland, comparing to Windows.
