Flowcat addresses the high cost and limited context of realtime voice models, achieving 4x lower cost and 7x more context.
VikParuchuri announces a turbo mode for data extraction, claiming 5x faster and cheaper performance and 7% higher accuracy than Azure Content Understanding, with latency competitive for real-time workflows.
parakeet.cpp runs NVIDIA Parakeet ASR locally behind an OpenAI-compatible API, with prebuilt Docker images for CPU and CUDA (including arm64), for real-time transcription with word timestamps.
Simon Willison updates his OpenAI WebRTC Audio Session tool to support the new GPT-Realtime-2 model and adds document context for conversational audio discussions.
NVIDIA released Nemotron 3.5 ASR, an open-source multilingual speech-to-text model with the lowest latency tested, available in multilingual and English-only variants, ideal for voice agents and self-hosted deployments.
The author introduces Hey Codex, an experimental real-time conversational version of Codex that lets users drive vibe coding sessions by voice, for example while driving.
An AI system maps bird vocalizations into 3D visualizations, converting frequency and modulation data into colored point clusters in real time, with potential applications in industrial and medical anomaly detection.
The author describes building FlashRT, a CUDA-first inference runtime that rewrites model inference paths as C++/CUDA kernels to address bottlenecks beyond GEMM in small-batch and realtime workloads, achieving significant latency improvements on Jetson Thor and RTX 5090. The article covers lessons on precision (FP8 helped, FP4 gave mixed results) and the need to bypass generic runtimes for realtime inference.
SolveIt now supports editing messages via conversational voice with optional diff tracking.
Discussion of an upcoming fully realtime interaction model that will be released via API, with plans to create distillation data from it.
OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning and a 128,000-token context window. It supports real-time translation from more than 70 input languages into 13 output languages and scores 96.6% on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.
OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.
Codemix open-sources @codemix/graph, a type-safe, CRDT-backed graph database with TypeScript-native schema validation and realtime offline-first sync via Yjs.
Frigate is an open-source NVR designed for Home Assistant that performs real-time AI object detection on IP camera feeds locally using OpenCV and TensorFlow. It features tight Home Assistant integration, motion-based detection, and efficient resource usage.
OpenAI has launched three real-time audio models in the API, including GPT Realtime Translate, a real-time translation model that supports 70 languages, and GPT Realtime 2, a voice agent model with reasoning capabilities, enabling developers to build more natural voice interfaces.