voice-agents

#voice-agents

@kwindla: OpenAI shipped a new speech-to-speech model today: gpt-realtime-2 This is the first speech-to-speech model good enough …

X AI KOLs Following ↗ · 2d ago

OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.

A New Framework for Evaluating Voice Agents (EVA)

Hugging Face Blog ↗ · 2026-03-24 Cached

ServiceNow introduces EVA, a new end-to-end evaluation framework for conversational voice agents that jointly scores task accuracy and conversational experience.

Improved Gemini audio models for powerful voice experiences

Google DeepMind Blog ↗ · 2025-12-12 Cached

Google has updated Gemini 2.5 Flash Native Audio to improve live voice agent capabilities, including sharper function calling, better instruction following, and smoother conversation context retrieval. The update also introduces live speech translation in the Google Translate app beta, preserving intonation across 70+ languages.

Introducing gpt-realtime and Realtime API updates

OpenAI Blog ↗ · 2025-08-28 Cached

OpenAI is making the Realtime API generally available with a new advanced speech-to-speech model called gpt-realtime, featuring improved instruction following, tool calling, and natural speech quality. New capabilities include MCP server support, image inputs, SIP phone calling, and two new voices (Cedar and Marin).

Introducing next-generation audio models in the API

OpenAI Blog ↗ · 2025-03-20 Cached

OpenAI introduced next-generation audio models for the API, including improved speech-to-text (gpt-4o-transcribe, gpt-4o-mini-transcribe) and customizable text-to-speech models that enable developers to build more intelligent and expressive voice agents with enhanced accuracy across challenging scenarios.

We’re introducing three audio models in the API

YouTube AI Channels ↗ · 2d ago Cached

OpenAI has launched three real-time audio models in the API, including GPT Realtime Translate, a real-time translation model that supports 70 languages, and GPT Realtime 2, a voice agent model with reasoning capabilities. The models are aimed at developers building more natural voice interfaces.
