Tag
OpenAI introduced next-generation audio models for the API, including improved speech-to-text (gpt-4o-transcribe, gpt-4o-mini-transcribe) and customizable text-to-speech models that enable developers to build more intelligent and expressive voice agents with enhanced accuracy across challenging scenarios.
OpenAI has launched three real-time audio models in the API, including a real-time translation model GPT Realtime Translate that supports 70 languages and a voice agent GPT Realtime 2 with reasoning capabilities, enabling developers to build more natural voice interaction interfaces.