We’re introducing three audio models in the API


Summary

OpenAI has launched three real-time audio models in the API, including GPT Realtime Translate, a real-time translation model that supports 70 languages, and GPT Realtime 2, a voice-agent model with reasoning capabilities, enabling developers to build more natural voice interaction interfaces.


TL;DR: OpenAI has introduced three real-time audio models in the API, including GPT Realtime Translate for real-time translation (supporting 70 languages) and GPT Realtime 2 for voice agents (featuring reasoning and parallel tool calling).

## New Audio Model Overview

The release demo for the new real-time audio models in the OpenAI API showcases two core capabilities: real-time translation and voice agents. Speech is processed in real time and the demo is unedited, with audio captured directly from a laptop and the transcribed text shown alongside.

## GPT Realtime Translate: Real-Time Translation

The presenter first speaks in French, and the model listens and translates into English in real time. Key features include:

- **Real-time following**: The model waits for key words (e.g., verbs) while the speaker is talking, then begins translating immediately, creating a natural conversational rhythm.
- **Multi-language switching**: When the demo switches from French to German, the model tracks the change seamlessly and transitions smoothly between the two languages.
- **Technical term handling**: Professional terms such as “GPT real time”, “OpenAI”, and “computer use” are handled effortlessly.
- **Support for 70 languages**: The model can translate 70 different languages in real time, adapting to the inflection of each sentence.

Use cases include media platforms, customer support, and educational tools, with the aim of breaking down language barriers.

## GPT Realtime 2: Intelligent Reasoning for Voice Agents

The new model GPT Realtime 2 brings reasoning capabilities to voice agents. The demo uses a personal voice assistant to perform tasks.

### Schedule Query and Parallel Tool Calling

The user asks: “I have a customer meeting coming up. Can you check my schedule?” The model replies: “You have a meeting with Sable Crust Robotics in 12 minutes, with their CTO Alex Kim.” Because the model has reasoning and parallel tool calling capabilities, it is important to use preambles so the model can explain what it is doing and keep the user informed.

### Maintaining Conversational Continuity and Confirmation Mechanism

Executing an operation takes a few seconds. During reasoning and tool calling, the model keeps talking with the user, so the user is always aware of progress. The voice agent also stays engaged in the conversation: in the demo, the model keeps listening but does not interrupt until the user says “back to the demo.”

### CRM Update Example

The user says: “Hey, can you help me update the CRM? Mark today’s meeting as brief and add next steps.” The model responds: “Let me get the latest context and update your CRM. Sablerest released a warehouse automation solution this morning. Expansion plans are underway, and security review is the bottleneck.” The model then confirms the task is complete.

### Connecting to External Systems

The model can connect to any system, including dashboards, services, and connected devices. With voice as the primary interface, it can maintain a fluid conversation while thinking in the background.

## Model Capabilities Summary

- **Real-time translation**: Supports 70 languages with a natural conversational rhythm.
- **Reasoning and tool calling**: Communicates with the user while it thinks and executes actions in parallel.
- **Context retention**: Continuously listens without interrupting until the user indicates it should respond.
- **System integration**: Can connect to external products and services.

Hedged code sketches of the translation and voice-agent patterns described above follow below.
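The translation demo maps naturally onto a streaming session. Below is a minimal sketch, assuming the new models are exposed through the existing Realtime API WebSocket interface; the model name `gpt-realtime-translate` and the `translate_microphone_audio` helper are illustrative assumptions, not confirmed API details.

```python
# Hedged sketch: streaming speech to the Realtime API for live English translation.
# The model name "gpt-realtime-translate" is a placeholder; check the models list
# for the exact identifier before using this.
import asyncio
import base64
import json
import os

import websockets  # pip install websockets

MODEL = "gpt-realtime-translate"  # placeholder, not a confirmed model string
URL = f"wss://api.openai.com/v1/realtime?model={MODEL}"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def translate_microphone_audio(pcm_chunks):
    """Send 16-bit PCM chunks and print the translated transcript as it streams."""
    # extra_headers is the long-standing websockets keyword; newer releases
    # accept additional_headers instead.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Configure the session: audio in, audio + text out, translate to English.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": "Translate everything you hear into English.",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
            },
        }))

        async def send_audio():
            for chunk in pcm_chunks:  # e.g. frames captured from the laptop mic
                await ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": base64.b64encode(chunk).decode(),
                }))

        async def read_events():
            async for raw in ws:  # runs until the connection closes
                event = json.loads(raw)
                # Print the English transcript as it streams back.
                if event.get("type") == "response.audio_transcript.delta":
                    print(event["delta"], end="", flush=True)

        await asyncio.gather(send_audio(), read_events())
```

The voice-agent behaviour described for GPT Realtime 2 (preambles, tool calling, returning results to the conversation) could be wired up along similar lines. Again a hedged sketch: the model name `gpt-realtime-2` and the `check_schedule` tool are placeholders for whatever calendar or CRM integration an application actually uses, and the audio capture path is omitted (see the translation sketch above).

```python
# Hedged sketch: a voice-agent session that registers a calendar-lookup tool.
# "gpt-realtime-2" is a placeholder model name; check_schedule is a hypothetical
# stand-in for the CRM/calendar systems shown in the demo.
import asyncio
import json
import os

import websockets

MODEL = "gpt-realtime-2"  # placeholder, not a confirmed model string
URL = f"wss://api.openai.com/v1/realtime?model={MODEL}"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

SCHEDULE_TOOL = {
    "type": "function",
    "name": "check_schedule",
    "description": "Return the user's upcoming meetings.",
    "parameters": {
        "type": "object",
        "properties": {"window_minutes": {"type": "integer"}},
        "required": ["window_minutes"],
    },
}


def check_schedule(window_minutes: int) -> str:
    # Stand-in for a real calendar integration.
    return "Meeting with Sable Crust Robotics in 12 minutes (CTO: Alex Kim)."


async def run_voice_agent():
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Register the tool and ask the model to use spoken preambles.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": (
                    "You are a personal assistant. Before calling a tool, say a "
                    "short preamble so the user knows what you are doing."
                ),
                "tools": [SCHEDULE_TOOL],
                "tool_choice": "auto",
            },
        }))

        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "response.function_call_arguments.done":
                args = json.loads(event["arguments"])
                result = check_schedule(**args)
                # Hand the tool result back so the model can speak the answer.
                await ws.send(json.dumps({
                    "type": "conversation.item.create",
                    "item": {
                        "type": "function_call_output",
                        "call_id": event["call_id"],
                        "output": result,
                    },
                }))
                await ws.send(json.dumps({"type": "response.create"}))


asyncio.run(run_voice_agent())
```

Both sketches rely on the Realtime API's default server-side turn detection; a production agent would also need to play back the `response.audio.delta` chunks it receives so the user actually hears the reply.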
## Conclusion

These new real-time audio models are now available in the OpenAI API, enabling developers to build more natural voice interaction interfaces. OpenAI looks forward to seeing the community create more applications with these models.

Source: We’re introducing three audio models in the API – OpenAI (https://www.youtube.com/watch?v=JOu8v6CBjkE)

Similar Articles

Advancing voice intelligence with new models in the API

OpenAI Blog

OpenAI has announced three new voice models in its API: GPT-Realtime-2 with advanced reasoning, GPT-Realtime-Translate for live multilingual translation, and GPT-Realtime-Whisper for streaming transcription, aiming to enable more natural and action-oriented voice applications.

@seclink: OpenAI Launches GPT-Realtime-2, Its Most Intelligent Voice Model to Date. The model features GPT-5-level reasoning, a 128,000 token context window, and supports adjusting 'effort level' for more natural conversation. It can pair with GPT-R…

X AI KOLs Following

OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning capabilities and a 128,000 token context window. It supports real-time translation from over 70 input languages to 13 output languages, achieving 96.6% accuracy on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.

Introducing next-generation audio models in the API

OpenAI Blog

OpenAI introduced next-generation audio models for the API, including improved speech-to-text (gpt-4o-transcribe, gpt-4o-mini-transcribe) and customizable text-to-speech models that enable developers to build more intelligent and expressive voice agents with enhanced accuracy across challenging scenarios.

Introducing the Realtime API

OpenAI Blog

OpenAI introduces the Realtime API, enabling developers to build low-latency multimodal speech-to-speech conversational experiences with natural voice interactions powered by GPT-4o. The API supports six preset voices and simplifies development by eliminating the need to integrate multiple models.