OpenAI prepares major ChatGPT voice upgrade with GPT-Bidi-1 (2 minute read)

TLDR AI Models

Summary

OpenAI is preparing to release GPT-Bidi-1, a next-generation voice model for ChatGPT that supports bidirectional communication, interruptions, and mid-sentence adjustments, aiming to close the gap between voice and text capabilities.

GPT-Bidi-1 is a bidirectional audio model for ChatGPT's voice mode designed to listen and speak at once, absorb interruptions, and adjust mid-sentence.

Cached at: 06/18/26, 12:54 AM

# OpenAI prepares major ChatGPT voice upgrade with GPT-Bidi-1

Source: [https://www.testingcatalog.com/openai-prepares-major-chatgpt-voice-upgrade-with-gpt-bidi-1/](https://www.testingcatalog.com/openai-prepares-major-chatgpt-voice-upgrade-with-gpt-bidi-1/)

OpenAI looks set to give ChatGPT's voice mode its biggest upgrade in months, with preparations underway for a next-generation audio model tentatively tagged GPT-Bidi-1. The name points to the bidirectional, or "BiDi," architecture the company has been building since early this year, a model designed to listen and speak at once, absorb interruptions, and adjust mid-sentence rather than freezing the moment a user says "mm-hm." Signs of it now span web and mobile, suggesting a consumer rollout is near, though the name may shift before launch.

Post by M1 (@M1Astra), [June 16, 2026](https://x.com/M1Astra/status/2067017773528617041?ref_src=twsrc%5Etfw&ref=testingcatalog.com)

The wider point is less about voice quality than a gap OpenAI has let widen. Its text models raced ahead to the GPT-5.5 generation while voice stayed on an older audio stack, leaving spoken conversations a step behind what the same assistant manages in writing. Closing that gap matters for a company betting that speech, not text, becomes the main way people reach AI, the wager behind its planned audio-first hardware and its voice-based support tools. GPT-Bidi-1 is built around that, promising smoother exchanges plus what is billed as a major jump in reasoning.

> 🚨 OpenAI is planning to release GPT-Bidi-1 very soon
>
> Their next-generation voice model for more natural conversations
>
> [Final naming of the model might change]
>
> h/t to [@M1Astra](https://x.com/M1Astra?ref_src=twsrc%5Etfw&ref=testingcatalog.com) from DevMode [pic.twitter.com/brmD8bUgqb](https://t.co/brmD8bUgqb?ref=testingcatalog.com)
>
> Chetaslua (@chetaslua), [June 16, 2026](https://x.com/chetaslua/status/2066917089504526658?ref_src=twsrc%5Etfw&ref=testingcatalog.com)

The feature's shape is coming into focus. [ChatGPT](https://www.testingcatalog.com/tag/chatgpt/) users would likely keep today's setup, toggling between a new Bidi (Latest) mode and the current Advanced Voice Mode rather than being moved over wholesale. More telling is the choice of intelligence levels: High, Medium, and Instant, mirroring the tiers already offered on the text side and letting people trade speed for depth by task. A recent change that lets the voice bubble be dragged to the middle of the screen reads as an early piece of the same redesign.

Caution is warranted on timing. Whether the rollout starts this week or later is unclear, but the groundwork is plainly being laid.
