Tag
OpenAI is rolling out a new bidirectional voice model (Bidi 1) for ChatGPT that allows simultaneous speaking, hearing, and listening, real-time translation, and improved conversation context handling. The upgrade is appearing in the web interface and app for some users, with a broader release expected soon.
Anthropic is extending its Cowork agentic system to mobile apps, with evidence of cloud-based execution to remove the need for a desktop machine to remain awake, along with groundwork for a voice model refresh.
A user asks which upcoming OpenAI feature is more exciting: the rumored GPT-5.6 model or bidirectional voice mode (BiDi), which allows real-time simultaneous listening and speaking.
OpenAI is preparing to release GPT-Bidi-1, a next-generation voice model for ChatGPT that supports bidirectional communication, interruptions, and mid-sentence adjustments, aiming to close the gap between voice and text capabilities.
A prediction that GPT 5.6 will focus on autonomy features like longer tasks, better computer use, and stronger agents rather than just smarter answers.
A Wired article offers seven practical tips for becoming an AI power user, from using AI agents like Codex and Claude Cowork to adopting voice input, emphasizing adaptability and productivity gains.
A Windows desktop app called SEELS that allows users to run local LLMs, correct model responses, and automatically train LoRA adapters from corrections. It bundles voice mode, hardware dashboard, and a training sidecar.
A user successfully integrated Claude in voice mode into a Zoom meeting, where it answered questions from multiple participants without glitches, and is seeking ideas for applications of this capability.
Meta launches Muse Spark, a foundational AI model powering Meta AI with voice conversations, shopping assistance, and real-time visual recognition across apps and smart glasses. The rollout starts in US and Canada, with expansion to Ray-Ban Meta and Oakley Meta glasses.
ChatGPT's voice mode runs on a weaker GPT-4o era model with an April 2024 knowledge cutoff, significantly older than OpenAI's latest capabilities. The article highlights a growing gap between OpenAI's consumer voice interface and its more advanced paid models, driven by differences in reward signal clarity and B2B market incentives.
OpenAI shares a human-interest story about Adam, a fishing content creator, who used ChatGPT's Advanced Voice Mode to get halibut-catching advice after previous unsuccessful attempts, demonstrating a practical real-world application of AI in niche hobbies.
OpenAI explains its process for selecting five distinct voices for ChatGPT's Voice Mode feature, involving professional voice actors, casting directors, and a five-month selection process. The company addresses controversy over the 'Sky' voice, clarifying it is not an imitation of Scarlett Johansson and was cast before any outreach to her.
OpenAI is launching GPT-4o, its newest flagship model with improved text, voice, and vision capabilities, and making it available to free ChatGPT users with usage limits. They are also releasing a new macOS desktop app and redesigned UI for ChatGPT.