Transforming visual accessibility
Summary
OpenAI has partnered with Be My Eyes to apply GPT-4 to visual accessibility, helping blind and low-vision users navigate websites, e-commerce platforms, and physical spaces through summarization and real-time guidance. The system uses GPT-4's vision capabilities to identify important content and provide contextual assistance in a way that mimics how sighted users naturally scan information.
Similar Articles
Using AI coding tools to build a production braille 3D generator as a blind developer
A blind developer describes using AI coding tools to build a production web platform for generating 3D-printable braille objects. He emphasizes that AI compressed the implementation loop without replacing domain expertise, accessibility judgment, or lived experience, highlighting how AI can empower domain experts with limited engineering resources.
GPT-4V(ision) system card
OpenAI releases a system card detailing the safety properties and evaluations of GPT-4V(ision), which adds image input capabilities to GPT-4, enabling multimodal instruction-following and vision analysis.
Empowering teams to unlock insights faster at OpenAI
OpenAI has developed an internal research assistant that combines dashboards with a conversational GPT-5 interface to help teams analyze millions of support tickets and generate insights in minutes instead of weeks. The tool opens data analysis to non-technical users across teams, who can ask questions in plain language and get actionable reports on product feedback, customer sentiment, and trends.
Introducing vision to the fine-tuning API
OpenAI introduces vision fine-tuning capabilities for GPT-4o, allowing developers to customize the model with image data in addition to text for improved performance on vision tasks like visual search, object detection, and medical image analysis.
ChatGPT can now see, hear, and speak
OpenAI is rolling out new voice and image capabilities to ChatGPT Plus and Enterprise users, letting them hold voice conversations and share images in multimodal interactions powered by GPT-3.5/GPT-4 and custom text-to-speech models.