Point-E: A system for generating 3D point clouds from complex prompts

OpenAI Blog Models

Summary

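OpenAI introduces Point-E, a system that generates 3D point clouds from text prompts in 1-2 minutes on a single GPU by chaining two diffusion models: a text-to-image model that produces a single synthetic view, and a second model that produces a point cloud conditioned on that view. Sampling is one to two orders of magnitude faster than state-of-the-art methods, at some cost in sample quality. The pre-trained point cloud diffusion models and evaluation code are released.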



# Point-E: A system for generating 3D point clouds from complex prompts

Source: [https://openai.com/index/point-e/](https://openai.com/index/point-e/)

While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at [https://github.com/openai/point-e](https://github.com/openai/point-e).
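
The two-stage structure described in the abstract (text to a single synthetic view, then view to point cloud) can be illustrated with a toy sketch. This is not the released Point-E code or its API. Every function name here (`text_to_image`, `target_radius`, `predict_noise`, `image_to_point_cloud`, `generate`) is hypothetical, and both learned networks are replaced by hand-written stand-ins. The sampling loop is a generic DDPM-style ancestral sampler, used only as an assumption since the abstract does not specify the sampler. The sketch shows how the second stage conditions on the output of the first.

```python
import math
import random

Point = tuple[float, float, float]


def text_to_image(prompt: str, size: int = 8) -> list[list[float]]:
    """Stage 1 stand-in (hypothetical): a fake grayscale 'synthetic view'.

    A real system would run a text-to-image diffusion model here.
    """
    rng = random.Random(prompt)  # same prompt -> same fake image
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def target_radius(image: list[list[float]]) -> float:
    """Toy conditioning signal (assumption): mean brightness sets a sphere radius."""
    pixels = [p for row in image for p in row]
    return 0.5 + sum(pixels) / len(pixels)


def predict_noise(x: Point, t: int, alpha_bar: list[float], radius: float) -> Point:
    """Stand-in (hypothetical) for the learned noise-prediction network.

    It guesses the clean point x0 as the projection of x onto a sphere of the
    target radius, then returns the noise implied by that guess.
    """
    norm = math.sqrt(sum(c * c for c in x)) or 1.0
    x0 = tuple(radius * c / norm for c in x)
    a = alpha_bar[t]
    return tuple((c - math.sqrt(a) * c0) / math.sqrt(1 - a) for c, c0 in zip(x, x0))


def image_to_point_cloud(image, num_points: int = 256, steps: int = 20,
                         seed: int = 0) -> list[Point]:
    """Stage 2 stand-in: DDPM-style ancestral sampling conditioned on the image.

    Requires steps >= 2 because of the linear noise schedule below.
    """
    rng = random.Random(seed)
    betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
    alphas = [1 - b for b in betas]
    alpha_bar, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bar.append(prod)
    radius = target_radius(image)  # the only place the image enters this toy model

    # Start from pure Gaussian noise, one 3D sample per point.
    cloud = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(num_points)]
    for t in reversed(range(steps)):
        coef = betas[t] / math.sqrt(1 - alpha_bar[t])
        new_cloud = []
        for x in cloud:
            eps = predict_noise(x, t, alpha_bar, radius)
            mean = tuple((c - coef * e) / math.sqrt(alphas[t]) for c, e in zip(x, eps))
            if t > 0:  # add fresh noise on every step except the last
                mean = tuple(m + math.sqrt(betas[t]) * rng.gauss(0, 1) for m in mean)
            new_cloud.append(mean)
        cloud = new_cloud
    return cloud


def generate(prompt: str) -> list[Point]:
    """Chain the two stages: text -> synthetic view -> point cloud."""
    return image_to_point_cloud(text_to_image(prompt))


if __name__ == "__main__":
    cloud = generate("a red traffic cone")
    print(len(cloud), "points; first point:", cloud[0])
```

Because the stand-in denoiser always projects onto a sphere, every sampled point ends up on a sphere whose radius depends on the fake image. In the real system, both stages are trained diffusion models and the resulting shape depends on the content of the generated view.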
