Solving Rubik’s Cube with a robot hand

OpenAI Blog News

Summary

OpenAI developed a robot hand capable of solving a Rubik's Cube using a novel technique called Automatic Domain Randomization (ADR), which progressively increases simulation difficulty to enable effective transfer of learned behaviors from simulation to the real world.

We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand. The neural networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR). The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-world problems requiring unprecedented dexterity.

Cached at: 04/20/26, 02:55 PM

# Solving Rubik’s Cube with a robot hand

Source: [https://openai.com/index/solving-rubiks-cube/](https://openai.com/index/solving-rubiks-cube/)

The biggest challenge we faced was to create environments in simulation diverse enough to capture the physics of the real world. Factors like friction, elasticity, and dynamics are incredibly difficult to measure and model for objects as complex as Rubik’s Cubes or robotic hands, and we found that domain randomization alone is not enough.

To overcome this, we developed a new method called *Automatic Domain Randomization* (ADR), which endlessly generates progressively more difficult environments in simulation.[B](https://openai.com/index/solving-rubiks-cube/#citation-bottom-B) This frees us from needing an accurate model of the real world, and enables neural networks learned in simulation to transfer to the real world.

ADR starts with a single, nonrandomized environment, wherein a neural network learns to solve Rubik’s Cube. As the neural network gets better at the task and reaches a performance threshold, the amount of domain randomization is increased automatically. This makes the task harder, since the neural network must now learn to generalize to more randomized environments. The network keeps learning until it again exceeds the performance threshold, at which point more randomization kicks in, and the process repeats.
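The expand-on-success loop described above can be sketched in a few lines. This is a minimal toy illustration, not OpenAI's implementation: the function names (`adr_loop`, `sample_environment`), the single scalar expansion step, and the two example parameters are all assumptions made for clarity; the real ADR system adjusts per-parameter randomization bounds individually while training a policy with reinforcement learning.

```python
import random

PERFORMANCE_THRESHOLD = 0.8  # assumed value; the real threshold is a tuned hyperparameter
EXPAND_STEP = 0.1            # assumed fixed widening step per expansion

def sample_environment(ranges):
    """Draw one randomized environment from the current per-parameter ranges."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def adr_loop(train_step, evaluate, ranges, iterations=100):
    """Train on sampled environments; each time performance exceeds the
    threshold, widen every randomization range, making the task harder."""
    for _ in range(iterations):
        env = sample_environment(ranges)
        train_step(env)
        if evaluate() >= PERFORMANCE_THRESHOLD:
            for name, (lo, hi) in ranges.items():
                ranges[name] = (lo - EXPAND_STEP, hi + EXPAND_STEP)
    return ranges

if __name__ == "__main__":
    # Toy stand-ins: "skill" grows with training steps instead of a real RL policy.
    skill = {"steps": 0}
    # Start from a single nonrandomized environment: zero-width ranges.
    ranges = {"friction": (0.5, 0.5), "cube_size": (1.0, 1.0)}

    def train_step(env):
        skill["steps"] += 1

    def evaluate():
        return min(1.0, skill["steps"] / 50)

    final = adr_loop(train_step, evaluate, ranges, iterations=100)
    print(final)  # ranges have widened each time performance crossed the threshold
```

The key design point the sketch preserves is that the ranges begin with zero width (a single fixed environment) and only widen after the policy demonstrates competence, so difficulty is always paced by current performance.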

Similar Articles

Learning dexterity

OpenAI Blog

OpenAI announces Dactyl, a system that learns robotic hand dexterity through simulation and reinforcement learning, using LSTMs to generalize across different physical environments and the Rapid PPO implementation to train policies that transfer to real-world manipulation tasks.

Domain randomization and generative models for robotic grasping

OpenAI Blog

Researchers explore a data generation pipeline using domain randomization and procedurally generated objects to train a deep neural network for robotic grasp planning. The proposed autoregressive model achieves >90% success on unseen objects in simulation and 80% in the real world, despite being trained only on random simulated objects.

Sim-to-real transfer of robotic control with dynamics randomization

OpenAI Blog

OpenAI researchers demonstrate a method to bridge the reality gap in robotic control by training policies with randomized simulator dynamics, enabling robots trained purely in simulation to successfully transfer to real-world tasks like object manipulation without physical training.

OpenAI Robotics Symposium 2019

OpenAI Blog

OpenAI hosted its first Robotics Symposium on April 27, 2019, bringing together robotics and machine learning experts to discuss learning robots and demonstrate their humanoid robot hand solving manipulation tasks using vision and reinforcement learning.

Generalizing from simulation

OpenAI Blog

OpenAI describes challenges with conventional RL on robotics tasks and introduces Hindsight Experience Replay (HER), a new RL algorithm that enables agents to learn from binary rewards by reframing failures as intended outcomes, combined with domain randomization for sim-to-real transfer.