Learning dexterity

OpenAI Blog News

Summary

OpenAI announces Dactyl, a system that learns robotic hand dexterity through simulation and reinforcement learning, using LSTMs to generalize across different physical environments and the Rapid PPO implementation to train policies that transfer to real-world manipulation tasks.

We’ve trained a human-like robot hand to manipulate physical objects with unprecedented dexterity.

Cached at: 04/20/26, 02:46 PM

# Learning dexterity

Source: [https://openai.com/index/learning-dexterity/](https://openai.com/index/learning-dexterity/)

By building simulations that support transfer, we have reduced the problem of controlling a robot in the real world to accomplishing a task in simulation, which is a problem well-suited for reinforcement learning. While the task of manipulating an object in a simulated hand is already [somewhat difficult](https://openai.com/index/ingredients-for-robotics-research/), learning to do so across all combinations of randomized physical parameters is substantially more difficult.

To generalize across environments, it is helpful for the policy to be able to take different actions in environments with different dynamics. Because most dynamics parameters cannot be inferred from a single observation, we used an [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/#lstm-networks), a type of neural network with memory, to make it possible for the network to learn about the dynamics of the environment. The LSTM achieved about twice as many rotations in simulation as a policy without memory.

Dactyl learns using [Rapid](https://openai.com/index/openai-five/#rapid), the massively scaled implementation of Proximal Policy Optimization developed to allow OpenAI Five to solve Dota 2. We use a different model architecture, environment, and hyperparameters than OpenAI Five does, but the same algorithms and training code. Rapid used 6,144 CPU cores and 8 GPUs to train our policy, collecting about one hundred years of experience in 50 hours.

For development and testing, we validated our control policy against objects with embedded motion-tracking sensors to isolate the performance of our control and vision networks.
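The value of memory here can be seen in a toy sketch (not Dactyl's code; every name, parameter, and dynamics model below is an illustrative assumption). The environment randomizes hidden physical parameters per episode, as in domain randomization, and the policy observes only position, so no single frame reveals velocity or the sampled mass and friction; a small recurrent state, here just the previous observation, recovers the missing information:

```python
import random

class RandomizedPointMass:
    """Hypothetical 1-D environment: each episode samples hidden physical
    parameters, in the spirit of domain randomization."""
    def __init__(self, rng):
        self.mass = rng.uniform(0.5, 2.0)      # hidden dynamics parameter
        self.friction = rng.uniform(0.0, 0.3)  # hidden dynamics parameter
        self.pos, self.vel = 0.0, 0.0

    def step(self, force, dt=0.05):
        accel = (force - self.friction * self.vel) / self.mass
        self.vel += accel * dt
        self.pos += self.vel * dt
        return self.pos  # the policy observes position only

class RecurrentPolicy:
    """Toy stand-in for a policy with memory (Dactyl uses an LSTM): the
    hidden state stores the previous observation, letting the policy
    estimate velocity that a single observation cannot reveal."""
    def __init__(self, kp=2.0, kd=1.5, dt=0.05):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.prev_obs = None  # recurrent state

    def act(self, obs, target):
        vel_est = 0.0 if self.prev_obs is None else (obs - self.prev_obs) / self.dt
        self.prev_obs = obs
        return self.kp * (target - obs) - self.kd * vel_est

rng = random.Random(0)
env, policy = RandomizedPointMass(rng), RecurrentPolicy()
obs = 0.0
for _ in range(200):  # 10 simulated seconds
    obs = env.step(policy.act(obs, target=1.0))
# obs has settled near the target despite the unknown mass and friction
```

A memoryless policy in this setting could react only to the instantaneous error, whereas the recurrent state adapts the action to dynamics it has observed over time, which is the same reason Dactyl's LSTM policy roughly doubled simulated rotations over a memoryless one.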
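The core of Proximal Policy Optimization, which Rapid scales up, is the clipped surrogate objective. A plain-Python sketch (the epsilon value is PPO's commonly used default, not a claim about Dactyl's hyperparameters):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate: L = min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is pi_new(a|s) / pi_old(a|s); clipping removes the incentive to
    move the new policy far from the one that collected the data.
    """
    clipped_ratio = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with positive advantage is capped at (1 + epsilon) * A:
print(ppo_clip_objective(1.5, 1.0))   # 1.2
# A shrinking ratio with negative advantage is still fully penalized:
print(ppo_clip_objective(0.5, -1.0))  # -0.8
```

Taking the minimum makes the bound pessimistic in both directions, which is what lets PPO safely reuse each batch of experience for several gradient steps, an important property when 6,144 CPU cores are feeding experience to 8 GPUs.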

Similar Articles

Robots that learn

OpenAI Blog

OpenAI describes a robot learning system powered by two neural networks — a vision network trained on simulated images and an imitation network that generalizes task demonstrations to new configurations. The system is applied to block-stacking tasks, learning to infer and replicate task intent from paired demonstration examples.

Solving Rubik’s Cube with a robot hand

OpenAI Blog

OpenAI developed a robot hand capable of solving a Rubik's Cube using a novel technique called Automatic Domain Randomization (ADR), which progressively increases simulation difficulty to enable effective transfer of learned behaviors from simulation to the real world.

RLDX-1 Technical Report

Hugging Face Daily Papers

RLDX-1 is a general-purpose robotic policy for dexterous manipulation that uses a Multi-Stream Action Transformer architecture to integrate heterogeneous modalities, outperforming existing VLA models in real-world tasks.