Learning dexterity

OpenAI Blog News

Summary

OpenAI announces Dactyl, a system that learns robotic hand dexterity through simulation and reinforcement learning, using LSTMs to generalize across different physical environments and the Rapid PPO implementation to train policies that transfer to real-world manipulation tasks.

We’ve trained a human-like robot hand to manipulate physical objects with unprecedented dexterity.

Cached at: 04/20/26, 02:46 PM

# Learning dexterity

Source: [https://openai.com/index/learning-dexterity/](https://openai.com/index/learning-dexterity/)

By building simulations that support transfer, we have reduced the problem of controlling a robot in the real world to accomplishing a task in simulation, which is a problem well-suited for reinforcement learning. While the task of manipulating an object in a simulated hand is already [somewhat difficult](https://openai.com/index/ingredients-for-robotics-research/), learning to do so across all combinations of randomized physical parameters is substantially more difficult.

To generalize across environments, it is helpful for the policy to be able to take different actions in environments with different dynamics. Because most dynamics parameters cannot be inferred from a single observation, we used an [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/#lstm-networks), a type of neural network with memory, to make it possible for the network to learn about the dynamics of the environment. The LSTM achieved about twice as many rotations in simulation as a policy without memory.

Dactyl learns using [Rapid](https://openai.com/index/openai-five/#rapid), the massively scaled implementation of Proximal Policy Optimization developed to allow OpenAI Five to solve Dota 2. We use a different model architecture, environment, and hyperparameters than OpenAI Five does, but the same algorithms and training code. Rapid used 6,144 CPU cores and 8 GPUs to train our policy, collecting about one hundred years of experience in 50 hours.

For development and testing, we validated our control policy against objects with embedded motion-tracking sensors to isolate the performance of our control and vision networks.
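The value of memory here can be seen in a toy sketch (not Dactyl's code; every name, parameter, and dynamics model below is an illustrative assumption). The environment randomizes hidden physical parameters per episode, as in domain randomization, and the policy observes only position, so no single frame reveals velocity or the sampled mass and friction; a small recurrent state, here just the previous observation, recovers the missing information:

```python
import random

class RandomizedPointMass:
    """Hypothetical 1-D environment: each episode samples hidden physical
    parameters, in the spirit of domain randomization."""
    def __init__(self, rng):
        self.mass = rng.uniform(0.5, 2.0)      # hidden dynamics parameter
        self.friction = rng.uniform(0.0, 0.3)  # hidden dynamics parameter
        self.pos, self.vel = 0.0, 0.0

    def step(self, force, dt=0.05):
        accel = (force - self.friction * self.vel) / self.mass
        self.vel += accel * dt
        self.pos += self.vel * dt
        return self.pos  # the policy observes position only

class RecurrentPolicy:
    """Toy stand-in for a policy with memory (Dactyl uses an LSTM): the
    hidden state stores the previous observation, letting the policy
    estimate velocity that a single observation cannot reveal."""
    def __init__(self, kp=2.0, kd=1.5, dt=0.05):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.prev_obs = None  # recurrent state

    def act(self, obs, target):
        vel_est = 0.0 if self.prev_obs is None else (obs - self.prev_obs) / self.dt
        self.prev_obs = obs
        return self.kp * (target - obs) - self.kd * vel_est

rng = random.Random(0)
env, policy = RandomizedPointMass(rng), RecurrentPolicy()
obs = 0.0
for _ in range(200):  # 10 simulated seconds
    obs = env.step(policy.act(obs, target=1.0))
# obs has settled near the target despite the unknown mass and friction
```

A memoryless policy in this setting could react only to the instantaneous error, whereas the recurrent state adapts the action to dynamics it has observed over time, which is the same reason Dactyl's LSTM policy roughly doubled simulated rotations over a memoryless one.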
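The core of Proximal Policy Optimization, which Rapid scales up, is the clipped surrogate objective. A plain-Python sketch (the epsilon value is PPO's commonly used default, not a claim about Dactyl's hyperparameters):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate: L = min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is pi_new(a|s) / pi_old(a|s); clipping removes the incentive to
    move the new policy far from the one that collected the data.
    """
    clipped_ratio = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with positive advantage is capped at (1 + epsilon) * A:
print(ppo_clip_objective(1.5, 1.0))   # 1.2
# A shrinking ratio with negative advantage is still fully penalized:
print(ppo_clip_objective(0.5, -1.0))  # -0.8
```

Taking the minimum makes the bound pessimistic in both directions, which is what lets PPO safely reuse each batch of experience for several gradient steps, an important property when 6,144 CPU cores are feeding experience to 8 GPUs.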

Similar Articles

Robots that learn

OpenAI Blog

OpenAI describes a robot learning system powered by two neural networks — a vision network trained on simulated images and an imitation network that generalizes task demonstrations to new configurations. The system is applied to block-stacking tasks, learning to infer and replicate task intent from paired demonstration examples.

Solving Rubik’s Cube with a robot hand

OpenAI Blog

OpenAI developed a robot hand capable of solving a Rubik's Cube using a novel technique called Automatic Domain Randomization (ADR), which progressively increases simulation difficulty to enable effective transfer of learned behaviors from simulation to the real world.

RLDX-1 Technical Report

Hugging Face Daily Papers

RLDX-1 is a general-purpose robotic policy for dexterous manipulation that uses a Multi-Stream Action Transformer architecture to integrate heterogeneous modalities, outperforming existing VLA models in real-world tasks.