OpenAI describes a robot learning system powered by two neural networks — a vision network trained on simulated images and an imitation network that generalizes task demonstrations to new configurations. The system is applied to block-stacking tasks, learning to infer and replicate task intent from paired demonstration examples.
We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.
# Robots that learn
Source: [https://openai.com/index/robots-that-learn/](https://openai.com/index/robots-that-learn/)
The system is powered by two neural networks: a vision network and an imitation network.
The vision network ingests an image from the robot’s camera and outputs state representing the positions of the objects. As [before](https://blog.openai.com/spam-detection-in-the-physical-world/), the vision network is trained with hundreds of thousands of simulated images with different perturbations of lighting, textures, and objects. (The vision system is never trained on a real image.)
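The randomized training data described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the parameter names (`light_intensity`, `texture_id`, `distractor_count`), their ranges, and the `sample_scene` and `build_dataset` helpers are all assumptions made for the example, and the renderer that would turn each scene into an image is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimulatedSample:
    # All fields are hypothetical stand-ins for real simulator settings.
    light_intensity: float                      # randomized lighting
    texture_id: int                             # randomized surface texture
    distractor_count: int                       # extra objects unrelated to the task
    block_positions: List[Tuple[float, float]]  # ground-truth labels for the vision network


def sample_scene(rng: random.Random, num_blocks: int = 3, num_textures: int = 100) -> SimulatedSample:
    """Draw one randomized scene description; rendering is out of scope here."""
    positions = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(num_blocks)]
    return SimulatedSample(
        light_intensity=rng.uniform(0.2, 2.0),
        texture_id=rng.randrange(num_textures),
        distractor_count=rng.randint(0, 5),
        block_positions=positions,
    )


def build_dataset(size: int, seed: int = 0) -> List[SimulatedSample]:
    """Build a reproducible list of randomized scenes."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(size)]


dataset = build_dataset(1000)
```

Each sample pairs the nuisance parameters (lighting, texture, distractors) with the object positions the network is asked to predict, so the appearance varies widely while the label stays exact.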
The imitation network observes a demonstration, processes it to infer the intent of the task, and then accomplishes the intent starting from another starting configuration. Thus, the imitation network must generalize the demonstration to a new setting. But how does the imitation network know how to generalize?
The network learns this from the distribution of training examples. It is trained on dozens of different tasks with thousands of demonstrations for each task. Each training example is a pair of demonstrations that perform the same task. The network is given the entirety of the first demonstration and a single observation from the second demonstration. We then use supervised learning to predict what action the demonstrator took at that observation. In order to predict the action effectively, the robot must learn how to infer the relevant portion of the task from the first demonstration.
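One way to picture how a training example is assembled from a pair of demonstrations is the sketch below. The `Demo` and `TrainingExample` types, the integer action encoding, and the uniform choice of time step are assumptions for illustration; the post does not specify these details.

```python
import random
from typing import List, NamedTuple, Tuple

Observation = Tuple[float, ...]
Action = int  # hypothetical discrete action encoding


class Demo(NamedTuple):
    task_id: int
    observations: List[Observation]
    actions: List[Action]  # actions[t] is the action taken at observations[t]


class TrainingExample(NamedTuple):
    conditioning_demo: Demo         # the entire first demonstration
    query_observation: Observation  # a single observation from the second demonstration
    target_action: Action           # supervised label: what the demonstrator did there


def make_training_example(demo_a: Demo, demo_b: Demo, rng: random.Random) -> TrainingExample:
    """Pair two demonstrations of the same task into one supervised example."""
    if demo_a.task_id != demo_b.task_id:
        raise ValueError("both demonstrations must perform the same task")
    # Assumption: the time step in the second demonstration is sampled uniformly.
    t = rng.randrange(len(demo_b.observations))
    return TrainingExample(demo_a, demo_b.observations[t], demo_b.actions[t])
```

A model trained on such examples receives `conditioning_demo` and `query_observation` as input and is penalized for mispredicting `target_action`, which is only predictable if it extracts the relevant part of the task from the first demonstration.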
Applied to block stacking, the training data consists of pairs of trajectories that stack blocks into a matching set of towers in the same order, but start from different start states. In this way, the imitation network learns to match the demonstrator’s ordering of blocks and size of towers without worrying about the relative location of the towers.
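The notion of two trajectories ending in "a matching set of towers" can be made concrete with a small check. This is only a sketch under assumed conventions: towers are written as tuples of hypothetical block labels from bottom to top, keyed by an (x, y) base location, and only the final layout is compared, not the order in which blocks were placed over time.

```python
from typing import Dict, List, Tuple

Tower = Tuple[str, ...]         # block labels, bottom to top
Location = Tuple[float, float]  # where the tower stands on the table


def tower_signature(towers_by_location: Dict[Location, Tower]) -> List[Tower]:
    """Discard tower locations but keep block order and height within each tower."""
    return sorted(towers_by_location.values())


def same_task(final_a: Dict[Location, Tower], final_b: Dict[Location, Tower]) -> bool:
    """Two final states match if they contain the same towers, wherever they stand."""
    return tower_signature(final_a) == tower_signature(final_b)


final_a = {(0.1, 0.2): ("A", "B", "C"), (0.7, 0.5): ("D", "E")}
final_b = {(0.9, 0.9): ("D", "E"), (0.3, 0.1): ("A", "B", "C")}  # same towers, new places
final_c = {(0.1, 0.2): ("C", "B", "A"), (0.7, 0.5): ("D", "E")}  # block order reversed
```

Here `same_task(final_a, final_b)` holds even though every tower has moved, while `final_c` does not match because the block ordering inside one tower differs.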