Third-person imitation learning

OpenAI Blog · Papers

Summary

OpenAI presents a method for unsupervised third-person imitation learning that enables agents to learn from demonstrations taken from different viewpoints without explicit state correspondence, using domain confusion techniques to learn viewpoint-agnostic features.


# Third-person imitation learning

Source: [https://openai.com/index/third-person-imitation-learning/](https://openai.com/index/third-person-imitation-learning/)

## Abstract

Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, imitation learning methods have so far tended to require that demonstrations be supplied in the first person: the agent is provided with a sequence of states and a specification of the actions it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here "third-person" refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided a demonstration of a teacher achieving the same goal from a different viewpoint, and "unsupervised" refers to the fact that the agent receives only these third-person demonstrations and is not provided a correspondence between teacher states and student states. Our method's primary insight is that recent advances in domain confusion can be used to yield domain-agnostic features, which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and an inverted pendulum domain.
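The domain-confusion idea in the abstract can be illustrated with a minimal adversarial sketch: a feature extractor is updated to *maximize* a domain classifier's loss (a gradient-reversal-style update), so the learned features carry as little viewpoint information as possible. This is a toy illustration, not the paper's architecture: the linear extractor, the fixed rotation standing in for a viewpoint change, and all hyperparameters here are assumptions made for the sake of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the same underlying 2-D states observed from two "viewpoints"
# (domain 0 and domain 1), related by a fixed rotation -- an assumption
# standing in for the teacher/student viewpoint change.
n, d = 200, 2
states = rng.normal(size=(n, d))
view = np.array([[0.0, -1.0], [1.0, 0.0]])       # 90-degree "camera" rotation
X = np.vstack([states, states @ view.T])
dom = np.concatenate([np.zeros(n), np.ones(n)])  # domain labels

# Linear feature extractor F (f = x @ W) and logistic domain classifier D.
W = rng.normal(scale=0.1, size=(d, d))
w, b = np.zeros(d), 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, lam = 0.05, 1.0
for _ in range(300):
    f = X @ W
    p = sigmoid(f @ w + b)
    g = (p - dom) / len(dom)        # d(log-loss)/d(logits)
    # D descends the domain-classification loss...
    w -= lr * (f.T @ g)
    b -= lr * g.sum()
    # ...while F ascends it (the gradient reversal): features that
    # fool D carry little viewpoint information.
    W += lr * lam * (X.T @ np.outer(g, w))

acc = ((sigmoid((X @ W) @ w + b) > 0.5) == dom).mean()
print(f"domain accuracy after confusion training: {acc:.2f}")
```

In the paper's setting these domain-agnostic features would then feed a second head that scores task progress; here the sketch only shows the adversarial feature-learning step, with the domain classifier ideally driven toward chance accuracy.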

Similar Articles

One-shot imitation learning

OpenAI Blog

OpenAI proposes a meta-learning framework for one-shot imitation learning that enables robots to learn new tasks from a single demonstration and generalize to new instances without task-specific engineering. The approach uses soft attention mechanisms to allow neural networks trained on diverse task pairs to perform well on unseen tasks at test time.

Learning to model other minds

OpenAI Blog

OpenAI and University of Oxford researchers present LOLA (Learning with Opponent-Learning Awareness), a reinforcement learning method that enables agents to model and account for the learning of other agents, discovering cooperative strategies in multi-agent games like the iterated prisoner's dilemma and coin game.

Robots that learn

OpenAI Blog

OpenAI describes a robot learning system powered by two neural networks: a vision network trained on simulated images and an imitation network that generalizes task demonstrations to new configurations. The system is applied to block-stacking tasks, learning to infer and replicate task intent from paired demonstration examples.

Learning from human preferences

OpenAI Blog

OpenAI presents a method for training AI agents using human preference feedback, where an agent learns reward functions from human comparisons of behavior trajectories and uses reinforcement learning to optimize for the inferred goals. The approach demonstrates strong sample efficiency, requiring less than 1000 bits of human feedback to train an agent to perform a backflip.

Asymmetric actor critic for image-based robot learning

OpenAI Blog

OpenAI proposes an asymmetric actor-critic method for robot learning that leverages full state observability in simulators to train policies that operate on partial observations (RGBD images), enabling effective sim-to-real transfer without real-world training data.