Tag
A research paper that combines a small amount of human demonstrations as a regularization objective with self-play reinforcement learning, enabling human-compatible driving policies using far less human data (30 minutes vs thousands of hours) and training in 15 hours on a single consumer GPU.
OpenAI co-organizes the MineRL 2020 Competition to advance sample-efficient reinforcement learning algorithms that leverage human demonstrations. Participants compete to obtain a diamond in Minecraft using only 8 million simulator samples and 4 days of single-GPU training, with access to a 60+ million frame human demonstration dataset.