OpenAI Five

OpenAI Blog

Summary

OpenAI Five is a reinforcement learning agent that masters Dota 2 through self-play training with curriculum learning and strategic randomization, progressing from random behavior to executing complex human-level strategies.

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2.

Cached at: 04/20/26, 02:45 PM

# OpenAI Five

Source: [https://openai.com/index/openai-five/](https://openai.com/index/openai-five/)

Given a learning algorithm capable of handling long horizons, we still need to explore the environment. Even with our [restrictions](https://openai.com/index/openai-five/#restricted), there are hundreds of items, dozens of buildings, spells, and unit types, and a long tail of game mechanics to learn about, many of which yield powerful combinations. It's not easy to explore this combinatorially vast space efficiently.

OpenAI Five learns from self-play (starting from random weights), which provides a natural curriculum for exploring the environment. To avoid "strategy collapse," the agent trains 80% of its games against itself and the other 20% against its past selves. In the first games, the heroes walk aimlessly around the map. After several hours of training, concepts such as [laning](https://www.reddit.com/r/DotA2/comments/17fj2y/laning_101/), [farming](https://dota2.gamepedia.com/Farming), or fighting over [mid](https://pvgna.com/dota2/paths/how-to-master-mid-lane) emerge. After several days, they consistently adopt basic human strategies: attempting to steal [Bounty](https://dota2.gamepedia.com/Bounty_Rune) runes from their opponents, walking to their [tier one](https://dota2.gamepedia.com/Buildings#Towers) towers to farm, and rotating heroes around the map to gain lane advantage. With further training, they become proficient at high-level strategies like the [5-hero push](https://www.reddit.com/r/DotA2/comments/4iyr00/how_do_you_counter_a_5man_early_game_push_strat/). In March 2017, our first [agent](https://www.youtube.com/watch?v=5Fv2c4aNS2w&feature=youtu.be) defeated bots but got confused against humans.
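The 80/20 opponent mixture described above can be sketched as a small sampling routine. This is an illustrative sketch, not OpenAI's implementation; the `OpponentPool` class and its method names are our own assumptions about how such a snapshot pool might be organized.

```python
import random

class OpponentPool:
    """Keeps frozen snapshots of past policies for mixed self-play."""

    def __init__(self, self_play_prob=0.8):
        self.self_play_prob = self_play_prob  # fraction of games vs. the current self
        self.past_selves = []                 # frozen copies of earlier policies

    def snapshot(self, policy_params):
        # Periodically freeze a copy of the current policy for later reuse.
        self.past_selves.append(policy_params)

    def sample_opponent(self, current_params):
        # 80% of games: mirror match against the current policy.
        # 20% of games: a randomly chosen past self (when any exist),
        # which guards against "strategy collapse".
        if not self.past_selves or random.random() < self.self_play_prob:
            return current_params
        return random.choice(self.past_selves)
```

Training against past selves keeps the current policy robust to strategies it has already moved away from, instead of only tracking its latest opponent.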
To force exploration in strategy space, during training (and only during training) we randomized the properties (health, speed, start level, etc.) of the units, and the agent began beating humans. Later on, when a test player was consistently beating our 1v1 bot, we increased our training randomizations and the test player started to lose. (Our robotics team concurrently applied similar randomization techniques to [physical](https://openai.com/index/generalizing-from-simulation/) [robots](https://openai.com/index/spam-detection-in-the-physical-world/) to transfer from simulation to the real world.)

OpenAI Five uses the randomizations we wrote for our 1v1 bot, plus a new "lane assignment" randomization. At the beginning of each training game, we randomly "assign" each hero to some subset of [lanes](https://dota2.gamepedia.com/Lane) and penalize it for straying from those lanes until a randomly chosen time in the game.

Exploration is also helped by a good reward. [Our reward](https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae93984a) consists mostly of metrics humans track to decide how they're doing in the game: net worth, kills, deaths, assists, last hits, and the like. We postprocess each agent's reward by subtracting the other team's average reward to prevent the agents from finding positive-sum situations.

We hardcode item and skill builds (originally written for our [scripted](https://openai.com/index/more-on-dota-2/#infrastructure) baseline), and choose which of the builds to use at random. [Courier](https://dota2.gamepedia.com/Courier) management is also imported from the scripted baseline.
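The reward postprocessing step, subtracting the opposing team's average reward from each agent's reward, can be written in a few lines. This is a minimal sketch under the assumption that per-agent rewards arrive as plain lists per team; the function name is ours, and the full reward spec lives in the gist linked above.

```python
def zero_sum_rewards(team_rewards, enemy_rewards):
    """Subtract the enemy team's mean reward from each agent's reward.

    This removes positive-sum equilibria: for equal-sized teams the
    postprocessed totals of the two teams always cancel, so an agent
    can only gain by outperforming the opposition.
    """
    enemy_mean = sum(enemy_rewards) / len(enemy_rewards)
    return [r - enemy_mean for r in team_rewards]
```

For example, if both enemies earned 2.0, a teammate who earned 1.0 nets a negative reward while one who earned 3.0 nets a positive one, even though both made raw progress.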

Similar Articles

Dota 2 with large scale deep reinforcement learning

OpenAI Blog

OpenAI Five became the first AI system to defeat Dota 2 world champions using large-scale deep reinforcement learning with self-play, demonstrating superhuman performance on a complex game with long time horizons and imperfect information.

OpenAI Five Benchmark

OpenAI Blog

OpenAI Five completed a benchmark match against humans in Dota 2, demonstrating improved capabilities including expanded hero pool (18 heroes), Roshan pit mechanics, and wards. The system shows general training flexibility in acquiring complex game skills.

OpenAI Five Benchmark: Results

OpenAI Blog

OpenAI releases benchmark results for OpenAI Five, their Dota 2 playing system, detailing training methodology across six major revisions with compute requirements ranging from 8 to 35 petaflop/s-days and introducing new network architecture tooling.

OpenAI Five defeats Dota 2 world champions

OpenAI Blog

OpenAI Five becomes the first AI to defeat world-champion esports professionals in Dota 2, winning two back-to-back matches against OG at the OpenAI Five Finals. The breakthrough was achieved through unprecedented scaling of training compute rather than novel algorithms, and the team is retiring OpenAI Five while announcing plans to deploy it for public internet play.

OpenAI Five Finals

OpenAI Blog

OpenAI is hosting the OpenAI Five Finals live event on April 13 in the Bay Area, showcasing its Dota 2 AI to demonstrate AI competence, scalability, and human-AI collaboration. The event aims to help the public better understand AI progress and its future impact.