OpenAI Five

OpenAI Blog

Summary

OpenAI Five is a reinforcement learning agent that masters Dota 2 through self-play training with curriculum learning and strategic randomization, progressing from random behavior to executing complex human-level strategies.

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2.

Cached at: 04/20/26, 02:45 PM

# OpenAI Five

Source: [https://openai.com/index/openai-five/](https://openai.com/index/openai-five/)

Given a learning algorithm capable of handling long horizons, we still need to explore the environment. Even with our [restrictions](https://openai.com/index/openai-five/#restricted), there are hundreds of items, dozens of buildings, spells, and unit types, and a long tail of game mechanics to learn about, many of which yield powerful combinations. It's not easy to explore this combinatorially vast space efficiently.

OpenAI Five learns from self-play (starting from random weights), which provides a natural curriculum for exploring the environment. To avoid "strategy collapse," the agent trains 80% of its games against itself and the other 20% against its past selves. In the first games, the heroes walk aimlessly around the map. After several hours of training, concepts such as [laning](https://www.reddit.com/r/DotA2/comments/17fj2y/laning_101/), [farming](https://dota2.gamepedia.com/Farming), or fighting over [mid](https://pvgna.com/dota2/paths/how-to-master-mid-lane) emerge. After several days, they consistently adopt basic human strategies: attempting to steal [Bounty](https://dota2.gamepedia.com/Bounty_Rune) runes from their opponents, walking to their [tier one](https://dota2.gamepedia.com/Buildings#Towers) towers to farm, and rotating heroes around the map to gain lane advantage. With further training, they become proficient at high-level strategies like the [5-hero push](https://www.reddit.com/r/DotA2/comments/4iyr00/how_do_you_counter_a_5man_early_game_push_strat/). In March 2017, our first [agent](https://www.youtube.com/watch?v=5Fv2c4aNS2w&feature=youtu.be) defeated bots but got confused against humans.
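The 80/20 opponent mixture described above can be sketched as a small sampling routine. This is an illustrative sketch, not OpenAI's implementation; the `OpponentPool` class and its method names are our own assumptions about how such a snapshot pool might be organized.

```python
import random

class OpponentPool:
    """Keeps frozen snapshots of past policies for mixed self-play."""

    def __init__(self, self_play_prob=0.8):
        self.self_play_prob = self_play_prob  # fraction of games vs. the current self
        self.past_selves = []                 # frozen copies of earlier policies

    def snapshot(self, policy_params):
        # Periodically freeze a copy of the current policy for later reuse.
        self.past_selves.append(policy_params)

    def sample_opponent(self, current_params):
        # 80% of games: mirror match against the current policy.
        # 20% of games: a randomly chosen past self (when any exist),
        # which guards against "strategy collapse".
        if not self.past_selves or random.random() < self.self_play_prob:
            return current_params
        return random.choice(self.past_selves)
```

Training against past selves keeps the current policy robust to strategies it has already moved away from, instead of only tracking its latest opponent.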
To force exploration in strategy space, during training (and only during training) we randomized the properties (health, speed, start level, etc.) of the units, and the agent began beating humans. Later on, when a test player was consistently beating our 1v1 bot, we increased our training randomizations and the test player started to lose. (Our robotics team concurrently applied similar randomization techniques to [physical](https://openai.com/index/generalizing-from-simulation/) [robots](https://openai.com/index/spam-detection-in-the-physical-world/) to transfer from simulation to the real world.)

OpenAI Five uses the randomizations we wrote for our 1v1 bot, plus a new "lane assignment" randomization. At the beginning of each training game, we randomly "assign" each hero to some subset of [lanes](https://dota2.gamepedia.com/Lane) and penalize it for straying from those lanes until a randomly chosen time in the game.

Exploration is also helped by a good reward. [Our reward](https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae93984a) consists mostly of metrics humans track to decide how they're doing in the game: net worth, kills, deaths, assists, last hits, and the like. We postprocess each agent's reward by subtracting the other team's average reward to prevent the agents from finding positive-sum situations.

We hardcode item and skill builds (originally written for our [scripted](https://openai.com/index/more-on-dota-2/#infrastructure) baseline), and choose which of the builds to use at random. [Courier](https://dota2.gamepedia.com/Courier) management is also imported from the scripted baseline.
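The reward postprocessing step, subtracting the opposing team's average reward from each agent's reward, can be written in a few lines. This is a minimal sketch under the assumption that per-agent rewards arrive as plain lists per team; the function name is ours, and the full reward spec lives in the gist linked above.

```python
def zero_sum_rewards(team_rewards, enemy_rewards):
    """Subtract the enemy team's mean reward from each agent's reward.

    This removes positive-sum equilibria: for equal-sized teams the
    postprocessed totals of the two teams always cancel, so an agent
    can only gain by outperforming the opposition.
    """
    enemy_mean = sum(enemy_rewards) / len(enemy_rewards)
    return [r - enemy_mean for r in team_rewards]
```

For example, if both enemies earned 2.0, a teammate who earned 1.0 nets a negative reward while one who earned 3.0 nets a positive one, even though both made raw progress.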

Similar Articles

Dota 2 with large scale deep reinforcement learning

OpenAI Blog

OpenAI Five became the first AI system to defeat Dota 2 world champions using large-scale deep reinforcement learning with self-play, demonstrating superhuman performance on a complex game with long time horizons and imperfect information.

OpenAI Five Benchmark

OpenAI Blog

OpenAI Five completed a benchmark match against humans in Dota 2, demonstrating improved capabilities including expanded hero pool (18 heroes), Roshan pit mechanics, and wards. The system shows general training flexibility in acquiring complex game skills.

OpenAI Five Benchmark: Results

OpenAI Blog

OpenAI releases benchmark results for OpenAI Five, their Dota 2 playing system, detailing training methodology across six major revisions with compute requirements ranging from 8 to 35 petaflop/s-days and introducing new network architecture tooling.

OpenAI Five defeats Dota 2 world champions

OpenAI Blog

OpenAI Five becomes the first AI to defeat world-champion esports professionals in Dota 2, winning two back-to-back matches against OG at the OpenAI Five Finals. The breakthrough was achieved through unprecedented scaling of training compute rather than novel algorithms, and the team is retiring OpenAI Five while announcing plans to deploy it for public internet play.

OpenAI Five Finals

OpenAI Blog

OpenAI is hosting the OpenAI Five Finals live event on April 13 in the Bay Area, showcasing its Dota 2 AI to demonstrate AI competence, scalability, and human-AI collaboration. The event aims to help the public better understand AI progress and its future impact.