Tag
The article analyzes how AlphaZero's value predictions are shaped by self-play training data and noise, questioning whether they reliably estimate win chances against opponents with different play styles despite AlphaZero's strong empirical performance.
A weekly roundup of top AI research papers covering topics such as Conductor, HeavySkill, Horizon Generalization, synthetic computers, self-improving pretraining, and AlphaZero for Connect Four.
STRATAGEM is a new framework for improving reasoning transferability in language models. It uses game self-play together with a Reasoning Transferability Coefficient and a Reasoning Evolution Reward to reinforce abstract, domain-agnostic reasoning patterns over game-specific heuristics. Experiments show strong improvements on mathematical reasoning, general reasoning, and code generation benchmarks.
OpenAI introduced Video PreTraining (VPT), a semi-supervised method that trains neural networks to play Minecraft by learning from 70,000 hours of unlabeled human gameplay video combined with a small labeled dataset. The model learns complex sequential tasks using the native human interface (keyboard and mouse) and demonstrates capabilities like crafting diamond tools and pillar jumping, representing progress toward general computer-using agents.
OpenAI Five became the first AI system to defeat Dota 2 world champions using large-scale deep reinforcement learning with self-play, demonstrating superhuman performance on a complex game with long time horizons and imperfect information.
OpenAI Five completed a benchmark match against human players in Dota 2, demonstrating improved capabilities that include an expanded hero pool (18 heroes), Roshan pit mechanics, and wards. The results show that the training approach is flexible enough to acquire complex new game skills.
OpenAI describes iterative improvements made to its Dota 2 bot during The International tournament, combining coaching with self-play to improve agent performance through rapid training cycles and strategic refinements discovered during professional matches.
OpenAI created a bot that defeats world-class Dota 2 professionals in 1v1 matches using only self-play learning, without imitation learning or tree search. The achievement demonstrates progress toward AI systems that can accomplish complex goals in dynamic, multi-agent environments.
Google DeepMind's Project Genie is a unified world model that generates diverse, interactive video games by treating them as conditional video prediction tasks.