Tag
browser_use demonstrates its v4 AI agent autonomously playing the online game powerline.io by analyzing the game state and creating a real-time subagent to compete for first place.
MTG Bench evaluates how well LLMs can play Magic: The Gathering using an MCP server for library operations, showing both successes and failures in complex game actions.
Claude Code uses a vision-based UI interaction model to play the rhythm game osu! at 50ms per action, outperforming the human user without relying on an accessibility tree.
In a 48-hour experiment, an RLM (Reinforcement Learning Model) built an interface for another RLM to play Pokemon Red. During the run, a write_memory tool was used to cheat and beat the game in record time.
Paul Buchheit highlights the surprising zero-shot capability of modern seq2seq models to generate CLI commands and Python programs to play Doom using computer vision libraries without specific training on that task.
Google DeepMind and Kaggle introduced Kaggle Game Arena, an open-source AI benchmarking platform where large language models compete head-to-head in strategic games to provide dynamic and verifiable evaluation of their capabilities. The platform addresses limitations of traditional benchmarks by offering clear winning conditions and unambiguous performance signals.
OpenAI Five competed against top professional Dota 2 teams at The International 2018, losing both matches against elite human players while demonstrating competitive gameplay and strategic depth learned through self-play.
OpenAI Five is a reinforcement learning agent that masters Dota 2 through self-play training with curriculum learning and strategic randomization, progressing from random behavior to executing complex human-level strategies.