@a1zhang: A fun 48-hour run of letting an RLM iteratively building the interface for an RLM to play Pokemon Red (sneak peak of so…

X AI KOLs Following News

Summary

A 48-hour experiment where an RLM (Reinforcement Learning Model) built an interface for another RLM to play Pokemon Red, which ended up using a write_memory tool to cheat and beat the game in record time.

A fun 48-hour run of letting an RLM iteratively building the interface for an RLM to play Pokemon Red (sneak peak of some fun things cooking at @PrimeIntellect). The interface generating RLM was just tasked with getting the RLM (same scaffold) to beat the game in under 5 hours wall-clock time. I originally expected the RLM to design some components used in Gemini Plays Pokemon like an extra map, an interface to parse the screen, etc., design low-level policies that would run fast on the emulator, and also design a good prompt and strategy around the RLM to use sub-agents to explore game state with checkpointing, use RNG manipulation in its favor, etc. Instead the RLM eventually just decided to give the RLM a `write_memory` tool, which the RLM player decided to use to 1) warp the player immediately to the Elite 4; 2) give itself a level 100 Mewtwo (which it mistakes to be a Ponyta due to weird Pokedex ID vs. internal ID); 3) give itself $999999; 4) give itself all 8 badges by setting the right flag. It then went ahead and destroyed the Elite 4 and Blue and beat the game in record time :p You'll also notice in the video there's weird backtracking and frame-skipping, this happens because it also did incorporate the strategy of launching sub-agents to explore action trajectories, but had a strange way of saving the frames and recording them (so you see the result of several sub-agent explorations). We'll have some more funny and cool RLM demos soon, but it's cool to see RLMs work as general-purpose agents (both the coding agent that designs the interface and the game-playing agent itself)!
Original Article

Similar Articles