The author will speak at aiDotEngineer about using speedruns, such as the NanoGPT speedrun, to evaluate AI research capabilities.
The Recursive team released an automated AI research system that completes the research loop autonomously and surpasses existing human community solutions on multiple benchmarks. For example, on the NanoGPT Speedrun it cut training time from 79.7 seconds to 77.5 seconds (a reduction of about 2.8%), and on SOL-ExecBench it raised the score to 0.754.
AI agents (Opus 4.7 and GPT 5.5/Codex) autonomously improved the optimizer used in the nanoGPT speedrun, beating the human baseline with a new record of 2930 steps. The blog post details their search methods and failures, and releases all run data and code.