This paper explores whether generalist coding agents (Claude Code, Codex, etc.) can automate data curation loops. The agents reach published baselines within 10 iterations but show a gap in exploring new methods. A scaffold that forces agents to adapt prior research yields policies that beat baselines using 10x less data.
A new benchmark, FML-Bench, reveals that recent improvements in MLE-Bench scores are largely due to better base models and increased search budget rather than algorithmic advances.
Adaption launched AutoScientist, an AI tool that automates fine-tuning to help models learn capabilities quickly, aiming to make frontier AI training more accessible.
A new research paper introduces ASI-Arch, an autonomous AI system capable of discovering novel neural network architectures without human-designed search spaces. By running thousands of automated experiments, it generated over 100 new state-of-the-art linear attention models, signaling a major shift toward AI-driven scientific collaboration.