Tag
This paper develops a formal account of what generalist agents must store in memory to act near-optimally across multiple environments and goals, and presents a separation theorem showing that memory is necessary for domain disambiguation and transition-model reconstruction.
This paper explores whether generalist coding agents (Claude Code, Codex, etc.) can automate data curation loops. The agents reach published baselines within 10 iterations but show a gap in exploring new methods. A scaffold that forces agents to adapt prior research yields policies that beat baselines using 10x less data.