@AlexGDimakis: I am very excited about this research: We show 2 things: 1. If you just do random sampling (i.e. you try to solve a pro…

X AI KOLs Timeline 06/16/26, 10:33 PM Papers

research ai-scaling coding-agents human-performance continual-learning comparative-study long-horizon

Summary

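This research compares AI coding agents (like Claude-Code and Codex) with human expert coders on long-horizon tasks from the AtCoder Heuristic Contest. After a few hours, the agents' Elo grows only linearly in log(test-time compute), as memoryless random sampling would, while humans scale super-linearly, which the author attributes to continual learning during problem solving. The result highlights a key limitation of current AI in extended problem-solving.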

I am very excited about this research. We show two things:

1. If you just do random sampling (i.e. you try to solve a problem k times independently and keep the best), your Elo scaling will be linear in log(test-time compute). Agents like Claude-Code and Codex scale like that after a few hours.
2. We compare human expert coders to coding agents on the same tasks (from AtCoder Heuristic Contest). The exciting finding is that humans scale super-linearly. This is evidence that humans do continual learning while they are solving a problem! I.e. they learn more about the coding problem they are trying to solve, and so scale fundamentally better than randomly trying things in a memoryless fashion.

This is empirical evidence that supports what many of us have felt for a while: unless we solve continual learning, we will not be able to outperform humans in tasks that take many days. Current coding agents are not able to do this.
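To make the random-sampling claim concrete, here is a minimal simulation sketch. It is not the authors' experiment. It assumes each attempt's score is an independent standard normal draw, and it converts win rates to Elo with the common 400-point, base-10 logistic convention. The names `best_of_k`, `win_rate`, and `elo_gap` are hypothetical helpers introduced for illustration.

```python
import math
import random

def best_of_k(k, rng):
    """Best score out of k independent attempts (assumed standard normal scores)."""
    return max(rng.gauss(0.0, 1.0) for _ in range(k))

def win_rate(k1, k2, trials, rng):
    """Monte Carlo estimate of P(best of k1 attempts beats best of k2 attempts)."""
    wins = sum(best_of_k(k1, rng) > best_of_k(k2, rng) for _ in range(trials))
    return wins / trials

def elo_gap(p):
    """Elo difference implied by win probability p (assumed 400 * log10 convention)."""
    return 400.0 * math.log10(p / (1.0 - p))

rng = random.Random(0)  # fixed seed so runs are repeatable
for k in (1, 2, 4, 8, 16):
    # Compare a sampler with twice the attempts against one with k attempts.
    p = win_rate(2 * k, k, 20000, rng)
    print(f"k={k:2d} vs 2k: win rate {p:.3f}, Elo gain per doubling {elo_gap(p):6.1f}")
```

By symmetry, the doubled sampler should win about two times out of three whatever the value of k, so every doubling of attempts adds roughly the same Elo. That constant gain per doubling is what "linear in log(test-time compute)" means, if compute is taken to be proportional to the number of attempts.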

Original Article


Cached at: 06/17/26, 03:47 AM


It’s Claude code so I can do whatever it wants. But it doesn’t.

It’s an easy proof: assume the performance in a task is a zero-mean Gaussian with variance 1. Sample it k times. Random sampling takes the max performance over the k tries. Elo can be computed from the probability that k1 trials beat k2 trials, i.e. that the max of k1 Gaussians is bigger than the max of k2 Gaussians. Elo(k) = constant + log(k), up to a scale factor.
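The proof sketch above can be spelled out as a short derivation. It relies on two assumptions not stated in the reply: ties have probability zero (true for any continuous score distribution, including the Gaussian), and Elo follows the standard logistic model with a 400-point, base-10 scale.

```latex
\begin{align*}
M_k &= \max(X_1,\dots,X_k), \qquad X_i \overset{\text{iid}}{\sim} \mathcal{N}(0,1) \\
\Pr[M_{k_1} > M_{k_2}] &= \frac{k_1}{k_1+k_2}
  && \text{(the largest of all $k_1+k_2$ draws is equally likely to be any one of them)} \\
\Pr[M_{k_1} > M_{k_2}] &= \frac{1}{1+10^{-(\mathrm{Elo}(k_1)-\mathrm{Elo}(k_2))/400}}
  && \text{(Elo model, assumed convention)} \\
\Rightarrow\ \mathrm{Elo}(k_1)-\mathrm{Elo}(k_2) &= 400\log_{10}\frac{k_1}{k_2},
  \qquad \mathrm{Elo}(k) = \text{constant} + 400\log_{10} k
\end{align*}
```

Under these assumptions the Gaussian shape is not actually needed. The argument uses only independence and continuity of the scores.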


Similar Articles

@AnthropicAI: Our latest economic research introduces a framework for tracking Claude Code as it scales. Who is using Claude Code, an…

AI Coding Agents Can Reproduce Social Science Findings

Anyone else feel like AI agents are amazing right up until things get complicated?

@MaximeRivest: Coding agents can only accelerate our work when we are willing to accept that we may not fully understand the overly co…

@techwith_ram: https://x.com/techwith_ram/status/2064925285003542820
