@rohit4verse: Every night you're not running an autonomous research agent, you're hand-running experiments someone else automated mon…

X AI KOLs Timeline 05/25/26, 09:23 AM Tools

Summary

Andrej Karpathy open-sourced an autonomous research agent that runs its own ML experiments overnight using a single GPU, automatically iterating on improvements by editing code and keeping changes that lower validation loss.

Original Article

Cached at: 05/25/26, 12:53 PM

Every night you’re not running an autonomous research agent, you’re hand-running experiments someone else automated months ago.

Most people are still hunting for the “right” setup. Frameworks, orchestration, glue code.

You don’t need any of it. Andrej Karpathy open-sourced his own version that runs its own ML research. One GPU. ~100 experiments overnight. You never touch the Python.

Here’s the exact setup (takes 2 minutes):

1. Clone it: (repo link in comments)
2. uv sync, then uv run prepare.py
3. uv run train.py once to confirm the baseline runs
4. Point your coding agent at program.md and walk away
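
If you would rather script the command steps than type them, here is a minimal Python sketch. It assumes uv is already installed, that the repo has already been cloned into `repo_dir`, and that the script names are `prepare.py` and `train.py`. The `run_setup` helper and its `run` parameter are made up for illustration and are not part of the repo.

```python
import subprocess
from typing import Callable, List

# Commands taken from the setup steps; cloning is left out because the
# post does not give the repo URL.
SETUP_COMMANDS: List[List[str]] = [
    ["uv", "sync"],               # install the project's dependencies
    ["uv", "run", "prepare.py"],  # one-time preparation step
    ["uv", "run", "train.py"],    # confirm the baseline run works
]


def run_setup(repo_dir: str, run: Callable[..., object] = subprocess.run) -> int:
    """Run each setup command inside repo_dir, stopping on the first failure.

    `run` is injectable (hypothetical parameter) so the sequence can be
    exercised without actually invoking uv.
    """
    for cmd in SETUP_COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError
        # if a command exits with a non-zero status.
        run(cmd, cwd=repo_dir, check=True)
    return len(SETUP_COMMANDS)
```

Calling `run_setup("path/to/clone")` runs the three commands in order. The last step, pointing a coding agent at program.md, stays manual.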

The agent edits one file, trains 5 minutes, keeps the change if val_bpb drops, reverts it if it doesn’t. Git is the memory. The metric is the judge.
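
The keep-or-revert rule is simple enough to sketch in a few lines of Python. This is not the repo's code, and in the actual project the coding agent presumably follows the instructions in program.md rather than a fixed script. Every name here (`keep_or_revert`, `RunLog`, the four callbacks) is hypothetical. The sketch only shows the control flow: lower val_bpb is kept, anything else is rolled back.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RunLog:
    best: float  # lowest val_bpb accepted so far
    # One (experiment index, val_bpb, kept?) entry per experiment.
    history: List[Tuple[int, float, bool]] = field(default_factory=list)


def keep_or_revert(
    baseline: float,
    n_experiments: int,
    apply_edit: Callable[[int], None],      # agent edits the one file
    train_and_eval: Callable[[], float],    # short training run -> val_bpb
    keep: Callable[[int, float], None],     # e.g. record the change in git
    revert: Callable[[int], None],          # e.g. restore the file from git
) -> RunLog:
    """Greedy loop: accept an edit only if it strictly lowers val_bpb."""
    log = RunLog(best=baseline)
    for i in range(n_experiments):
        apply_edit(i)
        score = train_and_eval()
        improved = score < log.best
        if improved:
            keep(i, score)
            log.best = score
        else:
            revert(i)
        log.history.append((i, score, improved))
    return log
```

In a real run, `keep` might wrap `git commit -am "<message>"` and `revert` might wrap `git checkout -- <file>` through `subprocess.run`, though the post does not say which git commands the agent uses. The `best` values over time form the "staircase": the number only ever moves down. At 5 minutes per run, 100 experiments is about 500 minutes of training, a little over 8 hours, which fits the overnight framing.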

You wake up to a staircase of validated improvements, not a backlog of ideas you never tested.

Similar Articles

@lftherios: 1/ Autoresearch from @karpathy has been one of the most interesting agentic patterns to emerge this year. The challenge…

@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…

@JeremyNguyenPhD: "I left 3 AI agents alone with a research problem overnight. They came back with 72 peer-reviewed papers" -- @ProfJieDi…

@seelffff: > reads papers on arXiv autonomously > finds and checks datasets on HF Hub > writes the training script itself > genera…

@VukRosic99: A DeepSeek researcher just open-sourced his AutoResearch personal project. For the first time, the AutoResearch Agent a…