@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…

X AI KOLs Timeline 05/21/26, 03:38 AM Tools

ai-agent autonomous-research open-source llm-training karpathy nanochat gpu

Summary

Karpathy open-sourced an experimental project, autoresearch, that lets an AI Agent automatically complete the research loop for small-scale LLM training: modify code, run experiments, evaluate results, and iterate. Humans only need to write the research plan and constraints.

Saw Karpathy open-sourced a very interesting project called autoresearch, which hands a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes.

The project is based on the nanochat training pipeline on a single NVIDIA GPU. Each time, the Agent modifies the training code, runs for about 5 minutes, and checks if the validation metrics improve. If they improve, it keeps the changes; if not, it discards them, and then proceeds to the next round. In other words, you hand the task to the Agent at night, and when you wake up the next day, you see a series of automated experiment logs, as well as a model that may have been optimized.

The project is deliberately kept very small, with just a few files: prepare.py handles data preparation and utility functions, train.py is the training code that the Agent actually modifies, and program.md serves as the research instructions for the Agent. Humans are more involved in writing the "research plan" and constraints, rather than manually tweaking model architecture, hyperparameters, and training logic bit by bit as before.

It demonstrates a new way of doing AI research: not just having AI help you write code, but having AI participate in the full experimental loop. It proposes changes, executes experiments, evaluates results, and iterates continuously. This is a bit like putting a junior researcher into a controlled environment and letting them experiment through trial and error.

Currently, autoresearch is still a rather experimental project, and by default it is better suited for a single NVIDIA GPU, especially high-performance cards like the H100. For Mac, Windows, or AMD platforms, some people have already made forks.

Original Article


Cached at: 05/21/26, 09:39 PM

Karpathy open-sourced a very interesting project called autoresearch, which hands over a real but small LLM training task to an AI Agent. The Agent does its own research, modifies code, runs experiments, checks results, and then decides whether to keep or discard the changes.

The project is based on the nanochat training pipeline on a single NVIDIA GPU. Each time, the Agent modifies the training code, runs it for about 5 minutes, and checks whether the validation metrics improve.

If they improve, the change is kept; if not, it’s discarded, and the next round begins. In other words, you hand the task to the Agent before going to bed, and when you wake up, you’ll see a series of automated experiment logs and possibly an optimized model.
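The keep-or-discard loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from autoresearch: `keep_or_discard_loop`, `propose_lr_change`, and `toy_val_loss` are hypothetical names, the "change" is a tweak to a single learning-rate value rather than an edit to train.py, the validation metric is assumed to be a loss where lower is better, and a toy function stands in for the roughly 5-minute training run.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


@dataclass
class Trial:
    """One row of the experiment log."""
    round_no: int
    candidate: Config
    val_loss: float
    kept: bool


def keep_or_discard_loop(
    baseline: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    rounds: int,
) -> Tuple[Config, float, List[Trial]]:
    """Greedy loop: keep a candidate only if it beats the best loss so far."""
    best = dict(baseline)
    best_loss = evaluate(best)
    log: List[Trial] = []
    for round_no in range(1, rounds + 1):
        candidate = propose(best)      # stand-in for the agent editing code
        loss = evaluate(candidate)     # stand-in for a short training run
        kept = loss < best_loss        # assumption: lower metric is better
        if kept:
            best, best_loss = candidate, loss
        log.append(Trial(round_no, candidate, loss, kept))
    return best, best_loss, log


def propose_lr_change(cfg: Config, rng: random.Random) -> Config:
    """Hypothetical proposal: halve or double the learning rate."""
    new_cfg = dict(cfg)
    new_cfg["lr"] = cfg["lr"] * rng.choice([0.5, 2.0])
    return new_cfg


def toy_val_loss(cfg: Config) -> float:
    """Toy metric with its minimum at lr = 1e-3; not a real training run."""
    return (math.log10(cfg["lr"]) + 3.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    best, best_loss, log = keep_or_discard_loop(
        {"lr": 1e-1}, lambda c: propose_lr_change(c, rng), toy_val_loss, rounds=20
    )
    for trial in log:
        status = "keep" if trial.kept else "discard"
        print(f"round {trial.round_no:2d}  lr={trial.candidate['lr']:.2e}  "
              f"loss={trial.val_loss:.3f}  {status}")
    print("best:", best, "loss:", round(best_loss, 3))
```

Because a candidate is always derived from the current best and discarded unless it improves the metric, the best loss never gets worse from round to round; the trade-off is that such a greedy loop can stall once no single proposed change helps.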

The project is intentionally small, consisting mainly of a few files: prepare.py handles data preparation and utility functions, train.py is the training code that the Agent actually modifies, and program.md serves as the research instructions for the Agent.

Humans mostly write “research plans” and constraints, rather than manually tweaking model architectures, hyperparameters, and training logic bit by bit as before.

It demonstrates a new way of doing AI research: not just having AI write code for you, but involving AI in the full experimental loop.

It proposes changes, executes experiments, evaluates results, and iterates continuously. It’s a bit like putting a junior researcher in a controlled environment and letting them learn through trial and error.

Currently, autoresearch is still an experimental project. By default, it’s better suited for a single NVIDIA GPU, especially high-performance cards like the H100. For Mac, Windows, or AMD platforms, community forks already exist.

Similar Articles

@WWTLitee: Is there a way for AI to autonomously iterate and optimize? Yes, check out autoresearch. Its core isn't to have AI directly 'invent papers,' but to break the research process into a verifiable loop: humans write program.md to give research direction, AI agent modifies http://tra…

@smallnest: I ported @karpathy's autoresearch to automated software development, and after various optimizations, the results are phenomenal.

@omarsar0: Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems.…

@lftherios: 1/ Autoresearch from @karpathy has been one of the most interesting agentic patterns to emerge this year. The challenge…

@jakevin7: Sharing something interesting Maka is currently working on: letting agents automatically optimize their own system prompt, fully closed-loop, without any human intervention. Karpathy's autoresearch, AEGIS, etc. have explored similar directions—a goal-driven self-reinforcement learning system.