@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…

X AI KOLs Timeline Tools

Summary

Karpathy open-sourced an experimental project, autoresearch, that lets an AI Agent automatically complete the research loop for small-scale LLM training: modify code, run experiments, evaluate results, and iterate. Humans only need to write the research plan and constraints.

Saw Karpathy open-sourced a very interesting project called autoresearch, which hands a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on the nanochat training pipeline on a single NVIDIA GPU. Each time, the Agent modifies the training code, runs for about 5 minutes, and checks if the validation metrics improve. If they improve, it keeps the changes; if not, it discards them, and then proceeds to the next round. In other words, you hand the task to the Agent at night, and when you wake up the next day, you see a series of automated experiment logs, as well as a model that may have been optimized. The project is deliberately kept very small, with just a few files: prepare.py handles data preparation and utility functions, train.py is the training code that the Agent actually modifies, and program.md serves as the research instructions for the Agent. Humans are more involved in writing the "research plan" and constraints, rather than manually tweaking model architecture, hyperparameters, and training logic bit by bit as before. It demonstrates a new way of doing AI research: not just having AI help you write code, but having AI participate in the full experimental loop. It proposes changes, executes experiments, evaluates results, and iterates continuously. This is a bit like putting a junior researcher into a controlled environment and letting them experiment through trial and error. Currently, autoresearch is still a rather experimental project, and by default it is better suited for a single NVIDIA GPU, especially high-performance cards like the H100. For Mac, Windows, or AMD platforms, some people have already forked versions.

Cached at: 05/21/26, 09:39 PM

Karpathy open-sourced a very interesting project called autoresearch, which hands a real but small LLM training task to an AI Agent. The Agent does its own research, modifies code, runs experiments, checks results, and then decides whether to keep or discard the changes.

The project is based on the nanochat training pipeline on a single NVIDIA GPU. Each time, the Agent modifies the training code, runs it for about 5 minutes, and checks whether the validation metrics improve.

If they improve, the change is kept; if not, it’s discarded, and the next round begins. In other words, you hand the task to the Agent before going to bed, and when you wake up, you’ll see a series of automated experiment logs and possibly an optimized model.
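The keep-or-discard loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `keep_or_discard_loop`, `toy_propose`, and `toy_run` are hypothetical names, the "config" dictionary stands in for the Agent's edits to train.py, and the toy loss function stands in for a roughly 5-minute training run. It also assumes the validation metric is a loss where lower is better.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


@dataclass
class Trial:
    round_index: int
    val_loss: float
    kept: bool


def keep_or_discard_loop(
    baseline: Config,
    propose_change: Callable[[Config], Config],
    run_experiment: Callable[[Config], float],
    rounds: int,
) -> Tuple[Config, float, List[Trial]]:
    """Greedy loop: keep a change only if the validation loss strictly improves."""
    best_config = dict(baseline)
    best_loss = run_experiment(best_config)
    log: List[Trial] = []
    for i in range(rounds):
        # Hypothetical stand-in for the Agent editing the training code.
        candidate = propose_change(best_config)
        # Hypothetical stand-in for a short, fixed-budget training run.
        loss = run_experiment(candidate)
        kept = loss < best_loss  # assumption: lower validation loss is better
        if kept:
            best_config, best_loss = candidate, loss
        # If not kept, the candidate is dropped and best_config stays unchanged.
        log.append(Trial(i, loss, kept))
    return best_config, best_loss, log


# Toy stand-ins so the sketch runs without a GPU.
rng = random.Random(0)


def toy_propose(config: Config) -> Config:
    """Randomly rescale the learning rate of the current best config."""
    new_config = dict(config)
    new_config["lr"] = config["lr"] * rng.uniform(0.5, 2.0)
    return new_config


def toy_run(config: Config) -> float:
    """Pretend validation loss, lowest at lr = 0.01."""
    return (config["lr"] - 0.01) ** 2 + 1.0


baseline = {"lr": 0.1}
best_config, best_loss, log = keep_or_discard_loop(
    baseline, toy_propose, toy_run, rounds=20
)
for trial in log:
    status = "kept" if trial.kept else "discarded"
    print(f"round {trial.round_index}: val_loss={trial.val_loss:.6f} ({status})")
```

The list of `Trial` records plays the role of the overnight experiment log: every round is recorded, but only the rounds that improved on the best result so far change the retained configuration.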

The project is intentionally small, consisting mainly of a few files: prepare.py handles data preparation and utility functions, train.py is the training code that the Agent actually modifies, and program.md serves as the research instructions for the Agent.

Humans mostly write “research plans” and constraints, rather than manually tweaking model architectures, hyperparameters, and training logic bit by bit as before.

It demonstrates a new way of doing AI research: not just having AI write code for you, but involving AI in the full experimental loop.

It proposes changes, executes experiments, evaluates results, and iterates continuously. It’s a bit like putting a junior researcher in a controlled environment and letting them learn by trial and error.

Currently, autoresearch is still an experimental project. By default, it’s better suited for a single NVIDIA GPU, especially high-performance cards like the H100. For Mac, Windows, or AMD platforms, forks already exist.

Similar Articles

@WWTLitee: Is there a way for AI to autonomously iterate and optimize? Yes, check out autoresearch. Its core isn't to have AI directly 'invent papers,' but to break the research process into a verifiable loop: humans write program.md to give research direction, AI agent modifies http://tra…

X AI KOLs Timeline

Introduces the autoresearch project, which breaks down the AI research process into a verifiable loop (fixed environment, single editable file, fixed metric, Git rollback), enabling AI agents to perform controllable and reproducible experiment iterations; also mentions the 12-factor-agents checklist.

@yaohui12138: Karpathy released a GitHub open-source project that truly amazed me. The project is called andrej-karpathy-skills, with 130k+ stars on GitHub. I'd call it the most useful AI engineering project of 2026. The problem it solves is extremely precise: making Cl…

X AI KOLs Timeline

Karpathy released an open-source project called andrej-karpathy-skills, centered around a 4KB CLAUDE.md file containing 4 behavioral guidelines (Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution). The post claims it reduces AI coding error rates by up to 90%, improving code quality and development efficiency.