@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…
Summary
Karpathy open-sourced an experimental project, autoresearch, that lets an AI Agent automatically complete the research loop for small-scale LLM training: modify code, run experiments, evaluate results, and iterate. Humans only need to write the research plan and constraints.
Cached at: 05/21/26, 09:39 PM
Karpathy open-sourced a very interesting project called autoresearch, which hands a real but small LLM training task to an AI Agent. The Agent does its own research, modifies code, runs experiments, checks the results, and then decides whether to keep or discard each change.
The project is based on the nanochat training pipeline on a single NVIDIA GPU. Each time, the Agent modifies the training code, runs it for about 5 minutes, and checks whether the validation metrics improve.
If they improve, the change is kept; if not, it’s discarded, and the next round begins. In other words, you hand the task to the Agent before going to bed, and when you wake up, you’ll see a series of automated experiment logs and possibly an optimized model.
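The keep-or-discard loop described above can be sketched in a few lines of Python. This is only a toy illustration, not code from the autoresearch repository: the names keep_or_discard_loop, toy_propose, and toy_evaluate are hypothetical, the "training run" is replaced by a cheap function of a single learning-rate value, and it assumes the validation metric is a loss where lower is better.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ExperimentLog:
    round_index: int
    candidate: float
    metric: float
    kept: bool


def keep_or_discard_loop(
    initial: float,
    propose: Callable[[float], float],
    evaluate: Callable[[float], float],
    rounds: int,
) -> Tuple[float, float, List[ExperimentLog]]:
    """Greedy loop: keep a candidate only if its metric beats the best so far."""
    best = initial
    best_metric = evaluate(best)
    log: List[ExperimentLog] = []
    for i in range(rounds):
        candidate = propose(best)          # the agent's proposed change
        metric = evaluate(candidate)       # stands in for a short training run
        kept = metric < best_metric        # assumption: lower metric is better
        if kept:
            best, best_metric = candidate, metric
        # A discarded candidate is simply dropped; `best` stays unchanged.
        log.append(ExperimentLog(i, candidate, metric, kept))
    return best, best_metric, log


# Hypothetical stand-ins so the sketch runs without a GPU.
rng = random.Random(0)


def toy_propose(lr: float) -> float:
    """Randomly scale the current value up or down."""
    return lr * rng.choice([0.5, 0.8, 1.25, 2.0])


def toy_evaluate(lr: float) -> float:
    """Pretend validation loss with its minimum at lr = 0.03."""
    return (lr - 0.03) ** 2


initial_lr = 0.3
best, best_metric, log = keep_or_discard_loop(
    initial_lr, toy_propose, toy_evaluate, rounds=20
)
print(f"kept {sum(e.kept for e in log)} of {len(log)} changes, best metric {best_metric:.6f}")
```

In the real project the proposal step would be an agent editing train.py and the evaluation step a training run of about 5 minutes; the list of ExperimentLog entries plays the role of the experiment log you read in the morning.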
The project is intentionally small, consisting mainly of a few files: prepare.py handles data preparation and utility functions, train.py is the training code that the Agent actually modifies, and program.md serves as the research instructions for the Agent.
Humans mostly write “research plans” and constraints, rather than manually tweaking model architectures, hyperparameters, and training logic bit by bit as before.
It demonstrates a new way of doing AI research: not just having AI write code for you, but involving AI in the full experimental loop.
The Agent proposes changes, executes experiments, evaluates results, and iterates continuously. It’s a bit like putting a junior researcher in a controlled environment and letting them learn by trial and error.
Currently, autoresearch is still an experimental project. By default, it is best suited to a single NVIDIA GPU, especially a high-performance card like the H100. Forks already exist for Mac, Windows, and AMD platforms.
Similar Articles
@WWTLitee: Is there a way for AI to autonomously iterate and optimize? Yes, check out autoresearch. Its core isn't to have AI directly 'invent papers,' but to break the research process into a verifiable loop: humans write program.md to give research direction, AI agent modifies http://tra…
Introduces the autoresearch project, which breaks down the AI research process into a verifiable loop (fixed environment, single editable file, fixed metric, Git rollback), enabling AI agents to perform controllable and reproducible experiment iterations; also mentions the 12-factor-agents checklist.
@smallnest: I ported @karpathy's autoresearch to automated software development, and after various optimizations, the results are phenomenal.
A developer adapted Karpathy's autoresearch framework for automated software engineering, implementing multiple optimizations that yielded remarkable results.
@omarsar0: Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems.…
Karpathy's autoresearch repository has sparked a trend where agents train AI models to build state-of-the-art agentic systems, highlighting current limitations in LLM-driven hypothesis generation.
@lftherios: 1/ Autoresearch from @karpathy has been one of the most interesting agentic patterns to emerge this year. The challenge…
Andrej Karpathy's autoresearch pattern highlights how current AI agents run experiments in isolation, wasting compute by duplicating work and rediscovering dead ends.
@yaohui12138: Karpathy released a GitHub open-source project that truly amazed me. The project is called andrej-karpathy-skills, with 130k+ stars on GitHub. I'd call it the most useful AI engineering project of 2026. The problem it solves is extremely precise: making Cl…
Karpathy released an open-source project called andrej-karpathy-skills, centered around a 4KB CLAUDE.md file containing 4 behavioral guidelines (Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution). It significantly reduces AI coding error rates (up to 90%), improving code quality and development efficiency.