A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.
Summary
A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.
View Cached Full Text
Cached at: 06/23/26, 02:35 PM
Training Agents Class 1: SFT, run by an agent
We let an agent run the whole SFT workflow, live, from one prompt.
NOTE: You can (and should) watch the full live stream, where everything is covered in much more detail: https://www.youtube.com/watch?v=rNgUoH7Wbv8
We pasted a single prompt at the start of the stream and that was all we wrote, no training code by hand. From there, an AI agent did the engineering. The word agent means a few different things here. So before anything else, let’s be clear which one we mean.
Which agent is which?
There are two distinct agents in this story:
-
The builder: the agent that does the ML work. We used Codex, but any capable agent works. We gave it one prompt and it resolved the model, prepared the data, ran the training, tracked it, ran the evals, and wrote the model card.
-
The student (the one we train!): a small open Gemma model (gemma 4 2b) that learns to act like a coding agent.
The data comes from a third one: pi, a real coding agent. Its actual work sessions, the traces, are what the student imitates.
So one agent builds, one is being built, and a third gave the lessons.
Why does any of this make sense?
Why train a model to become a coding agent? Capable coding agents today run on large, expensive, often closed models. Teach a small open model to act like one and you get something cheap, private, and yours. And because you train it on your own traces, you can specialize it for a specific use case: your codebase, your tools, your workflow. It is also where you start. You cannot improve an agent that cannot yet act like one, so you teach it the format first.
Why Gemma 4, and why 2B? Gemma 4 is a recent, open, well-supported instruct model. The 2B size is small enough to train fast, run on a modest GPU, and iterate on live. We want you to see the mechanics and the workflow, not to top a benchmark. A small model also keeps the limits visible.
What should you expect as output? A LoRA adapter and a final model repo that imitate the agent’s format: its tool calls and its multi-turn loop. It learns the shape and language of the agent, not strong problem-solving. A 2B after one SFT pass is not going to be a great coder. You get a model that acts like the agent, and a pipeline that is reproducible and auditable.
Where can you find everything?
Every artifact is open (of course!):
-
Full stream: https://www.youtube.com/watch?v=rNgUoH7Wbv8
-
Slides: https://docs.google.com/presentation/d/1hcGZ4U9TjZZzcGNbH2K6wYD45qwZTyo_gosCQsnHlnc
-
SFT from scratch, by Ben: https://x.com/ben_burtenshaw/status/2067615361428545566
-
Full agent session trace: https://huggingface.co/buckets/burtenshaw/sft-on-traces/tree/example.jsonl
-
Trackio dashboard: https://huggingface.co/spaces/burtenshaw/youtube-livestream-1-trackio
-
Final model: https://huggingface.co/burtenshaw/gemma-4-E2B-it-pi-mono-lora-youtube-livestream-1
-
Winner adapter: https://huggingface.co/burtenshaw/gemma-4-E2B-it-pi-mono-lora-youtube-livestream-1-lr2e4-r16-len4k
-
Dataset (pi-mono traces): https://huggingface.co/datasets/badlogicgames/pi-mono
-
Code and context repo: https://github.com/burtenshaw/training-agents
Tools, mostly Hugging Face 🤗: TRL, Hugging Face Jobs, Trackio, and the Hub, plus Inspect AI and vLLM for the evals.
The agent did the build, but the judgment stayed ours: the goal, the constraints, the selection rule we fixed before any scores, and checking every artifact is real.
This was the recap. The class itself is the stream, that is where it happens. The series goes deeper from here.
Similar Articles
@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…
A live demonstration of an AI agent training a coding agent from a single prompt, with all artifacts recapped.
@RoundtableSpace: GITHUB JUST OPEN SOURCED A SYSTEM THAT FORCES AI AGENTS TO WRITE FULL SPECS BEFORE CODING 95K STARS IN DAYS
GitHub open-sourced a system that forces AI agents to write full specifications before coding, quickly garnering 95,000 stars.
@Av1dlive: 2 OpenAI engineers just gave a masterclass on how to build and ship apps using Codex they spent 16 mins on how Codex tu…
OpenAI engineers showcase Codex as an agent harness for software engineering, capable of reviewing code, splitting work across sub-agents, and running workflows autonomously, effectively turning one person into a full engineering team.
@sharbel: Someone built a free collection of production-grade engineering skills that teaches your AI coding agent to work exactl…
Agent Skills is a free, open-source collection of production-grade engineering skills that teaches AI coding agents to follow senior engineer workflows, including spec-first, atomic builds, and quality gates, compatible with Claude Code, Codex, Cursor, and Gemini CLI.
@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363
This post demonstrates how to fine-tune a model for free using a single prompt, leveraging the new Google Colab CLI along with Hugging Face's TRL and trackio tools, all orchestrated by an AI agent.