A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.

X AI KOLs Tools

training-agents sft supervised-fine-tuning coding-agent open-source gemma live-stream

Summary

A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.

https://t.co/TqIHNfRkfC

Original Article

View Cached Full Text

Cached at: 06/23/26, 02:35 PM

Training Agents Class 1: SFT, run by an agent

We let an agent run the whole SFT workflow, live, from one prompt.

NOTE: You can (and should) watch the full live stream, where everything is covered in much more detail: https://www.youtube.com/watch?v=rNgUoH7Wbv8

We pasted a single prompt at the start of the stream and that was all we wrote, no training code by hand. From there, an AI agent did the engineering. The word agent means a few different things here. So before anything else, let’s be clear which one we mean.

Which agent is which?

There are two distinct agents in this story:

The builder: the agent that does the ML work. We used Codex, but any capable agent works. We gave it one prompt and it resolved the model, prepared the data, ran the training, tracked it, ran the evals, and wrote the model card.
The student (the one we train!): a small open Gemma model (gemma 4 2b) that learns to act like a coding agent.

The data comes from a third one: pi, a real coding agent. Its actual work sessions, the traces, are what the student imitates.

So one agent builds, one is being built, and a third gave the lessons.

Why does any of this make sense?

Why train a model to become a coding agent? Capable coding agents today run on large, expensive, often closed models. Teach a small open model to act like one and you get something cheap, private, and yours. And because you train it on your own traces, you can specialize it for a specific use case: your codebase, your tools, your workflow. It is also where you start. You cannot improve an agent that cannot yet act like one, so you teach it the format first.

Why Gemma 4, and why 2B? Gemma 4 is a recent, open, well-supported instruct model. The 2B size is small enough to train fast, run on a modest GPU, and iterate on live. We want you to see the mechanics and the workflow, not to top a benchmark. A small model also keeps the limits visible.

What should you expect as output? A LoRA adapter and a final model repo that imitate the agent’s format: its tool calls and its multi-turn loop. It learns the shape and language of the agent, not strong problem-solving. A 2B after one SFT pass is not going to be a great coder. You get a model that acts like the agent, and a pipeline that is reproducible and auditable.

Where can you find everything?

Every artifact is open (of course!):

Full stream: https://www.youtube.com/watch?v=rNgUoH7Wbv8
Slides: https://docs.google.com/presentation/d/1hcGZ4U9TjZZzcGNbH2K6wYD45qwZTyo_gosCQsnHlnc
SFT from scratch, by Ben: https://x.com/ben_burtenshaw/status/2067615361428545566
Full agent session trace: https://huggingface.co/buckets/burtenshaw/sft-on-traces/tree/example.jsonl
Trackio dashboard: https://huggingface.co/spaces/burtenshaw/youtube-livestream-1-trackio
Final model: https://huggingface.co/burtenshaw/gemma-4-E2B-it-pi-mono-lora-youtube-livestream-1
Winner adapter: https://huggingface.co/burtenshaw/gemma-4-E2B-it-pi-mono-lora-youtube-livestream-1-lr2e4-r16-len4k
Dataset (pi-mono traces): https://huggingface.co/datasets/badlogicgames/pi-mono
Code and context repo: https://github.com/burtenshaw/training-agents

Tools, mostly Hugging Face 🤗: TRL, Hugging Face Jobs, Trackio, and the Hub, plus Inspect AI and vLLM for the evals.

The agent did the build, but the judgment stayed ours: the goal, the constraints, the selection rule we fixed before any scores, and checking every artifact is real.

This was the recap. The class itself is the stream, that is where it happens. The series goes deeper from here.

A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.

Training Agents Class 1: SFT, run by an agent

Which agent is which?

Why does any of this make sense?

Where can you find everything?

Similar Articles

@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…

@RoundtableSpace: GITHUB JUST OPEN SOURCED A SYSTEM THAT FORCES AI AGENTS TO WRITE FULL SPECS BEFORE CODING 95K STARS IN DAYS

@Av1dlive: 2 OpenAI engineers just gave a masterclass on how to build and ship apps using Codex they spent 16 mins on how Codex tu…

@sharbel: Someone built a free collection of production-grade engineering skills that teaches your AI coding agent to work exactl…

@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363

Submit Feedback

Similar Articles

@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…

@RoundtableSpace: GITHUB JUST OPEN SOURCED A SYSTEM THAT FORCES AI AGENTS TO WRITE FULL SPECS BEFORE CODING 95K STARS IN DAYS
GitHub open-sourced a system that forces AI agents to write full specifications before coding, quickly garnering 95,000 stars.

@Av1dlive: 2 OpenAI engineers just gave a masterclass on how to build and ship apps using Codex they spent 16 mins on how Codex tu…

@sharbel: Someone built a free collection of production-grade engineering skills that teaches your AI coding agent to work exactl…

@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363