@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363
Summary
This post demonstrates how to fine-tune a model for free using a single prompt, leveraging the new Google Colab CLI along with Hugging Face's TRL and trackio tools, all orchestrated by an AI agent.
View Cached Full Text
Cached at: 06/15/26, 01:03 PM
How to fine-tune a model for free from one prompt, with TRL and the Google Colab CLI
I opened a coding agent, wrote one prompt, and walked away. A couple of minutes later I had a fine-tuned model, trained on a free cloud GPU, with its metrics on a live trackio dashboard and its weights waiting for me on the Hub. I didn’t touch a GPU, and I didn’t write a line of the training code.
Last week, Google released the Colab CLI: full Colab runtimes you can drive from your terminal. It is a much-needed piece for the era of agents, and since I’ve always been a fan of Colab (it has helped me and so many others throughout my career), I had to test it. I first saw it through @osanseviero and @_philschmid.
The idea is simple: you tell your agent “fine-tune a model on this dataset” and it handles the rest, fully automatic. Google Colab provides the GPU, and the rest is the Hugging Face stack: transformers and datasets load the model and data, TRL fine-tunes it, trackio tracks the run, and the Hub hosts the dataset and the result. The agent just wires it together.
Here is the whole run, start to finish:
The prompt
This is all I typed, and everything after it was the agent. To run it yourself, do the quick one-time setup in Try it yourself (below) first, then paste this in:
You’re in the TRL repo. Read the SFT examples in examples/scripts/ to learn the project’s conventions, then adapt them into a small, self-contained training script for this task: fine-tune Qwen/Qwen2.5-0.5B-Instruct with QLoRA on philschmid/gretel-synthetic-text-to-sql (format schema + question -> SQL as chat messages). Run it on a remote Colab T4 via the Google Colab CLI: provision the GPU, install deps, log in to Hugging Face on the runtime, run a short demo run, stream metrics to a trackio Space, push the trained adapter to the Hub, and tear the session down. Report the final loss and the model URL.
My favorite part is how little it takes to retarget. I change one part of the prompt, the model or the dataset, and the same recipe trains something completely different. I treat it as a template, not a one-off.
What just happened
I gave the agent a single prompt. From there, it did everything:
-
It read the SFT examples in the TRL repo to learn the conventions, then wrote its own training script for my task.
-
It provisioned a remote GPU through the brand-new Google Colab CLI.
-
It installed the dependencies, authenticated with Hugging Face, and launched QLoRA training with TRL.
-
It streamed live metrics to a trackio Space on the Hub.
-
It pushed the trained adapter to my Hugging Face account.
-
It tore the session down when it finished.
Nothing on my machine. No babysitting. No GPU.
The part I keep coming back to: I never handed it a script, and I never explained the Colab CLI. It learned the training conventions from TRL’s examples and the commands from the CLI’s built-in agent skill, then wrote and ran everything itself.
It cost me nothing
The whole run was on a free Colab T4. Qwen2.5-0.5B-Instruct is tiny, so a short QLoRA run finishes in a couple of minutes. And the setup is almost nothing, because Colab already ships PyTorch, transformers, datasets and the rest preinstalled. The agent only had to add the few missing pieces (TRL, trackio, and the 4-bit quantization library). What it wrote was a standard SFTTrainer setup with LoRA, nothing exotic.
I watched it train live
Because the agent wired up trackio, my run streamed live to a Hugging Face Space. I could open it in any browser and watch the loss curve update in real time while the GPU did the work somewhere else. When training finished, the Space stayed up as a record of the run.
The loss dropped steadily over the run. You can see the curve in the live Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
And I actually kept the model: the agent pushed the trained adapter to my Hub account, so it is sitting there ready to use: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
It debugged itself
Partway through, the run hit a hardware quirk on the free T4 (the GPU does not support a precision mode the script first tried). It read the error, fixed the setting, and re-ran on its own, with no input from me. I was expecting it to handle edges like this, and it did. The model still came out the other end.
Try it yourself
A one-time setup, then you can run the prompt above:
-
Install the Colab CLI: uv tool install google-colab-cli (its agent skill ships with it).
-
Run any colab command once to authorize Colab (it opens a Google sign-in in your browser).
-
Log in to Hugging Face with a write token: hf auth login (so the run can stream to a trackio Space and push the model).
-
Open your coding agent in a checkout of the TRL repo, paste the prompt, and watch it go (swap in any model or dataset you like).
Resources
-
Google Colab CLI: https://github.com/googlecolab/google-colab-cli
-
TRL: https://github.com/huggingface/trl
-
trackio: https://github.com/gradio-app/trackio
-
The fine-tuned model: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
-
The live metrics Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
-
Related: fine-tuning with agents on Hugging Face Jobs: https://huggingface.co/blog/hf-skills-training and https://huggingface.co/blog/hf-skills-training-codex
Similar Articles
@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…
A live demonstration of an AI agent training a coding agent from a single prompt, with all artifacts recapped.
@victormustar: https://x.com/victormustar/status/2059264598407033062
This post describes how to use Hugging Face's ZeroGPU and a coding agent to autonomously deploy AI models, specifically the LongCat talking-avatar model, on a budget.
@AnandButani: ml-intern by @huggingface is wild You drop a high-level prompt (“build the best scientific reasoning model” or “crush h…
Hugging Face’s open-source "ml-intern" agent automates the full post-training pipeline—from literature review and data cleaning to model tuning—given only a high-level prompt.
@heyshrutimishra: This 1 hour tutorial from Stanford University will teach you AI agents, Prompts & RAG for FREE
Stanford University offers a free 1-hour tutorial covering AI agents, prompts, and RAG.
A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.
A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.