@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363

X AI KOLs Timeline 06/15/26, 12:28 PM Tools

fine-tuning colab-cli trl qlora huggingface agent-workflow free-gpu

Summary

This post demonstrates how to fine-tune a model for free using a single prompt, leveraging the new Google Colab CLI along with Hugging Face's TRL and trackio tools, all orchestrated by an AI agent.

https://t.co/A5t1nC5Fkw

Original Article

View Cached Full Text

Cached at: 06/15/26, 01:03 PM

How to fine-tune a model for free from one prompt, with TRL and the Google Colab CLI

I opened a coding agent, wrote one prompt, and walked away. A couple of minutes later I had a fine-tuned model, trained on a free cloud GPU, with its metrics on a live trackio dashboard and its weights waiting for me on the Hub. I didn’t touch a GPU, and I didn’t write a line of the training code.

Last week, Google released the Colab CLI: full Colab runtimes you can drive from your terminal. It is a much-needed piece for the era of agents, and since I’ve always been a fan of Colab (it has helped me and so many others throughout my career), I had to test it. I first saw it through @osanseviero and @_philschmid.

The idea is simple: you tell your agent “fine-tune a model on this dataset” and it handles the rest, fully automatic. Google Colab provides the GPU, and the rest is the Hugging Face stack: transformers and datasets load the model and data, TRL fine-tunes it, trackio tracks the run, and the Hub hosts the dataset and the result. The agent just wires it together.

Here is the whole run, start to finish:

The prompt

This is all I typed, and everything after it was the agent. To run it yourself, do the quick one-time setup in Try it yourself (below) first, then paste this in:

You’re in the TRL repo. Read the SFT examples in examples/scripts/ to learn the project’s conventions, then adapt them into a small, self-contained training script for this task: fine-tune Qwen/Qwen2.5-0.5B-Instruct with QLoRA on philschmid/gretel-synthetic-text-to-sql (format schema + question -> SQL as chat messages). Run it on a remote Colab T4 via the Google Colab CLI: provision the GPU, install deps, log in to Hugging Face on the runtime, run a short demo run, stream metrics to a trackio Space, push the trained adapter to the Hub, and tear the session down. Report the final loss and the model URL.

My favorite part is how little it takes to retarget. I change one part of the prompt, the model or the dataset, and the same recipe trains something completely different. I treat it as a template, not a one-off.

What just happened

I gave the agent a single prompt. From there, it did everything:

It read the SFT examples in the TRL repo to learn the conventions, then wrote its own training script for my task.
It provisioned a remote GPU through the brand-new Google Colab CLI.
It installed the dependencies, authenticated with Hugging Face, and launched QLoRA training with TRL.
It streamed live metrics to a trackio Space on the Hub.
It pushed the trained adapter to my Hugging Face account.
It tore the session down when it finished.

Nothing on my machine. No babysitting. No GPU.

The part I keep coming back to: I never handed it a script, and I never explained the Colab CLI. It learned the training conventions from TRL’s examples and the commands from the CLI’s built-in agent skill, then wrote and ran everything itself.

It cost me nothing

The whole run was on a free Colab T4. Qwen2.5-0.5B-Instruct is tiny, so a short QLoRA run finishes in a couple of minutes. And the setup is almost nothing, because Colab already ships PyTorch, transformers, datasets and the rest preinstalled. The agent only had to add the few missing pieces (TRL, trackio, and the 4-bit quantization library). What it wrote was a standard SFTTrainer setup with LoRA, nothing exotic.

I watched it train live

Because the agent wired up trackio, my run streamed live to a Hugging Face Space. I could open it in any browser and watch the loss curve update in real time while the GPU did the work somewhere else. When training finished, the Space stayed up as a record of the run.

The loss dropped steadily over the run. You can see the curve in the live Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio

And I actually kept the model: the agent pushed the trained adapter to my Hub account, so it is sitting there ready to use: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora

It debugged itself

Partway through, the run hit a hardware quirk on the free T4 (the GPU does not support a precision mode the script first tried). It read the error, fixed the setting, and re-ran on its own, with no input from me. I was expecting it to handle edges like this, and it did. The model still came out the other end.

Try it yourself

A one-time setup, then you can run the prompt above:

Install the Colab CLI: uv tool install google-colab-cli (its agent skill ships with it).
Run any colab command once to authorize Colab (it opens a Google sign-in in your browser).
Log in to Hugging Face with a write token: hf auth login (so the run can stream to a trackio Space and push the model).
Open your coding agent in a checkout of the TRL repo, paste the prompt, and watch it go (swap in any model or dataset you like).

Resources

Google Colab CLI: https://github.com/googlecolab/google-colab-cli
TRL: https://github.com/huggingface/trl
trackio: https://github.com/gradio-app/trackio
The fine-tuned model: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
The live metrics Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
Related: fine-tuning with agents on Hugging Face Jobs: https://huggingface.co/blog/hf-skills-training and https://huggingface.co/blog/hf-skills-training-codex

@SergioPaniego: https://x.com/SergioPaniego/status/2066498136273531363

How to fine-tune a model for free from one prompt, with TRL and the Google Colab CLI

The prompt

What just happened

It cost me nothing

I watched it train live

It debugged itself

Try it yourself

Resources

Similar Articles

@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…

@victormustar: https://x.com/victormustar/status/2059264598407033062

@AnandButani: ml-intern by @huggingface is wild You drop a high-level prompt (“build the best scientific reasoning model” or “crush h…

@heyshrutimishra: This 1 hour tutorial from Stanford University will teach you AI agents, Prompts & RAG for FREE

A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.

Submit Feedback

Similar Articles

@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…

@victormustar: https://x.com/victormustar/status/2059264598407033062

@AnandButani: ml-intern by @huggingface is wild You drop a high-level prompt (“build the best scientific reasoning model” or “crush h…

@heyshrutimishra: This 1 hour tutorial from Stanford University will teach you AI agents, Prompts & RAG for FREE

A recap of a live stream where an AI agent (Codex) autonomously runs the entire SFT workflow to train a small Gemma 2B model to imitate a coding agent (pi). All artifacts and code are open-sourced.