@cjzafir: I pay Google $13.99 CAD to train a 9B LLM model on A100 80GB GPU. It takes: > 10 minutes to step notebook > 7 hours to …
Summary
A user shares a workflow for training a 9B LLM on an A100 GPU using Google Colab for $13.99 CAD, noting the overnight process and the ease of training small language models.
Cached at: 05/22/26, 07:58 PM
I pay Google $13.99 CAD to train a 9B LLM on an A100 80GB GPU.
It takes:
> 10 minutes to set up the notebook
> 7 hours to train the model
> 1.5 hours for eval testing
> 1.25 hours for validation testing
> 30 minutes for GGUF/MLX conversion
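For a sense of scale, the stage times above can be added up. This is only a rough sketch: the durations are copied from the post, the first stage is read as notebook setup, and the dictionary keys are illustrative labels, not names from any tool.

```python
# Stage durations in minutes, copied from the post.
# Assumption: the first stage is read as notebook setup.
stages = {
    "notebook setup": 10,
    "training": 7 * 60,
    "eval testing": 90,         # 1.5 hours
    "validation testing": 75,   # 1.25 hours
    "GGUF/MLX conversion": 30,
}

total_minutes = sum(stages.values())
hours, minutes = divmod(total_minutes, 60)
print(f"Total: {hours} h {minutes} min")

# Share of the run spent in each stage.
for name, mins in stages.items():
    print(f"{name}: {mins / total_minutes:.0%}")
```

That comes to 10 hours 25 minutes end to end, with training taking about two thirds of it, which is consistent with an overnight run.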
Overnight, I run Codex (with the computer-use Chrome extension) on a new notebook.
In the morning I get a new custom trained model.
It’s not hard to train SLMs anymore. Wake up.
Similar Articles
@akshay_pachaar: Google just dropped a new LLM! You can run it locally on just 8GB RAM. Let's fine-tune this on our own data (100% local…
Google dropped a new LLM that can run locally on just 8GB RAM. The tweet demonstrates fine-tuning it on personal data entirely locally.
@LottoLabs: A very cool model for the GPU poor bros Trained on an ungodly amount of tokens for a 8b a1b model Gonna be super fast e…
LottoLabs announces LiquidAI's LFM2.5-8B-A1B-GGUF model, an 8B parameter model trained on a massive token count and optimized for fast inference on limited GPU hardware, with support for llama.cpp, Ollama, vLLM, and more.
@heygurisingh: 𝑩𝒊𝒍𝒍𝒊𝒐𝒏-𝒑𝒂𝒓𝒂𝒎𝒆𝒕𝒆𝒓 𝑳𝑳𝑴𝒔 𝒖𝒔𝒆𝒅 𝒕𝒐 𝒄𝒐𝒔𝒕 $10𝑴+ 𝒕𝒐 𝒕𝒓𝒂𝒊𝒏. Someone open sourced a repo t…
An open-source repository called train-llm-from-scratch enables training billion-parameter LLMs on a single GPU, with a configurable pipeline from raw text to inference, including dataset streaming and checkpointing, under MIT License.
Me train LLM on 8GB from Scratch. Me happy
Built a repository to train a tiny language model (25M parameters) from scratch on 8GB VRAM, with support for MTP but noting limitations of mHC and BitNet.
@tom_doerr: Runs 70B LLMs on single 4GB GPU https://github.com/lyogavin/airllm
AirLLM is an open-source tool that optimizes inference memory usage, enabling 70B LLMs to run on a single 4GB GPU without quantization, and supports 405B models on 8GB VRAM.