Tag
The author documents their journey of building a 340M parameter LLM from scratch, trained exclusively on pre-1900 texts, including custom datasets, training scripts, and open-sourcing the model and code.
Clement Delangue asks whether someone should train an open-source model for building AI, noting that datasets and tools such as HF, MLintern, transformers, and trl are already available.
Natasha Jaques praises the Microsoft MAI-Thinking-1 paper for fully disclosing the training recipe for a frontier model, highlighting the token distribution across pre-training, mid-training, and RL post-training phases, and noting that Yann LeCun's cake analogy was prescient.
This paper studies context design for self-distillation in language models, finding that step-aligned critique feedback significantly outperforms binary reward or reference solution conditioning, because it targets only erroneous tokens while preserving correct behavior.
OpenEnv, a training environment, is being opened to the community with support from HuggingFace, Nvidia, Meta, and other leading companies.
OpenEnv, a framework for creating and deploying isolated execution environments for agentic RL training, has moved to Hugging Face and is now governed by a committee including Meta-PyTorch, NVIDIA, and others.
A developer built a 12M parameter LLM using a custom ML framework with a Rust backend and CUDA kernels, including Flash Attention and AdamW, and trained it from scratch.
Marin is an open-source framework from Stanford for reproducible foundation model research, covering data curation, tokenization, training, and evaluation; it was used to train an 8B parameter model that outperforms Llama 3.1 8B.
This tweet thread introduces research showing that training models to verify their own work can nearly double accuracy on hard math problems and improve scientific reasoning by 14x.
A blog post lists the 10 best large language model (LLM) courses and training resources, including courses from Coursera, DataCamp, Udacity, and universities such as Vanderbilt.
This paper introduces state commitment learning, a training objective that teaches language models to distinguish temporary computation tokens from persistent state tokens. The authors propose Counterfactual Erasure RL (CERL) and the Erasure Dependence Protocol, showing improvements across math, logic, science QA, and tool-use tasks without sacrificing accuracy.
CollabBench is a new benchmark for evaluating and training LLM agents in cooperative games, featuring diverse player simulation and a collaborative training paradigm. Experiments show 19.5% higher efficiency and 24.4% improved affective performance over base models.
TRL now supports fine-tuning models on agent traces from various sources like Claude Code, Codex, OpenClaw, and Pi, moving towards a standardized stack for training agentic models.
CMU Software Engineering Institute publishes an overview of ML training infrastructure, covering hardware considerations like GPU vs CPU and memory requirements.
Anthropic is hiring 1000 freelance software engineers to train Claude Code, with each task paying $280. The engineers will write prompts, compare code outputs, test model responses, and teach Claude how real developers work.
This article examines the technical details of how Cursor trained its Composer 2 model, including the asynchronous and sparse methods used, and provides a comprehensive analysis of the RL infrastructure.
A hands-on PyTorch curriculum that teaches LLM training from transformer basics through fine-tuning and alignment, including RLHF and GRPO.
A user expresses excitement about working on reinforcement learning at Modal, referencing Modal's announcement of an open-source library and lessons learned for scaling RL training.
In May 2026, a tweet by CJ Zafir teaching ordinary people to fine-tune open-source models gained widespread attention, reflecting the view that training small models is the most underrated AI skill of 2026.
Introduces Eggroll, a low-rank evolution strategy for gradient-free training of spiking neural networks, reducing memory and time overhead while achieving competitive accuracy on N-MNIST.