training

#training

Making a vintage LLM from scratch

Hacker News Top ↗ · 2026-06-11 Cached

The author documents building a 340M-parameter LLM from scratch on exclusively pre-1900 texts, including the custom datasets and training scripts, and open-sources the model and code.

@ClementDelangue: Should we try to train an open source AI building model? We obviously have interesting datasets with HF, MLintern, tran…

X AI KOLs Following ↗ · 2026-06-10

Clement Delangue asks whether the community should train an open-source AI-building model, noting that datasets and tools such as HF, MLintern, transformers, and trl are already available.

@natashajaques: Really enjoyed reading the Microsoft MAI-Thinking-1 "Building a Hill Climbing Machine" paper. Amazing they publicly rel…

X AI KOLs Following ↗ · 2026-06-10 Cached

Natasha Jaques praises the Microsoft MAI-Thinking-1 paper for fully disclosing the training recipe for a frontier model, highlighting the token distribution across pre-training, mid-training, and RL post-training phases, and noting that Yann LeCun's cake analogy was prescient.

The Role of Feedback Alignment in Self-Distillation

Hugging Face Daily Papers ↗ · 2026-06-09 Cached

This paper studies context design for self-distillation in language models, finding that step-aligned critique feedback significantly outperforms binary reward or reference solution conditioning, because it targets only erroneous tokens while preserving correct behavior.

@qjoyliu: The future of training is open source. Super excited to announce that we've joined forces with HuggingFace, Nvidia, Met…

X AI KOLs Following ↗ · 2026-06-08 Cached

OpenEnv, a training environment, is being opened to the community with support from HuggingFace, Nvidia, Meta, and other leading companies.

@SergioPaniego: OpenEnv has a new home: http://github.com/huggingface/OpenEnv… starting today, it's coordinated by a committee that inc…

X AI KOLs Following ↗ · 2026-06-08 Cached

OpenEnv, a framework for creating and deploying isolated execution environments for agentic RL training, has moved to Hugging Face and is now governed by a committee including Meta-PyTorch, NVIDIA, and others.

@charles_irl: Somehow missed this one in the hustle and bustle. Very cool demo!

X AI KOLs Following ↗ · 2026-06-07 Cached

A developer built a 12M parameter LLM using a custom ML framework with a Rust backend and CUDA kernels, including Flash Attention and AdamW, and trained it from scratch.

@eliebakouch: one of my favorite projects is Marin from the stanford folks, they have a scientific approach to training, are ready to…

X AI KOLs Following ↗ · 2026-06-07 Cached

Marin is an open-source framework from Stanford for reproducible foundation model research, covering data curation, tokenization, training, and evaluation; it was used to train an 8B parameter model that outperforms Llama 3.1 8B.

@ChenHenryWu: Self-improvement depends on whether a model can judge its own work. We usually train models to generate better - why no…

X AI KOLs Timeline ↗ · 2026-06-05 Cached

This tweet thread introduces research showing that training models to verify their own work can nearly double accuracy on hard math problems and improve scientific reasoning by 14x.

@tut_ml: Best LLM Courses- https://mltut.com/best-large-language-models-courses/…

X AI KOLs Timeline ↗ · 2026-06-05 Cached

A blog post listing the 10 best large language model (LLM) courses and training resources, including courses from Coursera, DataCamp, Udacity, and universities such as Vanderbilt.

State commitment learning: training language models to distinguish computation from memory

arXiv cs.LG ↗ · 2026-06-05 Cached

This paper introduces state commitment learning, a training objective that teaches language models to distinguish temporary computation tokens from persistent state tokens. The authors propose Counterfactual Erasure RL (CERL) and the Erasure Dependence Protocol, showing improvements across math, logic, science QA, and tool-use tasks without sacrificing accuracy.

CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

arXiv cs.CL ↗ · 2026-06-05 Cached

CollabBench is a new benchmark for evaluating and training LLM agents in cooperative games, featuring diverse player simulation and a collaborative training paradigm. Experiments show 19.5% higher efficiency and 24.4% improved affective performance over base models.

@adithya_s_k: You can now finetune models on agent traces directly with TRL Claude Code traces Codex traces OpenClaw traces Pi traces…

X AI KOLs Following ↗ · 2026-06-04 Cached

TRL now supports fine-tuning models on agent traces from various sources like Claude Code, Codex, OpenClaw, and Pi, moving towards a standardized stack for training agentic models.

@loganthorneloe: Read this to get started learning ML infra. This is an excellent high-level overview of important considerations in ML …

X AI KOLs Timeline ↗ · 2026-06-03 Cached

The CMU Software Engineering Institute has published an overview of ML training infrastructure, covering hardware considerations such as GPU versus CPU trade-offs and memory requirements.

@FinanceYF5: Anthropic is hiring 1000 freelance software engineers to train Claude Code. Each task pays $280. They write prompts, compare code outputs, test the model's follow-up responses, and teach Claude how real developers work. It's like handing...

X AI KOLs Following ↗ · 2026-06-03 Cached

Anthropic is hiring 1000 freelance software engineers to train Claude Code, with each task paying $280. The engineers will write prompts, compare code outputs, test model responses, and teach Claude how real developers work.

@FeitengLi: Asynchronous, Sparse, and the Fifth Decimal Place: Engineering Details of Cursor Training Composer 2 https://lattifai.com/zh/podcasts/SequoiaCapital/UDTr9yUnLUI…

X AI KOLs Timeline ↗ · 2026-06-03 Cached

This article covers the engineering details of how Cursor trained the Composer 2 model, including its asynchronous and sparse methods, and analyzes the RL infrastructure behind it.

@DanKornas: Stop learning LLMs from disconnected tutorials. LLM from Scratch is a hands-on PyTorch curriculum for builders who want…

X AI KOLs Timeline ↗ · 2026-06-02 Cached

A hands-on PyTorch curriculum that teaches LLM training from transformer basics through fine-tuning and alignment, including RLHF and GRPO.

@_djdumpling: very exciting work and thrilled to be working on RL this summer at @modal!

X AI KOLs Timeline ↗ · 2026-06-01 Cached

A user expresses excitement about working on reinforcement learning at Modal, referencing Modal's announcement of an open-source library and lessons learned for scaling RL training.

@yibie: Training Small Models: The Most Underrated AI Skill in 2026 On May 11, 2026, a person named CJ Zafir posted a tweet. He wanted to teach ordinary people to fine-tune open source models. 2,538 likes, 316 retweets, 178,000 views. This tweet blew up…

X AI KOLs Timeline ↗ · 2026-06-01 Cached

In May 2026, a tweet by CJ Zafir offering to teach ordinary people to fine-tune open source models drew wide attention. The post uses it to argue that training small models is the most underrated AI skill of 2026.

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

arXiv cs.AI ↗ · 2026-06-01 Cached

Introduces Eggroll, a low-rank evolution strategy for gradient-free training of spiking neural networks, reducing memory and time overhead while achieving competitive accuracy on N-MNIST.
