MegaTrain enables full-precision training of 100B+ LLMs on a single GPU by treating VRAM as a transient stateless cache, inverting the memory hierarchy.
Progress update on DSpark: training of the DFlash backbone and Markov head is complete, enabling use on 27B. Next is training the confidence head for adaptive drafting, which is expected to give an 8-14% speed improvement over DFlash.
The paper identifies position bias in on-policy distillation for language models, where later tokens in student-generated answers receive degraded supervision. The proposed Importance-Weighted On-Policy Distillation (IW-OPD) weights corrections based on accumulated drift, improving learning speed and final performance.
Explains three main approaches to FP8 scaling in LLM pretraining (per-tensor, blockwise, and MXFP8), focusing on how the scale is attached, and derives tile geometries from the constraint that the scale must remain constant along the matmul's contracted dimension.
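The constraint in the item above can be illustrated with a toy dot product, which is a matmul reduced to a single output element. This is a minimal sketch, not the article's code: integer rounding stands in for a real FP8 cast, MXFP8 is not modeled, and the names amax_scale, quantize, dot_per_tensor, and dot_blockwise are hypothetical. Because each scale is constant over the span being summed, it can be factored out and applied once per span.

```python
QMAX = 448.0  # largest finite magnitude in FP8 E4M3


def amax_scale(values):
    """Scale that maps the largest magnitude in `values` onto QMAX."""
    amax = max(abs(v) for v in values)
    return amax / QMAX if amax > 0 else 1.0


def quantize(values, scale):
    """Assumption: rounding to an integer grid stands in for an FP8 cast."""
    return [round(v / scale) for v in values]


def dot_per_tensor(a, b):
    """One scale per operand: both factor out of the whole sum over k."""
    sa, sb = amax_scale(a), amax_scale(b)
    qa, qb = quantize(a, sa), quantize(b, sb)
    acc = sum(x * y for x, y in zip(qa, qb))  # accumulate raw quantized values
    return sa * sb * acc                      # rescale once at the end


def dot_blockwise(a, b, block):
    """One scale per block along k: rescale each partial sum, then add."""
    total = 0.0
    for start in range(0, len(a), block):
        a_blk = a[start:start + block]
        b_blk = b[start:start + block]
        sa, sb = amax_scale(a_blk), amax_scale(b_blk)
        qa, qb = quantize(a_blk, sa), quantize(b_blk, sb)
        acc = sum(x * y for x, y in zip(qa, qb))  # scale is constant inside the block
        total += sa * sb * acc                    # applied once per block
    return total


# The outlier 100.0 sets the per-tensor scale for all of `a`.
a = [0.5, -1.25, 2.0, 0.75, 100.0, -3.0, 1.5, 0.25]
b = [1.0, 0.5, -2.0, 4.0, 0.01, 0.02, -0.03, 0.04]
exact = sum(x * y for x, y in zip(a, b))

print("exact     ", exact)
print("per-tensor", dot_per_tensor(a, b))
print("blockwise ", dot_blockwise(a, b, block=4))
```

With a block size equal to the full length, dot_blockwise reduces to dot_per_tensor. Smaller blocks keep an outlier from setting the scale for unrelated elements, at the cost of one rescale per block.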
South Korea announces plans to train all 500,000 military personnel to operate drones as a 'universal combat tool,' inspired by drone warfare in Ukraine and the Middle East.
A debugger that detects reward hacking in reinforcement learning reward functions during training, aiding developers in identifying and fixing issues.
Almond Robotics shares one of approximately 1000 training episodes for their robotic system.
Lilian Weng's blog post provides a comprehensive overview of scaling laws in deep learning, covering their derivation, compute-optimal allocation, and the debate between Kaplan et al. and Chinchilla.
You can now train Liquid AI's LFM2-VL model using TRL's GRPO and RLOO methods, with an example script provided.
Nathan Lambert shares a video lecture covering prerequisites for his book, including language model basics, probabilities, and training pipelines, using GLM 5.2.
This paper from OpenAI investigates whether reinforcement learning on beneficial behavior can produce broad and persistent alignment generalization beyond the training distribution. Using a dataset of realistic situations, the authors show that RL training on beneficial traits improves out-of-distribution alignment and persistence against adversarial attacks.
A live demonstration of an AI agent training a coding agent from a single prompt, with all artifacts recapped.
Muennighoff announces work on a much better composer model, scaling to Opus/GPT-size, training from scratch, and going beyond coding, as part of Cursor's collaboration with SpaceX.
Running GLM 5.2 with 7 trillion tokens on a budget setup using 4x RTX 3090 GPUs and 192GB RAM.
Tmax introduces a simplified RL training recipe for terminal agents, achieving state-of-the-art performance with a 9B parameter model using a novel data generation taxonomy and an expanded open-source dataset.
WeightsLab is an open-source, PyTorch-native tool that allows teams to pause training, inspect live loss signals, and catch data issues like mislabels and class imbalance before they affect model performance. It is designed for computer vision engineers working with images, videos, and LiDAR point clouds.
A speculative discussion questioning why LLMs are not trained to think in an optimized internal language rather than natural language, and whether that could improve efficiency.
Speculation holds that Anthropic's new model Mythos, which completed training in February this year, has quietly changed the pace of R&D and driven a significant leap in AI capabilities over the past 5 months. Leading models are helping to train the next generation of models.
The Agent Reinforcement Trainer (ART) is an open-source framework that plugs GRPO-based RL into any Python app, enabling agents to learn from environment interaction via trajectory scoring and LoRA updates, with claims of outperforming OpenAI's o3 on email retrieval using a Qwen 2.5 14B model.
MSI's RTX 5090 GPU operates at 475-500W for inference or training, with a warning about cable bending.