Tag
This article explains the 'Token-In, Token-Out' (TITO) invariant in reinforcement learning for LLMs, highlighting a common error when training multi-turn agents with tool calls. It presents two solutions: using per-model renderers or designing training to avoid re-encoding decoded tokens, emphasizing prefix-preserving chat templates.
Nan Jiang of Modal announces their work on open-source RL frameworks to support frontier open-weights models, highlighting delta compression and remaining challenges in weight sync and cross-cluster training.
A developer reflects on the ease of using Nvidia for local AI tasks versus the satisfaction of getting things to work on Apple Silicon, promoting a 'hungry and foolish' mindset.
This article demonstrates that using stochastic rounding for BF16 optimizer state can match FP32 performance because unbiased errors cancel over time, whereas round-to-nearest stalls due to compounding bias. An experiment with an MLP shows BF16+SR achieves similar loss to FP32 while using less memory.
Built a repository to train a tiny language model (25M parameters) from scratch on 8GB VRAM, with support for MTP but noting limitations of mHC and BitNet.
Sakana AI presents DiffusionBlocks, a method that trains neural networks block-wise by interpreting forward passes as diffusion denoising, significantly reducing memory requirements compared to traditional end-to-end backpropagation.
A critique arguing that training LLMs on human-generated data limits their ability to discover novel solutions via test-time compute, and that true AGI requires models that can explore hypothesis spaces more broadly, similar to AlphaZero.
NVIDIA releases Polar, an open-source infrastructure for black-box agentic reinforcement learning, enabling training of coding agents like Claude Code or Codex with any agent harness or framework.
The article argues that while many are building and selling AI agents, the real value lies in the workflows and training that make them useful, not the underlying technology.
This paper introduces Found in Conversation (FiC), a training framework using View-Asymmetric Self-Distillation to close the multi-turn performance gap in LLMs. The method teaches models to recover single-turn competence from underspecified multi-turn prompts, achieving 92-100% recovery across model families and sizes.
Proposes a 'lift' method for training input-convex neural networks (ICNNs) that uses an unconstrained hypernetwork to emit non-negative inter-layer weights, softening the loss landscape and escaping gradient attenuation, achieving lower test loss than projected gradient descent and softplus reparametrization.
This paper systematically studies scale vectors in LLM normalization layers, showing they optimize training through a self-amplifying preconditioning effect, and proposes three lightweight improvements that enhance performance and scaling behavior with negligible overhead.
The article argues that Cerebras chips are optimized for LLM inference and training, not general AI workloads, and cautions against overhyping their ability to challenge NVIDIA across all AI domains.
Boston Dynamics plans to train its Atlas humanoid robot using football videos, documenting progress in an online series called 'School of Football'.
Users report that OpenAI's Codex GPT-5.5 high model performance has degraded, exhibiting laziness, hallucinations, and context loss. Suspecting it's due to OpenAI training GPT-5.6, they need to enable xhigh mode to restore normal performance.
Elon Musk announced that the Grok foundation model V9-Medium (1.5T parameters) has finished training with strong evaluations, and will be publicly released in 2-3 weeks after fine-tuning and reinforcement learning.
The Marin team pre-registered a predicted loss of 2.252 for a 129B parameter MoE model training run, and the actual result landed at 2.234, demonstrating accurate loss prediction before training.
Magma is an open-source repository from Microsoft Research for building multimodal AI agents that integrate vision, language, and action, providing model links, inference examples, training instructions, and demos.
A thread explaining why understanding number formats in memory is crucial for learning LLM quantization, covering gradient NaN debugging, numerical stability, and quantization distortion.
SupraLabs released Supra-50M, a compact 50M-parameter causal language model with base and instruct versions, trained on 20B tokens from fineweb-edu, achieving competitive benchmarks against larger models like GPT-2 and SmolLM.