training

#training

Agentic RL: Token-In, Token-Out Done Right (16 minute read)

TLDR AI ↗ · 2026-06-01 Cached

This article explains the 'Token-In, Token-Out' (TITO) invariant in reinforcement learning for LLMs, highlighting a common error when training multi-turn agents with tool calls. It presents two solutions: using per-model renderers or designing training to avoid re-encoding decoded tokens, emphasizing prefix-preserving chat templates.

@charles_irl: why use many bytes when few do trick?

X AI KOLs Following ↗ · 2026-05-30 Cached

Nan Jiang of Modal announces their work on open-source RL frameworks to support frontier open-weights models, highlighting delta compression and remaining challenges in weight sync and cross-cluster training.

@ivanfioravanti: One thing's for sure: on Nvidia everything's easier for local AI — inference, training, playing with existing projects.…

X AI KOLs Following ↗ · 2026-05-30 Cached

A developer reflects on the ease of using Nvidia for local AI tasks versus the satisfaction of getting things to work on Apple Silicon, promoting a 'hungry and foolish' mindset.

Bias Compounds, Variance Washes Out

Hacker News Top ↗ · 2026-05-29 Cached

This article demonstrates that using stochastic rounding for BF16 optimizer state can match FP32 performance because unbiased errors cancel over time, whereas round-to-nearest stalls due to compounding bias. An experiment with an MLP shows BF16+SR achieves similar loss to FP32 while using less memory.

Me train LLM on 8GB from Scratch. Me happy

Reddit r/LocalLLaMA ↗ · 2026-05-29

The author built a repository for training a tiny language model (25M parameters) from scratch on 8GB of VRAM. It supports MTP, and the author notes limitations with mHC and BitNet.

For over a decade, we've accepted that end-to-end backprop is the only way to train deep networks (1 minute read)

TLDR AI ↗ · 2026-05-29 Cached

Sakana AI presents DiffusionBlocks, a method that trains neural networks block-wise by interpreting forward passes as diffusion denoising, significantly reducing memory requirements compared to traditional end-to-end backpropagation.

@FrancoisChauba1: If you train on (unsorted list, bubble sort procedure, sorted list) traces, you will never test time compute (TTC) your…

X AI KOLs Following ↗ · 2026-05-26 Cached

A critique arguing that training LLMs on human-generated data limits their ability to discover novel solutions via test-time compute, and that true AGI requires models that can explore hypothesis spaces more broadly, similar to AlphaZero.

@ShaokunZhang1: Want to train your own Claude Code/Codex agent with your own model? We are excited to roll out ProRL Agent V2: Polar. A…

X AI KOLs Timeline ↗ · 2026-05-26 Cached

NVIDIA releases Polar, an open-source infrastructure for black-box agentic reinforcement learning, enabling training of coding agents like Claude Code or Codex with any agent harness or framework.

Everyone is selling AI agents, but almost nobody is selling the workflows to make them useful.

Reddit r/AI_Agents ↗ · 2026-05-26

The article argues that while many are building and selling AI agents, the real value lies in the workflows and training that make them useful, not the underlying technology.

Found in Conversation: LLMs Teach Themselves to Close the Multi-Turn Gap

arXiv cs.CL ↗ · 2026-05-26 Cached

This paper introduces Found in Conversation (FiC), a training framework using View-Asymmetric Self-Distillation to close the multi-turn performance gap in LLMs. The method teaches models to recover single-turn competence from underspecified multi-turn prompts, achieving 92-100% recovery across model families and sizes.

A lift for input-convex neural network training

arXiv cs.LG ↗ · 2026-05-26 Cached

This paper proposes a 'lift' method for training input-convex neural networks (ICNNs) in which an unconstrained hypernetwork emits the non-negative inter-layer weights. According to the summary, this softens the loss landscape, escapes gradient attenuation, and achieves lower test loss than projected gradient descent and softplus reparametrization.

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Hugging Face Daily Papers ↗ · 2026-05-26 Cached

This paper systematically studies scale vectors in LLM normalization layers, showing they optimize training through a self-amplifying preconditioning effect, and proposes three lightweight improvements that enhance performance and scaling behavior with negligible overhead.

Cerebras Chip Sets Appear to be Optimized for LLMs Use

Reddit r/ArtificialInteligence ↗ · 2026-05-25

The article argues that Cerebras chips are optimized for LLM inference and training, not general AI workloads, and cautions against overhyping their ability to challenge NVIDIA across all AI domains.

Hyundai/Boston Dynamics is going to train Atlas the humanoid robot by watching football videos, and they'll document its progress in an online series called 'School of Football'

Reddit r/singularity ↗ · 2026-05-25

Boston Dynamics plans to train its Atlas humanoid robot using football videos, documenting progress in an online series called 'School of Football'.

@fiapp_pro: Official announcement: Codex GPT-5.5 high is completely dead, probably because OpenAI is training GPT-5.6. Its performance on Codex is very lazy; it hallucinates and loses context. You must enable xhigh to restore normal performance.

X AI KOLs Timeline ↗ · 2026-05-25 Cached

Users report that OpenAI's Codex GPT-5.5 high model has degraded, exhibiting laziness, hallucinations, and context loss. They suspect the cause is OpenAI training GPT-5.6 and say that enabling xhigh mode restores normal performance.

@elonmusk: Grok foundation model V9-Medium (1.5T) has finished training. Evals look good. A lot of Cursor data was added in supple…

X AI KOLs Following ↗ · 2026-05-25 Cached

Elon Musk announced that the Grok foundation model V9-Medium (1.5T parameters) has finished training with strong evaluations, and will be publicly released in 2-3 weeks after fine-tuning and reinforcement learning.

@percyliang: Not only do we want to train a good model, we want to know it'll be good before we even start training. About a month a…

X AI KOLs Following ↗ · 2026-05-24 Cached

The Marin team pre-registered a predicted loss of 2.252 for a 129B parameter MoE model training run, and the actual result landed at 2.234, demonstrating accurate loss prediction before training.

@DanKornas: Most AI agents still split vision, language, and action across separate systems. Magma is a Microsoft Research foundati…

X AI KOLs Timeline ↗ · 2026-05-23 Cached

Magma is an open-source repository from Microsoft Research for building multimodal AI agents that integrate vision, language, and action, providing model links, inference examples, training instructions, and demos.

@jino_rohit: before you start learning quantization for llms, you need to understand how different number formats are represented in…

X AI KOLs Timeline ↗ · 2026-05-23 Cached

A thread explaining why understanding number formats in memory is crucial for learning LLM quantization, covering gradient NaN debugging, numerical stability, and quantization distortion.

[NEW] Supra-50M Released!

Reddit r/LocalLLaMA ↗ · 2026-05-22

SupraLabs released Supra-50M, a compact 50M-parameter causal language model with base and instruct versions. It was trained on 20B tokens from fineweb-edu and posts competitive benchmark results against larger models like GPT-2 and SmolLM.
