training

#training

@che_shr_cat: 1/ We have been treating GPU memory all wrong. What if the GPU didn't need to store your model at all? MegaTrain enable…

X AI KOLs Timeline ↗ · 18h ago Cached

MegaTrain enables full-precision training of 100B+ LLMs on a single GPU by treating VRAM as a transient stateless cache, inverting the memory hierarchy.

@Hikari_07_jp: Progress report! Training of the DFlash backbone and markov head is complete, enabling DSpark to be used on 27B. We wil…

X AI KOLs Timeline ↗ · 19h ago Cached

Progress update on DSpark: training of DFlash backbone and markov head is complete, enabling use on 27B. Next is training the confidence head for adaptive drafting, expected 8-14% speed improvement over DFlash.

@VukRosic99: When a small model learns from a big one, half the lesson is wasted The setup: a small "student" model writes an answer…

X AI KOLs Timeline ↗ · yesterday Cached

The paper identifies position bias in on-policy distillation for language models, where later tokens in student-generated answers receive degraded supervision. The proposed Importance-Weighted On-Policy Distillation (IW-OPD) weights corrections based on accumulated drift, improving learning speed and final performance.

@ArkadiiBessonov: Three main ways to do FP8 in LLM pretraining — and they differ in mainly one thing: how the scale is attached. per-tens…

X AI KOLs Timeline ↗ · 2d ago Cached

Explains three main approaches to FP8 scaling in LLM pretraining—per-tensor, blockwise, and MXFP8—focusing on how the scale is attached, and derives tile geometries from the constraint that scale must remain constant along the matmul's contracted dimension.
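
The "scale constant along the contracted dimension" constraint can be illustrated with a small sketch. If one scale covers a whole block of the K dimension, the block's partial dot product can be accumulated on the quantized values and multiplied by the two scales once at the end. The function names, the integer rounding used as a stand-in for a real FP8 cast, and the simplified power-of-two rounding for the MX-style case are all assumptions made for illustration; they are not taken from the thread.

```python
import math
import random

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def block_scales(vec, block, power_of_two=False):
    """One scale per block of `block` consecutive elements along K."""
    scales = []
    for start in range(0, len(vec), block):
        amax = max(abs(v) for v in vec[start:start + block])
        scale = (amax / E4M3_MAX) or 1.0  # avoid a zero scale for all-zero blocks
        if power_of_two:
            # MX-style (simplified assumption): round the scale up to a power of two.
            scale = 2.0 ** math.ceil(math.log2(scale))
        scales.append(scale)
    return scales


def quantize(vec, scales, block):
    """Stand-in for the FP8 cast: divide by the block scale and round."""
    return [round(v / scales[i // block]) for i, v in enumerate(vec)]


def scaled_dot(qa, sa, qb, sb, block):
    """Accumulate each K-block unscaled, then apply both scales once."""
    total = 0.0
    for j, (s_a, s_b) in enumerate(zip(sa, sb)):
        lo, hi = j * block, (j + 1) * block
        acc = sum(x * y for x, y in zip(qa[lo:hi], qb[lo:hi]))
        total += s_a * s_b * acc
    return total


rng = random.Random(0)
K = 256  # length of the contracted dimension
a = [rng.gauss(0.0, 1.0) for _ in range(K)]
b = [rng.gauss(0.0, 1.0) for _ in range(K)]
exact = sum(x * y for x, y in zip(a, b))

results = {}
for name, block, pow2 in [("per-tensor", K, False),
                          ("blockwise", 128, False),
                          ("mx-style", 32, True)]:
    sa, sb = block_scales(a, block, pow2), block_scales(b, block, pow2)
    qa, qb = quantize(a, sa, block), quantize(b, sb, block)
    results[name] = scaled_dot(qa, sa, qb, sb, block)
    print(f"{name:10s} scales: {len(sa):2d}  dot = {results[name]:.4f}  exact = {exact:.4f}")
```

The three variants differ only in how many scales are attached along K (1, 2, and 8 here) and in whether the scale is restricted to a power of two.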

South Korea plans to train entire military as "drone warriors"

Ars Technica ↗ · 2d ago Cached

South Korea announces plans to train all 500,000 military personnel to operate drones as a 'universal combat tool,' inspired by drone warfare in Ukraine and the Middle East.

A debugger for RL reward functions that detects reward hacking during training [P]

Reddit r/MachineLearning ↗ · 3d ago

A debugger that detects reward hacking in reinforcement learning reward functions during training, aiding developers in identifying and fixing issues.

@almond_robotics: 1 of ~1000 training episodes.

X AI KOLs Following ↗ · 3d ago Cached

Almond Robotics shares one of approximately 1000 training episodes for their robotic system.

@lilianweng: A super long overdue (3+ years?) post on scaling laws. Compute is expensive. Scaling laws are a way to help us reason a…

X AI KOLs Timeline ↗ · 3d ago Cached

Lilian Weng's blog post provides a comprehensive overview of scaling laws in deep learning, covering their derivation, compute-optimal allocation, and the debate between Kaplan et al. and Chinchilla.
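
As a rough illustration of compute-optimal allocation, the sketch below splits a FLOP budget using two widely quoted rules of thumb: training compute C is about 6·N·D for N parameters and D tokens, and the Chinchilla result is often summarized as about 20 tokens per parameter. Both are approximations assumed here for illustration, and the function name is hypothetical; the blog post itself may use more careful fits.

```python
import math


def compute_optimal(flops, tokens_per_param=20.0):
    """Split a FLOP budget using C ~ 6*N*D and D ~ tokens_per_param * N."""
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# 5.76e23 FLOPs is roughly the training budget reported for Chinchilla.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

With that budget the sketch lands near 70B parameters and 1.4T tokens, which is close to the published Chinchilla configuration.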

@SergioPaniego: you can now train @liquidai's LFM2-VL in TRL. GRPO and RLOO included, with an example script

X AI KOLs Following ↗ · 4d ago Cached

You can now train Liquid AI's LFM2-VL model using TRL's GRPO and RLOO methods, with an example script provided.

@natolambert: Another quick lecture -- I've been asked many times for prereq's to my book and what you should know, so built a little…

X AI KOLs Timeline ↗ · 5d ago Cached

Nathan Lambert shares a video lecture covering prerequisites for his book, including language model basics, probabilities, and training pipelines, using GLM 5.2.

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

arXiv cs.AI ↗ · 5d ago Cached

This paper from OpenAI investigates whether reinforcement learning on beneficial behavior can produce broad and persistent alignment generalization beyond the training distribution. Using a dataset of realistic situations, they show that RL training on beneficial traits improves out-of-distribution alignment and persistence against adversarial attacks.

@SergioPaniego: we let an agent train a coding agent, live, from one prompt which agent is which, why it makes sense, and every artifac…

X AI KOLs Timeline ↗ · 6d ago Cached

A live demonstration of an AI agent training a coding agent from a single prompt, with all artifacts recapped.

@Muennighoff: we're working on a much better composer model by scaling to Opus/GPT-size, from-scratch training & going beyond coding!

X AI KOLs Timeline ↗ · 6d ago Cached

Muennighoff announces work on a much better composer model, scaling to Opus/GPT-size, training from scratch, and going beyond coding, as part of Cursor's collaboration with SpaceX.

GLM5.2 @7tg on 4x3090 + 192GB on budget motherboard + cpu

Reddit r/LocalLLaMA ↗ · 2026-06-22

Running GLM5.2 at apparently about 7 tokens per second of token generation (the "7tg" in the title) on a budget setup: 4x RTX 3090 GPUs and 192GB of RAM on a budget motherboard and CPU.

Tmax: A simple recipe for terminal agents

Hugging Face Daily Papers ↗ · 2026-06-22 Cached

Tmax introduces a simplified RL training recipe for terminal agents, achieving state-of-the-art performance with a 9B parameter model using a novel data generation taxonomy and an expanded open-source dataset.

Data-centric debugging for teams training neural nets [P]

Reddit r/MachineLearning ↗ · 2026-06-21

WeightsLab is an open-source, PyTorch-native tool that allows teams to pause training, inspect live loss signals, and catch data issues like mislabels and class imbalance before they affect model performance. It is designed for computer vision engineers working with images, videos, and LiDAR point clouds.

Why can't LLMs be trained to think in an optimized AI language rather than English?

Reddit r/singularity ↗ · 2026-06-21

A speculative discussion questioning why LLMs are not trained to think in an optimized internal language rather than natural language, and whether that could improve efficiency.

@FinanceYF5: 3/ He believes the AI capability leap in the past 5 months comes not only from tool advancements like Claude Code, but because of 【Mythos】—a new Anthropic model that quietly changed the entire R&D rhythm after its training completed in February this year. Key takeaway: Leading models are helping to train the next generation of leading models...

X AI KOLs Following ↗ · 2026-06-21 Cached

According to speculation, Anthropic's new model Mythos, after completing training in February this year, quietly changed the R&D rhythm, leading to a significant leap in AI capabilities over the past 5 months. Leading models are helping to train the next generation of models.

@TheTuringPost: An open-source Agent Reinforcement Trainer (ART) – plugs GRPO into any Python app → Your app defines the task and rewar…

X AI KOLs Timeline ↗ · 2026-06-20 Cached

The Agent Reinforcement Trainer (ART) is an open-source framework that plugs GRPO-based RL into any Python app, enabling agents to learn from environment interaction via trajectory scoring and LoRA updates, with claims of outperforming OpenAI's o3 on email retrieval using a Qwen 2.5 14B model.
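
The trajectory-scoring step in GRPO is usually described as normalizing each reward against the other trajectories sampled for the same task. A minimal sketch of that normalization follows; it is not ART's API, the function name is hypothetical, and the choice of population standard deviation and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four trajectories for one task, scored by the app's reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Trajectories that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed; a group with identical rewards contributes no learning signal.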

RTX 5090 MSI, only inference or training at 475-500W. Make sure not to bend your cable!

Reddit r/LocalLLaMA ↗ · 2026-06-20

MSI's RTX 5090 GPU operates at 475-500W for inference or training, with a warning about cable bending.
