llm-training

#llm-training

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

Hugging Face Daily Papers · 2026-05-26

The MiniMax-M2 series introduces Mixture-of-Experts language models that achieve strong performance on agentic tasks while activating only 9.8B of their 229.9B total parameters per token. The models rely on agent-driven data pipelines and a scalable RL system called Forge, and one checkpoint takes early steps toward self-evolution.
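To make "activated parameters" concrete: in a top-k Mixture-of-Experts layer, a router scores every expert for each token, but only the k best-scoring experts are actually run, so most expert weights sit idle for that token. The sketch below is a generic top-k router in plain Python, assuming softmax gating renormalized over the chosen experts; the number of experts, k, the toy experts, and the router scores are illustrative assumptions, not MiniMax-M2's actual configuration.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, gate_weight) pairs, highest score first. The gate
    weights are a softmax over the chosen experts only, so they sum to 1.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the routed experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy usage: 8 hypothetical "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 0.2, -0.3]
y = moe_forward(1.0, experts, logits, k=2)  # only experts 4 and 1 are evaluated

# Fraction of parameters active per token for the totals quoted above.
active_fraction = 9.8 / 229.9  # roughly 0.043, i.e. about 4.3%
```

Per-token compute then scales with the activated experts rather than the full parameter count, which is the trade-off the "mini activations" framing refers to.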

Norway's 2 petabytes of Huawei flash storage and LLM training

Hacker News Top · 2026-05-25

Norway's National Library is building a sovereign Norwegian LLM, using 2 PB of Huawei OceanStor Dorado flash storage for its AI training data pipeline to meet the need for a locally developed language model.

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

arXiv cs.CL · 2026-05-25

ARES is a framework that automatically constructs rubric-based RL data from pretraining documents. It generates question-answer pairs together with weighted rubrics, which provide instance-level reward supervision for open-ended LLM responses, and it outperforms existing methods on multi-dimensional open-ended tasks.
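One simple way a weighted rubric can become an instance-level scalar reward is a weight-normalized average of per-criterion scores. The sketch below assumes each rubric item has a positive weight and that a judge assigns each item a score in [0, 1]; the `RubricItem` type, the example criteria, and the normalization are hypothetical illustrations, not ARES's published reward formulation.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    criterion: str  # hypothetical natural-language criterion
    weight: float   # importance of this criterion (assumed positive)


def rubric_reward(rubric, scores):
    """Weighted average of per-criterion scores.

    `scores[i]` is a judge's score in [0, 1] for `rubric[i]`, so the
    returned reward is also in [0, 1].
    """
    if len(rubric) != len(scores):
        raise ValueError("need exactly one score per rubric item")
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(item.weight * s for item, s in zip(rubric, scores)) / total_weight


# Toy usage with a hypothetical three-item rubric for one question.
rubric = [
    RubricItem("States the correct final answer", 3.0),
    RubricItem("Cites evidence from the source document", 2.0),
    RubricItem("Is concise", 1.0),
]
reward = rubric_reward(rubric, [1.0, 0.5, 0.0])  # (3*1 + 2*0.5 + 1*0) / 6
```

Normalizing by the total weight keeps rewards comparable across questions whose rubrics have different numbers of items.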

@nicos_ai: NVIDIA has just officially published the Skills they use for their AI agents. Right now they have Skills for: → analyzi…

X AI KOLs Timeline · 2026-05-24

NVIDIA has officially published a set of Skills for AI agents, covering video analysis, voice agents, LLM training, model acceleration, RAG, secure environments, logistics optimization, and CUDA programming.

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Reddit r/LocalLLaMA · 2026-05-22

This paper introduces Vector Policy Optimization (VPO), a reinforcement learning algorithm that trains LLMs to produce diverse solutions by optimizing across multiple reward dimensions, significantly improving test-time search performance compared to scalar RL baselines.

@cjzafir: I pay Google $13.99 CAD to train a 9B LLM model on A100 80GB GPU. It takes: > 10 minutes to step notebook > 7 hours to …

X AI KOLs Timeline · 2026-05-22

A user describes training a 9B LLM on an A100 80GB GPU through Google Colab for $13.99 CAD, noting that the run finishes overnight and that training small language models has become easy.

@maximelabonne: Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate (first screenshot, Kalra and Ba…

X AI KOLs Following · 2026-05-22

This paper introduces a framework for quantifying hyperparameter transfer in LLMs and finds that, under AdamW, most of the benefit of μP over standard parameterization (SP) comes from increasing the embedding-layer learning rate. It also examines the effect of weight decay and other factors.

@maximelabonne: To clarify, this paper basically says: under AdamW, µP's embedding LR rule (constant) is essentially right and explains…

X AI KOLs Following · 2026-05-22

A follow-up post summarizing the same paper: under AdamW, μP's rule of keeping the embedding learning rate constant across model widths is essentially correct and accounts for most of μP's benefit, contrary to an earlier finding by Hayou et al. concerning realistic LLM vocabulary sizes.
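The width-scaling rule at issue can be sketched as a per-group learning-rate function. The sketch assumes a simplified μP recipe for Adam-family optimizers as it is commonly stated: learning rates are tuned at a base width, the embedding learning rate then stays constant as width grows, and hidden-matrix learning rates shrink by base_width / width. The group names, base width, and base learning rate are hypothetical, and output-layer and initialization rules are deliberately left out.

```python
def mup_learning_rates(width, base_width=256, base_lr=1e-3):
    """Per-group AdamW learning rates under a simplified muP-style rule.

    Assumption: rates were tuned at `base_width` and are transferred to `width`.
    - embedding: constant in width (the rule the post says is essentially right)
    - hidden matrices: scaled by base_width / width
    Output-layer and initialization rules are omitted for brevity.
    """
    ratio = base_width / width
    return {"embedding": base_lr, "hidden": base_lr * ratio}


def sp_learning_rates(width, base_lr=1e-3):
    """Standard parameterization: one shared learning rate for every group."""
    return {"embedding": base_lr, "hidden": base_lr}


# At 16x the base width, hidden layers get 1/16 of the base rate while the
# embedding keeps the full rate.
lrs = mup_learning_rates(4096)
```

Under this sketch the embedding-to-hidden learning-rate ratio grows in proportion to width / base_width, whereas SP forces every group to share one rate. That gap is one way to read the claim that μP's advantage comes mainly from a relatively larger embedding learning rate.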

PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

arXiv cs.AI · 2026-05-22

PlanningBench is a framework for generating scalable, diverse, and verifiable planning data to evaluate and train large language models, featuring a constraint-driven synthesis pipeline with adaptive difficulty control and quality filtering. Experiments show that frontier LLMs struggle with coupled constraints, and reinforcement learning on PlanningBench data improves performance on unseen planning tasks.
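"Verifiable" planning data means a candidate plan can be checked programmatically against the constraints it was generated from. The sketch below is a hypothetical checker for a toy scheduling task with two coupled constraint types: precedence (task order) and per-task deadlines, which depend on the order chosen. The task format, field names, and constraints are illustrative assumptions, not PlanningBench's actual schema.

```python
def verify_plan(plan, durations, must_precede, deadlines):
    """Check an ordered list of task names against toy constraints.

    Returns a list of violation messages; an empty list means the plan is valid.
    - every task in `durations` must appear exactly once
    - for each (a, b) in `must_precede`, task a must come before task b
    - each task in `deadlines` must finish (cumulative time) by its deadline
    """
    if sorted(plan) != sorted(durations):
        return ["plan must contain each task exactly once"]
    violations = []
    position = {task: i for i, task in enumerate(plan)}
    for a, b in must_precede:
        if position[a] > position[b]:
            violations.append(f"{a} must come before {b}")
    elapsed = 0
    for task in plan:
        elapsed += durations[task]
        if task in deadlines and elapsed > deadlines[task]:
            violations.append(
                f"{task} finishes at {elapsed}, after its deadline {deadlines[task]}")
    return violations


# Toy instance: A must precede B, and C must finish by time 3.
durations = {"A": 2, "B": 3, "C": 1}
must_precede = [("A", "B")]
deadlines = {"C": 3}
```

Here `["A", "C", "B"]` satisfies both constraints, while `["A", "B", "C"]` respects the ordering but misses C's deadline. An RL reward could then be, for instance, 1.0 when no violations are returned and 0.0 otherwise; that reward choice is also an assumption.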

@HanGuo97: LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes…

X AI KOLs Following · 2026-05-21

CODA reparameterizes memory-bound operations in LLM training so that they can be fused into the matmul epilogue, reaching near state-of-the-art performance with LLM-generated kernels.
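Epilogue fusion, in general terms, means applying the elementwise ops that follow a matmul (bias add, activation, and so on) while each output value is still in hand, instead of writing the full matmul result to memory and reading it back in a separate memory-bound kernel. The plain-Python sketch below only illustrates that dataflow difference with a hypothetical bias-plus-ReLU epilogue; it does not model CODA's reparameterization, and a real implementation would be a GPU kernel rather than Python loops.

```python
def matmul(a, b):
    """Naive matmul on nested lists: (m x k) @ (k x n) -> (m x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def unfused(a, b, bias):
    # Pass 1: the matmul materializes its full output.
    c = matmul(a, b)
    # Pass 2: a separate elementwise "kernel" re-reads every value of c.
    return [[max(0.0, v + bias[j]) for j, v in enumerate(row)] for row in c]


def fused(a, b, bias):
    # Single pass: bias + ReLU are applied as each output value is produced,
    # so the pre-activation matrix is never stored.
    k, n = len(b), len(b[0])
    out = []
    for row in a:
        out_row = []
        for j in range(n):
            acc = sum(row[p] * b[p][j] for p in range(k))  # main loop
            out_row.append(max(0.0, acc + bias[j]))        # epilogue
        out.append(out_row)
    return out


a = [[1.0, -2.0], [3.0, 0.5]]
b = [[0.5, -1.0], [1.0, 2.0]]
bias = [0.1, -0.2]
```

Both functions compute the same values; the fused version simply avoids the extra round trip through memory, which is where the savings come from on real hardware.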

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

Reddit r/MachineLearning · 2026-05-21

RPS is a two-stage, neuroscience-inspired LLM post-training method that combines curriculum learning with learning-rate decay. Preliminary results show improved program synthesis reliability on Qwen3-8B compared with an equal-learning-rate baseline.

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

arXiv cs.LG · 2026-05-21

This paper proposes Introspective X Training (IXT), a unified feedback-conditioning algorithm that uses a thinking reward model to annotate training data with natural-language critiques, enabling quality-aware training at every LLM training stage. The method improves compute efficiency by up to 2.8x and performs better in the math and code domains.

Aggressive AI scrapers are making it kinda suck to run wikis

Lobsters Hottest · 2026-05-21

A post on how aggressive AI scrapers disrupt wiki operations: they imitate human traffic and route requests through residential proxies, sharply increasing server costs and making services unstable.

@sitinme: Saw Karpathy open-sourced a very interesting project autoresearch, which gives a real but small-scale LLM training task to an AI Agent, letting it do research, modify code, run experiments, look at results, and then decide whether to keep or discard the changes. The project is based on a single NVIDIA…

X AI KOLs Timeline · 2026-05-21

Karpathy open-sourced autoresearch, an experimental project that lets an AI agent run the research loop for small-scale LLM training on its own: modify code, run experiments, evaluate the results, keep or discard each change, and iterate. Humans only need to write the research plan and constraints.
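The keep-or-discard loop described in the post amounts to greedy hill climbing over changes. The sketch below assumes hypothetical `propose_change` and `run_experiment` callables standing in for the agent's code edit and a short training run that returns a validation loss (lower is better); none of these names come from the autoresearch repository, and the toy objective is purely illustrative.

```python
import random


def research_loop(config, propose_change, run_experiment, budget):
    """Greedy keep/discard loop.

    config: starting configuration (any value the callables understand)
    propose_change: config -> candidate config (stands in for the agent's edit)
    run_experiment: config -> validation loss, lower is better
    budget: number of candidate experiments to run
    Returns (best_config, best_loss, history).
    """
    best_loss = run_experiment(config)
    history = []
    for step in range(budget):
        candidate = propose_change(config)
        loss = run_experiment(candidate)
        keep = loss < best_loss
        if keep:
            config, best_loss = candidate, loss  # keep the change
        # Discarded candidates are recorded here but never applied.
        history.append((step, loss, keep))
    return config, best_loss, history


# Toy usage: nudge a single number toward the minimum of (x - 2)^2.
rng = random.Random(0)
best, best_loss, history = research_loop(
    0.0,
    propose_change=lambda x: x + rng.uniform(-0.5, 0.5),
    run_experiment=lambda x: (x - 2.0) ** 2,
    budget=200,
)
```

A strict "lower loss wins" rule is the simplest acceptance criterion; a real setup would likely also need to account for run-to-run noise before keeping a change.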

@heygurisingh: Billion-parameter LLMs used to cost $10M+ to train. Someone open sourced a repo t…

X AI KOLs Timeline · 2026-05-20

An open-source repository called train-llm-from-scratch enables training billion-parameter LLMs on a single GPU. It provides a configurable pipeline from raw text to inference, including dataset streaming and checkpointing, and is released under the MIT License.

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

arXiv cs.LG · 2026-05-20

DynaTrain is a distributed training system that reconfigures the parallelism of large language model training online in under a second. It uses a Virtual Parameter Space abstraction to make transitions up to three orders of magnitude faster than existing methods.

A Theory of Training Profit-Optimal LLMs

arXiv cs.LG · 2026-05-19

This paper develops an economic model combining scaling laws with microeconomic theory to analyze profit-optimal training of large language models, considering trade-offs between model quality, training costs, and hardware efficiency.

Notes on pretraining parallelisms and failed training runs (12 minute read)

TLDR AI · 2026-05-18

A technical deep-dive into common causes of failed pretraining runs in large language models, including causality-breaking issues in expert routing and numerical precision bugs, with examples from Llama 4, Gemini 2 Pro, and GPT-4.

@tom_doerr: Trains billion-parameter LLMs from scratch on a single GPU https://github.com/FareedKhan-dev/train-llm-from-scratch…

X AI KOLs Timeline · 2026-05-17

A GitHub repository provides scripts to train billion-parameter language models from scratch on a single GPU using PyTorch, based on the Transformer architecture.

@LakshyAAAgrawal: Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization.…

X AI KOLs Following · 2026-05-13

Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.
