@0xLogicrw: Tilde Research found a hidden flaw in the Muon optimizer, used by leading models like DeepSeek V4, Kimi K2.5, and GLM-5: it causes over a quarter of MLP layer neurons to die permanently in early training. The team designed an alternative optimizer, Auro…
Summary
Tilde Research found that the Muon optimizer causes MLP neurons to die permanently early in training, and open-sourced an alternative optimizer, Aurora. Aurora maintains orthogonality while resolving the neuron death issue, which significantly improves training efficiency.
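To make the terms "orthogonality" and "row uniformity" concrete, here is a minimal NumPy sketch. It is not Aurora's algorithm and not Tilde Research's code. It only assumes that a Muon-style update is approximately the polar factor of a gradient-like matrix, and that each row of a tall MLP weight matrix corresponds to one neuron. The function names `polar_factor` and `row_leverage` are hypothetical, chosen for illustration. The sketch shows that a polar factor with perfectly orthonormal columns can still give one row an update that is close to zero.

```python
import numpy as np

def polar_factor(g):
    """Polar factor U @ Vt from the thin SVD of g.

    For a tall matrix (more rows than columns) with full column rank,
    the result has orthonormal columns.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt

def row_leverage(q):
    """Squared norm of each row of q (the row leverage scores)."""
    return np.sum(q * q, axis=1)

# Assumption for illustration: 8 rows stand in for 8 neurons, 4 input features.
rng = np.random.default_rng(0)
g = rng.standard_normal((8, 4))
g[0] *= 1e-3  # one "neuron" receives almost no gradient signal

q = polar_factor(g)
lev = row_leverage(q)

# Columns are orthonormal, so the leverage scores always sum to the
# number of columns, but nothing forces them to be equal across rows.
print("column Gram matrix is identity:", np.allclose(q.T @ q, np.eye(4)))
print("sum of row leverages:", lev.sum())
print("row leverages:", np.round(lev, 6))
```

In this toy setup the starved row keeps a near-zero share of the orthogonalized update, which is one plausible reading of why a "leverage-aware" optimizer would enforce row uniformity.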
Similar Articles
Aurora: A Leverage-Aware Spectral Optimizer
Aurora is a leverage-aware spectral optimizer that addresses neuron death in MLP layers by enforcing row uniformity while preserving the polar factor geometry of Muon updates, achieving state-of-the-art performance on the modded-nanoGPT speedrun benchmark.
Aurora: A Leverage-Aware Optimizer for Rectangular Matrices
Tilde Research introduces Aurora, a new optimizer designed to prevent neuron death in MLP layers while maintaining orthogonality, achieving state-of-the-art results on nanoGPT benchmarks and 100x data efficiency on 1B models.
Can Muon Fine-tune Adam-Pretrained Models?
Research paper investigating performance degradation when using the Muon optimizer instead of Adam for fine-tuning pretrained models, demonstrating that parameter-efficient methods like LoRA effectively mitigate this optimizer mismatch across language and vision tasks.
@berryxia: Moonshot AI founder Yang Zhilin recently released a 40-minute video. Born in 1992, valedictorian of Tsinghua CS undergrad, PhD from CMU, co-author of Transformer-XL and XLNet, former researcher at Google Brain and Meta, he calmly deconstructs Kimi K2 in front of the camera...
Moonshot AI founder Yang Zhilin released a 40-minute video detailing the training process of the Kimi K2 model, which cost only $4.6 million. In an 8-model real-time programming competition, Kimi K2 took first place, defeating GPT-5.5 and others, demonstrating how a small team can overturn the traditional compute-stacking paradigm through architecture optimization.
Open source battle: GLM vs Kimi vs MiMo vs DeepSeek
This article tests four open-source Chinese AI models (Zhipu GLM 5.1, Moonshot Kimi K2.6, Xiaomi MiMo 2.5 Pro, and DeepSeek V4 Pro) on programming tasks. It finds that GLM leads on most tasks but not all, and that each model has its own strengths and weaknesses.