This paper introduces Pion, a spectrum-preserving optimizer for large language model training: it applies orthogonal equivalence transformations so that weight updates leave the singular values of each weight matrix unchanged, achieving stable training with performance comparable to standard optimizers.
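The core invariant is easy to see in isolation: multiplying a weight matrix on the left and right by orthogonal matrices changes the weights but not their singular values. The sketch below (a generic illustration, not Pion's actual update rule) verifies this numerically:

```python
import numpy as np

def random_orthogonal(n, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal Q factor
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in for a weight matrix

# Orthogonal equivalence transformation: W' = R @ W @ Q
R = random_orthogonal(4, rng)
Q = random_orthogonal(3, rng)
W_new = R @ W @ Q

# Singular values are invariant under left/right orthogonal multiplication
sv_before = np.linalg.svd(W, compute_uv=False)
sv_after = np.linalg.svd(W_new, compute_uv=False)
print(np.allclose(sv_before, sv_after))  # True
```

A spectrum-preserving optimizer exploits this by parameterizing updates as such transformations, so the weight spectrum fixed at initialization can never blow up or collapse during training.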
Tilde Research identified a flaw in the Muon optimizer that causes early death of MLP neurons, and open-sourced an alternative, Aurora. Aurora preserves Muon's orthogonality property while resolving the neuron-death issue, significantly improving training efficiency.
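For context, the orthogonality property both optimizers share comes from orthogonalizing the momentum matrix before applying it as an update. The sketch below uses the classical cubic Newton–Schulz iteration for illustration (Muon itself uses a tuned quintic variant for speed; Aurora's specific fix is not reproduced here):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    # Cubic Newton-Schulz iteration: drives every singular value of G
    # toward 1, yielding (approximately) the nearest semi-orthogonal
    # matrix. Muon-style optimizers apply such an iteration to the
    # momentum matrix so the update has a flat spectrum.
    X = G / (np.linalg.norm(G) + 1e-8)  # Frobenius scaling bounds the spectral norm by 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))  # stand-in for a momentum matrix
O = newton_schulz_orthogonalize(G)
print(np.linalg.svd(O, compute_uv=False))  # all singular values near 1
```

Because the orthogonalized update has uniform singular values, every direction in the weight matrix receives a comparably sized step; the failure mode Tilde Research describes concerns how this interacts with individual MLP neurons over training.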
AdaPreLoRA is a LoRA optimizer that applies Adafactor-style diagonal Kronecker preconditioning to the factor-space updates while keeping memory usage low, demonstrating competitive performance across a range of LLMs and tasks.
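The memory argument rests on Adafactor's factored second-moment estimate: instead of storing a full m-by-n accumulator per matrix, one stores row and column averages and reconstructs a rank-1 approximation. The sketch below applies this generic preconditioner to a LoRA factor's gradient; it is an assumption-laden illustration, not AdaPreLoRA's exact update (the function and variable names are hypothetical):

```python
import numpy as np

def adafactor_precondition(grad, row_acc, col_acc, beta2=0.999, eps=1e-8):
    # Factored second-moment estimate (Adafactor-style): maintain EMAs of
    # row-mean and column-mean squared gradients, then reconstruct a
    # rank-1 approximation of the full second moment.
    # Memory: O(m + n) instead of O(m * n).
    g2 = grad ** 2
    row_acc *= beta2; row_acc += (1 - beta2) * g2.mean(axis=1)
    col_acc *= beta2; col_acc += (1 - beta2) * g2.mean(axis=0)
    # Rank-1 reconstruction: V_ij ~= row_i * col_j / mean(row)
    v = np.outer(row_acc, col_acc) / (row_acc.mean() + eps)
    return grad / (np.sqrt(v) + eps)

rng = np.random.default_rng(0)
r, n = 2, 6                              # LoRA rank and layer width
grad_A = rng.standard_normal((r, n))     # gradient of a LoRA factor A
row_acc = np.zeros(r)
col_acc = np.zeros(n)
update = adafactor_precondition(grad_A, row_acc, col_acc)
```

For LoRA the factors are already small (rank r), so the extra accumulators add only a few vectors per adapted layer, which is why the overall memory footprint stays close to vanilla LoRA.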