This paper introduces Pion, a spectrum-preserving optimizer for large language model training: it applies orthogonal equivalence transformations so that weight updates leave the singular values of each weight matrix unchanged, achieving stable training with performance comparable to standard optimizers.
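The core invariant is easy to see in isolation: multiplying a weight matrix on the left and right by orthogonal matrices changes the weights but not their singular values. The sketch below (a generic illustration, not Pion's actual update rule) verifies this numerically:

```python
import numpy as np

def random_orthogonal(n, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal Q factor
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in for a weight matrix

# Orthogonal equivalence transformation: W' = R @ W @ Q
R = random_orthogonal(4, rng)
Q = random_orthogonal(3, rng)
W_new = R @ W @ Q

# Singular values are invariant under left/right orthogonal multiplication
sv_before = np.linalg.svd(W, compute_uv=False)
sv_after = np.linalg.svd(W_new, compute_uv=False)
print(np.allclose(sv_before, sv_after))  # True
```

A spectrum-preserving optimizer exploits this by parameterizing updates as such transformations, so the weight spectrum fixed at initialization can never blow up or collapse during training.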
Tilde Research identified a flaw in the Muon optimizer that causes early death of MLP neurons, and open-sourced an alternative, Aurora. Aurora preserves Muon's orthogonality property while resolving the neuron-death issue, significantly improving training efficiency.
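For context, the orthogonality property both optimizers share comes from orthogonalizing the momentum matrix before applying it as an update. The sketch below uses the classical cubic Newton–Schulz iteration for illustration (Muon itself uses a tuned quintic variant for speed; Aurora's specific fix is not reproduced here):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    # Cubic Newton-Schulz iteration: drives every singular value of G
    # toward 1, yielding (approximately) the nearest semi-orthogonal
    # matrix. Muon-style optimizers apply such an iteration to the
    # momentum matrix so the update has a flat spectrum.
    X = G / (np.linalg.norm(G) + 1e-8)  # Frobenius scaling bounds the spectral norm by 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))  # stand-in for a momentum matrix
O = newton_schulz_orthogonalize(G)
print(np.linalg.svd(O, compute_uv=False))  # all singular values near 1
```

Because the orthogonalized update has uniform singular values, every direction in the weight matrix receives a comparably sized step; the failure mode Tilde Research describes concerns how this interacts with individual MLP neurons over training.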
AdaPreLoRA is a LoRA optimizer that applies Adafactor-style diagonal Kronecker preconditioning to the factor-space updates while keeping memory usage low, demonstrating competitive performance across a range of LLMs and tasks.
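The memory argument rests on Adafactor's factored second-moment estimate: instead of storing a full m-by-n accumulator per matrix, one stores row and column averages and reconstructs a rank-1 approximation. The sketch below applies this generic preconditioner to a LoRA factor's gradient; it is an assumption-laden illustration, not AdaPreLoRA's exact update (the function and variable names are hypothetical):

```python
import numpy as np

def adafactor_precondition(grad, row_acc, col_acc, beta2=0.999, eps=1e-8):
    # Factored second-moment estimate (Adafactor-style): maintain EMAs of
    # row-mean and column-mean squared gradients, then reconstruct a
    # rank-1 approximation of the full second moment.
    # Memory: O(m + n) instead of O(m * n).
    g2 = grad ** 2
    row_acc *= beta2; row_acc += (1 - beta2) * g2.mean(axis=1)
    col_acc *= beta2; col_acc += (1 - beta2) * g2.mean(axis=0)
    # Rank-1 reconstruction: V_ij ~= row_i * col_j / mean(row)
    v = np.outer(row_acc, col_acc) / (row_acc.mean() + eps)
    return grad / (np.sqrt(v) + eps)

rng = np.random.default_rng(0)
r, n = 2, 6                              # LoRA rank and layer width
grad_A = rng.standard_normal((r, n))     # gradient of a LoRA factor A
row_acc = np.zeros(r)
col_acc = np.zeros(n)
update = adafactor_precondition(grad_A, row_acc, col_acc)
```

For LoRA the factors are already small (rank r), so the extra accumulators add only a few vectors per adapted layer, which is why the overall memory footprint stays close to vanilla LoRA.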