spectral-optimization

#spectral-optimization

Anytime Training with Schedule-Free Spectral Optimization

arXiv cs.LG · 2026-05-25

This paper introduces SF-NorMuon, a schedule-free spectral optimizer that matches or exceeds tuned AdamW on language models up to 772M parameters, with theoretical guarantees for stationarity and long-horizon stability.

#spectral-optimization

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

arXiv cs.LG · 2026-05-19

This paper identifies a geometric mismatch in the Dion low-rank spectral optimizer and proposes Orth-Dion, which replaces column normalization with QR orthogonalization. The change closes the convergence gap to full-rank methods such as Muon at the same communication cost, and is validated on large-scale language model pre-training.
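
To make the distinction in the summary concrete, the sketch below contrasts column normalization with QR orthogonalization on a small matrix. It is not the paper's implementation: the matrix `P`, the helper names, and the use of modified Gram-Schmidt to obtain the Q factor are all assumptions chosen for illustration, and the input is assumed to have full column rank.

```python
import math

def column_normalize(cols):
    """Scale each column to unit Euclidean norm; columns may stay correlated."""
    return [[x / math.sqrt(sum(v * v for v in c)) for x in c] for c in cols]

def qr_orthogonalize(cols):
    """Modified Gram-Schmidt: returns the Q factor of a thin QR as a list of columns.

    Assumes the columns are linearly independent (otherwise a norm is zero).
    """
    q = []
    for c in cols:
        v = list(c)
        for u in q:
            # Remove the component of v along each already-orthonormal column.
            proj = sum(a * b for a, b in zip(u, v))
            v = [b - proj * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(x * x for x in v))
        q.append([x / norm for x in v])
    return q

def gram(cols):
    """Matrix of pairwise inner products between columns."""
    return [[sum(a * b for a, b in zip(u, w)) for w in cols] for u in cols]

# Hypothetical 3x2 matrix, stored as two columns that are far from orthogonal.
P = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

G_norm = gram(column_normalize(P))   # unit diagonal, non-zero off-diagonal
G_qr = gram(qr_orthogonalize(P))     # identity up to rounding
```

After column normalization the two columns have unit length but still overlap (their inner product stays non-zero), whereas the QR-based version yields orthonormal columns. This is only the general linear-algebra difference between the two operations named in the summary.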
