edge-of-stability

#edge-of-stability

Flatland: The Adventures of Gradient Descent with Large Step Sizes

arXiv cs.LG · 2026-06-08

This paper addresses the open question of the maximum step size at which gradient descent converges on objectives that are not L-smooth, introducing adaptive methods that operate at the edge of stability and can minimize sharpness globally.

#edge-of-stability

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv cs.LG · 2026-06-05

This paper shows that discrete gradient descent with large step sizes restores symmetry in multi-pathway deep linear networks, countering the symmetry breaking predicted by gradient flow, and rebalances signal across pathways. The authors prove that balanced solutions are flatter (less sharp) than sparse ones and that large learning rates drive the network toward stable, balanced configurations.

#edge-of-stability

Edge of Stability Selectively Shapes Learning Across the Data Distribution

arXiv cs.LG · 2026-06-04

MIT researchers show that the edge of stability (EoS) in neural network training is not merely a global optimization phenomenon but selectively redistributes learning across subsets of the training distribution, amplifying progress on some data groups while suppressing others. They identify two key conditions governing this allocation: gradient alignment with the top Hessian eigenvector and sustained non-vanishing gradient magnitude.

#edge-of-stability

A Rod Flow Model for Adam at the Edge of Stability

arXiv cs.LG · 2026-05-11

This paper introduces a 'rod flow' model for Adam and other adaptive optimizers to better analyze their behavior at the edge of stability. It extends continuous-time modeling to momentum methods, showing improved accuracy in tracking discrete iterates compared to stable flow models.
