flat-minima

Closed-Form Steepest Descent Direction toward Flat Minima: Reducing Upper Bounds on the Loss Hessian Eigenspectrum in Neural Networks

arXiv cs.LG · yesterday

This paper derives the closed-form gradient of the Wolkowicz-Styan upper bound on the loss Hessian eigenspectrum and uses it to guide neural network training toward flat minima, introducing Hessian Spectral Range (HSR) Regularization. Numerical experiments show that HSR narrows the range of Hessian eigenvalues, avoids sharp minima and saddle points, and reaches solutions whose flatness is comparable to that of Sharpness-Aware Minimization (SAM).
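
For orientation, the classical Wolkowicz-Styan bounds constrain the extreme eigenvalues of a symmetric matrix using only tr(H) and tr(H²). The sketch below is a minimal NumPy illustration of that textbook form, not the paper's implementation: the function name `wolkowicz_styan_bounds` is hypothetical, a random symmetric matrix stands in for a loss Hessian, and whether the paper uses exactly this variant of the bound is an assumption.

```python
import numpy as np

def wolkowicz_styan_bounds(H):
    """Trace-based bounds on the extreme eigenvalues of a symmetric matrix H.

    Hypothetical helper for illustration; assumes H is symmetric with n >= 2.
    """
    n = H.shape[0]
    m = np.trace(H) / n                  # mean of the eigenvalues
    s2 = np.trace(H @ H) / n - m ** 2    # variance of the eigenvalues
    s = np.sqrt(max(s2, 0.0))            # guard against tiny negative round-off
    lam_min_lower = m - s * np.sqrt(n - 1)
    lam_max_upper = m + s * np.sqrt(n - 1)
    return lam_min_lower, lam_max_upper

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2                        # random symmetric stand-in for a Hessian

lower, upper = wolkowicz_styan_bounds(H)
eigs = np.linalg.eigvalsh(H)             # exact eigenvalues, ascending order
print(lower, eigs[0], eigs[-1], upper)
```

Because both bounds depend only on traces, they are differentiable in the entries of H, which is presumably what makes a closed-form gradient tractable.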

Are Flat Minima an Illusion?

arXiv cs.LG · 2026-05-08

This paper challenges the common belief that flat minima cause better generalization in neural networks, arguing that 'weakness' (a reparameterization-invariant measure of function simplicity) is the true driver. Empirical results on MNIST and Fashion-MNIST show that weakness predicts generalization while sharpness anticorrelates with it, and that the large-batch generalization advantage vanishes as the amount of training data increases.
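
To see why reparameterization invariance matters, the toy sketch below shows that Hessian-based sharpness can change while the computed function stays the same. It is not taken from the paper and does not implement 'weakness': the two-parameter loss (a·b − 1)², the rescaling factor `alpha`, and the function names are all illustrative assumptions.

```python
import numpy as np

def loss(a, b):
    # Toy loss: the "model" is the product a*b, fitted to the target 1.
    return (a * b - 1.0) ** 2

def hessian(a, b):
    # Analytic Hessian of (a*b - 1)^2 with respect to (a, b).
    off_diag = 4.0 * a * b - 2.0
    return np.array([[2.0 * b ** 2, off_diag],
                     [off_diag, 2.0 * a ** 2]])

def sharpness(a, b):
    # Largest Hessian eigenvalue, a common proxy for sharpness.
    return np.linalg.eigvalsh(hessian(a, b))[-1]

alpha = 10.0
sharp_original = sharpness(1.0, 1.0)            # minimum at (1, 1)
sharp_rescaled = sharpness(alpha, 1.0 / alpha)  # same product a*b, rescaled parameters

print(loss(1.0, 1.0), loss(alpha, 1.0 / alpha))
print(sharp_original, sharp_rescaled)
```

Both points give the same product a·b and therefore the same loss, yet the largest Hessian eigenvalue differs substantially, so a sharpness measure of this kind depends on the parameterization rather than on the function alone.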
