Tag
This paper proves sharp dimension-free first-order lower bounds for finding epsilon-stationary points in higher-order smooth nonconvex optimization, resolving open problems for Hessian-Lipschitz and third-order smooth cases.
This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.
This article explains strong convexity and L-smoothness in optimization, which together bound a function between two quadratics (the "quadratic sandwich"), and their role in gradient descent performance.
This paper investigates smoothness degradation in extremely quantized large language models, arguing that preserving smoothness, not just numerical accuracy, is crucial for maintaining performance.