Tag
This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.