Tag
This paper analyzes the generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces with deterministic or stochastic rounding, showing that rounding degrades generalization for GD and that stochastic rounding introduces dimension-dependent errors.
Explores the behavior of floor and ceil functions when applied to denormalized floating-point numbers, highlighting differences between CPU and GPU implementations and potential pitfalls.