Tag
This paper establishes a mathematically rigorous connection between shock-wave theory and the symmetry-quotiented learning dynamics of stochastic gradient descent. It shows that, after symmetry reduction and coarse-graining, the dynamics satisfy viscous Hamilton-Jacobi and Burgers-type equations whose shock formation times are controlled by the loss curvature.
This paper analyzes the generalization error, uniform stability, and uniform argument stability of gradient descent (GD) and stochastic gradient descent (SGD) over discrete parameter spaces with deterministic or stochastic rounding. It shows that rounding degrades the generalization of GD and that stochastic rounding introduces dimension-dependent errors.
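
To make the two rounding schemes named in the summary above concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm or notation: the uniform grid spacing `delta`, the helper names `round_nearest`, `round_stochastic`, and `rounded_gd`, and the toy quadratic loss in the usage note are all assumptions introduced here.

```python
import numpy as np

def round_nearest(x, delta):
    """Deterministic rounding to the nearest point of the grid delta * Z^d (assumed grid)."""
    return delta * np.round(x / delta)

def round_stochastic(x, delta, rng):
    """Stochastic rounding: round up with probability equal to the
    fractional position inside the grid cell, so the result is unbiased."""
    scaled = x / delta
    low = np.floor(scaled)
    frac = scaled - low                      # in [0, 1)
    round_up = rng.random(x.shape) < frac    # Bernoulli(frac) per coordinate
    return delta * (low + round_up)

def rounded_gd(grad, w0, lr, steps, rounder):
    """Gradient descent in which every iterate is projected back onto the grid."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = rounder(w - lr * grad(w))
    return w

# Toy usage (hypothetical loss): f(w) = 0.5 * (w - 0.3)^2 on a grid of spacing 0.5.
grad = lambda w: w - 0.3
rng = np.random.default_rng(0)
w_det = rounded_gd(grad, [2.0], lr=0.1, steps=50,
                   rounder=lambda w: round_nearest(w, 0.5))
w_sto = rounded_gd(grad, [2.0], lr=0.1, steps=50,
                   rounder=lambda w: round_stochastic(w, 0.5, rng))
```

In this toy setup the deterministic variant never leaves its starting point, because each update is smaller than half the grid spacing and is rounded away, while the stochastic variant keeps moving at the cost of injected noise in every coordinate. This only illustrates the mechanisms; it does not reproduce the paper's bounds.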