Tag
This paper develops a sharp pseudospectral theory for block-triangular Jacobians in coupled gradient descent, proving Kreiss-constant bounds and establishing iteration complexity results. The work exposes non-asymptotic, instance-dependent transient amplification phenomena relevant to bilevel optimization, two-time-scale stochastic approximation, and GAN training.
This paper proposes the 'support-before-frequency' hypothesis for discrete diffusion models: models learn the support (the set of admissible sequences) before refining frequencies within it. Theoretical analysis of small-noise reverse kernels and experiments on masked language diffusion models corroborate the hypothesis.
This paper develops a local theory of gradient descent near bifurcations in dynamical models, showing that the state-space neural tangent kernel collapses to a rank-one operator that dominates learning dynamics, making optimization effectively low-dimensional and predictable from normal forms.