deep-learning-theory

#deep-learning-theory

Gradient Flow Dynamics and Implicit Bias of Diagonal Linear Networks under Infinitesimal Initialization

arXiv cs.LG · 2026-07-15

This paper studies the gradient flow dynamics of diagonal linear networks under infinitesimal initialization, generalizing previous results to deep networks and a broader class. It shows that the implicit bias corresponds to a modified ℓ1 norm and identifies the Structural Invariant Manifold as a key geometric structure.

Edge of Stability Selectively Shapes Learning Across the Data Distribution

arXiv cs.LG · 2026-06-04

MIT researchers show that the edge of stability (EoS) in neural network training is not merely a global optimization phenomenon but selectively redistributes learning across subsets of the training distribution, amplifying progress on some data groups while suppressing others. They identify two key conditions governing this allocation: gradient alignment with the top Hessian eigenvector and sustained non-vanishing gradient magnitude.

Neural Networks Provably Learn Spectral Representations for Group Composition

arXiv cs.LG · 2026-06-03

This paper theoretically demonstrates that two-layer neural networks trained on group composition tasks learn spectral representations, with neurons converging to irreducible representations and achieving rotational rank-one alignment, providing a representation-theoretic account of feature learning.

Neural Networks Provably Learn Spectral Representations for Group Composition

Hugging Face Daily Papers · 2026-06-02

This paper provides a theoretical analysis of how neural networks learn structured representations during group composition tasks, proving that training dynamics drive neurons to converge to irreducible group representations with exponential convergence rates. The work establishes a representation-theoretic account of feature learning and characterizes a low-rank compression phenomenon for matrix-valued group representations.

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

arXiv cs.LG · 2026-05-21

This paper identifies a collapse-and-refine mechanism in diffusion models under the manifold hypothesis, proposing Score-induced Latent Diffusion (SiLD) that provably avoids the curse of dimensionality. Experiments show SiLD matches or outperforms VAE-based latent diffusion models.

Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

Hugging Face Daily Papers · 2026-05-08

This paper argues that robust state tracking in recurrent models depends on error control dynamics rather than just expressive capacity, proving that affine recurrent networks suffer from accumulating errors that limit their effective horizon.

Generalized Neurons

ML at Berkeley · 2021-02-16

The article explores the Universal Approximation Theorem in deep learning, analyzing the representation capacity of individual neurons and neural network layers using ReLU activation functions.
