The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers

Summary

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

In this paper, training a neural network is identified, exactly, as a search through Hamilton-Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton-Jacobi equation whose Hopf-Cole propagator best fits the observations; at inference, the input is the spatial point at which that solution is evaluated and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton-Jacobi equations, with architecture-dependent Hamiltonian and viscosity. A single deformation parameter ε unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by ε; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with data intrinsic dimension via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as ε increases, each merging attribution basins.
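
The link between log-sum-exp layers, softmax weights, and the tropical (min-plus) limit can be illustrated with a small NumPy sketch. It is not taken from the paper: it assumes the textbook quadratic-Hamiltonian case, in which the discrete Hopf-Cole value at x is -ε log Σ_j exp(-c_j/ε) with costs c_j = |x - y_j|²/(2t) + b_j. The names `hopf_cole_lse`, `entropy`, `centers`, and `offsets` are hypothetical, and the paper's own normalization of ε may differ.

```python
import numpy as np

def hopf_cole_lse(x, centers, offsets, t=1.0, eps=0.1):
    """Soft-min (log-sum-exp) value at x and its softmax weights pi_j.

    Assumed model: costs c_j = |x - y_j|^2 / (2 t) + b_j, where the points
    y_j (centers) and values b_j (offsets) play the role of initial data.
    """
    costs = np.sum((centers - x) ** 2, axis=1) / (2.0 * t) + offsets
    z = -costs / eps
    z_max = z.max()                      # shift for numerical stability
    w = np.exp(z - z_max)
    value = -eps * (z_max + np.log(w.sum()))
    pi = w / w.sum()                     # softmax weights, one pass over N terms
    return value, pi

def entropy(pi):
    """Shannon entropy of the weights, skipping entries that underflowed to 0."""
    p = pi[pi > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 2))        # hypothetical initial-data points y_j
offsets = rng.normal(size=5)             # hypothetical initial values b_j
x = np.zeros(2)                          # evaluation point (the "input")

hard_min = (np.sum((centers - x) ** 2, axis=1) / 2.0 + offsets).min()
for eps in (1.0, 0.1, 0.01):
    value, pi = hopf_cole_lse(x, centers, offsets, t=1.0, eps=eps)
    # As eps shrinks, value approaches the hard minimum and pi concentrates.
    print(f"eps={eps}: value={value:.4f}, min={hard_min:.4f}, H(pi)={entropy(pi):.4f}")
```

For N terms the soft-min always lies between min_j c_j - ε log N and min_j c_j, so ε bounds the gap to the min-plus value, and the entropy of π lies between 0 and log N. The fold bifurcations described in the abstract are not reproduced by this sketch.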

Cached at: 06/02/26, 03:36 PM


Source: https://huggingface.co/papers/2605.28983

Abstract

Neural network training is formulated as a search over Hamilton-Jacobi initial-value problems, in which each gradient step selects the initial data of a viscous Hamilton-Jacobi equation. Residual networks, transformers, and recurrent architectures are connected through this shared mathematical structure.
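
For orientation, the standard Hopf-Cole computation behind this kind of statement can be written out for a model problem. The quadratic Hamiltonian and the viscosity ε/2 below are assumptions chosen to give the simplest formulas; the paper's Hamiltonian and viscosity are architecture-dependent and are not specified here.

```latex
% Assumed model problem: quadratic Hamiltonian, viscosity \varepsilon/2.
\begin{align*}
  &\partial_t u + \tfrac12 |\nabla u|^2 = \tfrac{\varepsilon}{2}\,\Delta u,
    \qquad u(x,0) = u_0(x), \\
  % Hopf-Cole substitution linearizes the equation into a heat equation.
  &u = -\varepsilon \log \varphi
    \;\Longrightarrow\;
    \partial_t \varphi = \tfrac{\varepsilon}{2}\,\Delta \varphi,
    \qquad \varphi(x,0) = e^{-u_0(x)/\varepsilon}, \\
  % Convolving with the heat kernel gives a continuous log-sum-exp.
  &u(x,t) = -\varepsilon \log \int_{\mathbb{R}^d} (2\pi\varepsilon t)^{-d/2}
    \exp\!\Big( -\frac{1}{\varepsilon}
      \Big[ \frac{|x-y|^2}{2t} + u_0(y) \Big] \Big)\, dy, \\
  % Vanishing-viscosity limit (Hopf-Lax formula), for suitably regular u_0.
  &\lim_{\varepsilon \to 0} u(x,t)
    = \min_{y} \Big[ \frac{|x-y|^2}{2t} + u_0(y) \Big].
\end{align*}
```

In this model, the initial data u_0 is the object being fitted and x is the point where the solution is evaluated. The last line is the min-plus (tropical) counterpart of the integral formula above it.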




