jacobian

#jacobian

DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty

arXiv cs.LG · 2026-06-24

This paper presents a large-scale empirical study of the Derivative Regularization (DREG) penalty, a layer-wise Jacobian regularizer. It reports high accuracy and noise robustness, particularly with GELU activations and in data-scarce regimes, and positions DREG as a general-purpose, plug-and-play regularizer for neural networks.

@techNmak: This math sits underneath every AI model being trained right now. Gradient. Jacobian. Hessian. Three words that look in…

X AI KOLs Timeline · 2026-05-23

Explains the gradient, Jacobian, and Hessian as fundamental mathematical tools in AI model training, describing how each measures change and the role it plays in optimization.

Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology

arXiv cs.LG · 2026-05-15

This paper performs a full Jacobian eigendecomposition across production-scale LLMs, revealing a learned spectral gradient from rotation-dominated early layers to symmetric late layers, along with a low-rank bottleneck that compresses perturbations. The results link perturbation propagation and compression to the network's functional topology.
