Layer-wise Derivative Controlled Networks
Summary
Introduces ChainzRule, a neural architecture that combines a Polynomial Engine with Differential Regularization (DREG) to balance accuracy, hardware efficiency, and functional stability. It outperforms standard models while using 15.5x fewer parameters and produces smoother gradients.
# Layer-wise Derivative Controlled Networks

Source: [https://arxiv.org/abs/2605.15463](https://arxiv.org/abs/2605.15463) [View PDF](https://arxiv.org/pdf/2605.15463)

> Abstract: As machine learning models grow in complexity, they increasingly struggle with three conflicting demands: the need for high accuracy, the requirement for hardware efficiency, and the necessity of functional stability. Traditional architectures often achieve performance at the expense of spiky or unpredictable behavior, where small changes in input lead to massive swings in output -- a critical flaw for real-world deployment in sensitive environments. This paper introduces ChainzRule (CR), a novel neural architecture designed to harmonize these competing goals. ChainzRule replaces standard piecewise-linear activations with a Polynomial Engine governed by Differential Regularization (DREG). Unlike traditional methods that impose global, coarse-grained constraints on a model's Lipschitz constant, DREG acts as a targeted regularization on intermediate derivatives. This approach suppresses extreme sensitivity without attenuating the representational power inherent in the Polynomial Engine. In head-to-head "Fair Fight" benchmarks, ChainzRule outperformed standard models while using 15.5x fewer parameters. On the MNIST dataset, it reduced peak gradient volatility by an average of 23.1%, ensuring a smoother and more predictable manifold. On Yelp Full ordinal regression under explicit DREG regularization, ChainzRule achieves 70.17% accuracy, validating that derivative-aware regularization is compatible with competitive performance on realistic tasks. By embedding gradient awareness into the architecture via DREG, ChainzRule demonstrates that stability and accuracy need not be competing objectives.

## Submission history

From: Rowan Martnishn [[view email](https://arxiv.org/show-email/d1e8d0fa/2605.15463)]

**[v1]** Thu, 14 May 2026 22:57:51 UTC (744 KB)
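The abstract describes polynomial activations combined with a penalty on intermediate derivatives, but it does not give the formulation. The following is a minimal sketch of one plausible reading, not the paper's method. It assumes an element-wise polynomial activation with learnable coefficients and a penalty equal to the mean squared activation derivative at each layer. The names `poly`, `poly_derivative`, `forward_with_penalty`, `total_loss`, and the weight `lam` are all hypothetical.

```python
# Illustrative sketch only: the penalty form and every name here are
# assumptions, not definitions taken from the paper.

def poly(coeffs, z):
    """Evaluate sum_k coeffs[k] * z**k using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def poly_derivative(coeffs, z):
    """Evaluate the derivative sum_k k * coeffs[k] * z**(k-1)."""
    deriv_coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return poly(deriv_coeffs, z)


def layer_forward(weights, bias, coeffs, x):
    """Affine map followed by an element-wise polynomial activation.

    Returns the activations and the activation derivatives, both
    evaluated at the layer's pre-activations.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    out = [poly(coeffs, z) for z in pre]
    dout = [poly_derivative(coeffs, z) for z in pre]
    return out, dout


def forward_with_penalty(layers, x):
    """Run every layer and sum a per-layer derivative penalty.

    Assumed penalty: mean squared activation derivative in each layer.
    """
    penalty = 0.0
    for weights, bias, coeffs in layers:
        x, dout = layer_forward(weights, bias, coeffs, x)
        penalty += sum(d * d for d in dout) / len(dout)
    return x, penalty


def total_loss(task_loss, penalty, lam=0.01):
    """Task loss plus the weighted derivative penalty (lam is hypothetical)."""
    return task_loss + lam * penalty


# One layer with identity weights and activation p(z) = z + 0.5 * z**2.
example_layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.0, 1.0, 0.5])]
example_out, example_penalty = forward_with_penalty(example_layers, [1.0, 2.0])
```

In this example the pre-activations are 1 and 2, so the activation derivatives `1 + z` are 2 and 3, and the penalty is `(4 + 9) / 2 = 6.5`. Because the penalty is computed per layer from local derivatives, it differs from a single global bound on the network's Lipschitz constant, which is the contrast the abstract draws.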
Similar Articles
ChainzRule: Sample-Efficient, Robust Deep Learning Across Tabular, NLP, and Vision Tasks
ChainzRule introduces a neural architecture with learnable polynomial layers and differential regularization, achieving sample-efficient, robust performance across tabular, NLP, and vision tasks with results on Pima Diabetes, SST-5, Yelp Full, and CIFAR-10-C.
DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers
Introduces DisjunctiveNet, a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks via differentiable convexified optimization layers, achieving perfect rule satisfaction on real-world datasets.
DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty
This paper presents a large-scale empirical study of the Derivative Regularization (DREG) penalty, showing that it achieves high accuracy and noise robustness, particularly with GELU activations and in data-scarce regimes, and positioning it as a general-purpose, plug-and-play regularizer for neural networks.
Low-power analogue neural networks with trainable nonlinear connections for continuous control
This paper presents low-power analogue neural networks that place trainable nonlinear functions on connections, inspired by Kolmogorov-Arnold networks. The design handles continuous control tasks with far fewer nodes and connections than multilayer perceptrons and is demonstrated on hardware with projected microwatt-level power consumption.
Communication Dynamics Neural Networks: FFT-Diagonalized Layers for Improved Hessian Conditioning at Reduced Parameter Count
This paper introduces CDLinear, a block-circulant neural network layer that reduces parameter count and improves Hessian conditioning via FFT diagonalization, validated on MNIST with theoretical proofs.