Nonlinear computation in deep linear networks
Summary
OpenAI research explores how nonlinear computation can emerge in deep linear networks, presenting theoretical and empirical analysis alongside TensorFlow code examples.
Cached at: 04/20/26, 02:56 PM
Similar Articles
Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
This paper proposes a plug-and-play framework that implements spike-friendly approximations for Transformer nonlinearities (e.g., Softmax, SiLU, normalization) via population computation with LIF neurons and lightweight bit-shift scaling, achieving less than 1% accuracy drop on LLMs without fine-tuning.
Understanding neural networks through sparse circuits
OpenAI researchers present methods for training sparse neural networks that are easier to interpret by forcing most weights to zero, enabling the discovery of small, disentangled circuits that can explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and support AI safety goals.
The Hamilton-Jacobi Theory of Deep Learning
This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.
Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks
Introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for compressing deep neural network layers via small core tensors, achieving high compression ratios while maintaining accuracy.
Techniques for training large neural networks
OpenAI presents comprehensive techniques for training large neural networks across distributed GPU clusters, covering data parallelism, pipeline parallelism, tensor parallelism, and mixture-of-experts approaches to overcome engineering and scalability challenges.