Nonlinear computation in deep linear networks

OpenAI Blog 09/29/17, 07:00 AM Papers

deep-learning linear-networks neural-networks theory nonlinear-computation openai-research

Summary

OpenAI research explores how nonlinear computation can emerge in deep linear networks through the finite precision of floating-point arithmetic, presenting theoretical and empirical analysis with code examples using TensorFlow.
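For context: in exact arithmetic, a stack of affine layers with no activation function between them is equivalent to a single affine layer. That is why nonlinear behavior in such a network is surprising. The NumPy sketch below checks this, using float64 to stay close to exact arithmetic and layer sizes matching the TensorFlow snippet (784 -> 512 -> 512 -> 10). It is not code from the original post, and `deep_linear` and `collapse` are hypothetical helpers.

```python
import numpy as np

def deep_linear(x, params):
    """Three affine layers (matmul + bias) with no activation function between them."""
    w1, b1, w2, b2, w3, b3 = params
    h1 = x @ w1 + b1
    h2 = h1 @ w2 + b2
    return h2 @ w3 + b3

def collapse(params):
    """Fold the three affine layers into one equivalent affine layer (w, b)."""
    w1, b1, w2, b2, w3, b3 = params
    w = w1 @ w2 @ w3              # shape (784, 10)
    b = (b1 @ w2 + b2) @ w3 + b3  # shape (10,)
    return w, b

rng = np.random.default_rng(0)

# Layer sizes 784 -> 512 -> 512 -> 10 in float64.
# Random (non-zero) biases so that the bias folding is actually exercised.
params = []
for fan_in, fan_out in [(784, 512), (512, 512), (512, 10)]:
    params.append(rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)))
    params.append(rng.normal(size=fan_out))

x = rng.normal(size=(4, 784))  # hypothetical batch of 4 inputs
w, b = collapse(params)
max_diff = np.max(np.abs(deep_linear(x, params) - (x @ w + b)))
print(max_diff)  # on the order of float64 rounding error
```

Because the fold is exact in real arithmetic, any nonlinearity in such a network has to come from how the arithmetic is carried out, not from the architecture.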

Cached at: 04/20/26, 02:56 PM

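# Nonlinear computation in deep linear networks

Source: [https://openai.com/index/nonlinear-computation-in-deep-linear-networks/](https://openai.com/index/nonlinear-computation-in-deep-linear-networks/)

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, tf.Variable)

batch_size = 128  # hypothetical value; the snippet does not define batch_size

# MNIST-sized inputs (28*28 = 784 pixels) and 10 one-hot class labels.
x = tf.placeholder(dtype=tf.float32, shape=[batch_size,784])
y = tf.placeholder(dtype=tf.float32, shape=[batch_size,10])

# Three affine layers, 784 -> 512 -> 512 -> 10: He-style normal weights, zero biases.
w1 = tf.Variable(np.random.normal(scale=np.sqrt(2./784),size=[784,512]).astype(np.float32))
b1 = tf.Variable(np.zeros(512,dtype=np.float32))
w2 = tf.Variable(np.random.normal(scale=np.sqrt(2./512),size=[512,512]).astype(np.float32))
b2 = tf.Variable(np.zeros(512,dtype=np.float32))
w3 = tf.Variable(np.random.normal(scale=np.sqrt(2./512),size=[512,10]).astype(np.float32))
b3 = tf.Variable(np.zeros(10,dtype=np.float32))

params = [w1,b1,w2,b2,w3,b3]
nr_params = sum([np.prod(p.get_shape().as_list()) for p in params])  # total parameter count

# Dividing by 2**125 pushes activations toward the smallest float32 values,
# where rounding makes the otherwise linear layers behave nonlinearly.
scaling = 2**125

def get_logits(par):
    # No activation functions: each layer is a matmul plus a bias.
    h1 = tf.nn.bias_add(tf.matmul(x , par[0]), par[1]) / scaling
    h2 = tf.nn.bias_add(tf.matmul(h1, par[2]) , par[3] / scaling)
    # Multiply by scaling at the end to bring the logits back to a normal range.
    o = tf.nn.bias_add(tf.matmul(h2, par[4]), par[5]/ scaling)*scaling
    return o
```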
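The division by `scaling` is the interesting part of the snippet. Dividing by 2**125 moves activations of order 1 down to roughly the smallest normal float32 (2**-126). Smaller values go further, into the subnormal range, where representable numbers are spaced a fixed 2**-149 apart. Rounding there is not proportional to the value, so even a map that is mathematically the identity stops being linear. The NumPy sketch below isolates that effect. It is an illustration, not code from the original post, and `scaled_identity` is a hypothetical helper.

```python
import numpy as np

SCALING = np.float32(2.0 ** 125)  # same constant as `scaling` in the TensorFlow snippet

def scaled_identity(x):
    """Mathematically x, but computed as (x / 2**125) * 2**125 in float32.

    Inputs with |x| < 0.5 land below the smallest normal float32 (2**-126),
    in the subnormal range with fixed spacing 2**-149. Anything smaller than
    half of that spacing rounds to zero and cannot be recovered.
    """
    x = np.asarray(x, dtype=np.float32)
    return (x / SCALING) * SCALING

a = np.float32(2.0 ** -26)  # a quarter of one subnormal step, in original units

print(scaled_identity(np.float32(1.0)))  # 1.0 (stays in the normal range, exact)
print(scaled_identity(a))                 # 0.0 (underflows and is lost)
print(scaled_identity(4 * a))             # nonzero: exactly one subnormal step, 2**-24, survives

# Homogeneity fails: f(4a) differs from 4 * f(a) = 0.
print(scaled_identity(4 * a) == 4 * scaled_identity(a))  # False
```

In `get_logits` the same kind of rounding can occur inside every matmul and bias add, so the three layers no longer fold into a single linear map. Subnormals may also be flushed to zero on some hardware or library configurations. This sketch does not model that.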

Similar Articles

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

arXiv cs.LG

This paper proposes a plug-and-play framework that implements spike-friendly approximations for Transformer nonlinearities (e.g., Softmax, SiLU, normalization) via population computation with LIF neurons and lightweight bit-shift scaling, achieving less than 1% accuracy drop on LLMs without fine-tuning.

Understanding neural networks through sparse circuits

OpenAI Blog

OpenAI researchers present methods for training sparse neural networks that are easier to interpret by forcing most weights to zero, enabling the discovery of small, disentangled circuits that can explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and support AI safety goals.

The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks

arXiv cs.LG

Introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for compressing deep neural network layers via small core tensors, achieving high compression ratios while maintaining accuracy.

Techniques for training large neural networks