Nonlinear computation in deep linear networks

OpenAI Blog Papers

Summary

OpenAI research explores how nonlinear computation can emerge in deep linear networks, presenting theoretical and empirical analysis with code examples using TensorFlow.

No content available
Original Article
View Cached Full Text

Cached at: 04/20/26, 02:56 PM

# Nonlinear computation in deep linear networks Source: [https://openai.com/index/nonlinear-computation-in-deep-linear-networks/](https://openai.com/index/nonlinear-computation-in-deep-linear-networks/) `` ``` 1x = tf.placeholder(dtype=tf.float32, shape=[batch_size,784])2y = tf.placeholder(dtype=tf.float32, shape=[batch_size,10])34w1 = tf.Variable(np.random.normal(scale=np.sqrt(2./784),size=[784,512]).astype(np.float32))5b1 = tf.Variable(np.zeros(512,dtype=np.float32))6w2 = tf.Variable(np.random.normal(scale=np.sqrt(2./512),size=[512,512]).astype(np.float32))7b2 = tf.Variable(np.zeros(512,dtype=np.float32))8w3 = tf.Variable(np.random.normal(scale=np.sqrt(2./512),size=[512,10]).astype(np.float32))9b3 = tf.Variable(np.zeros(10,dtype=np.float32))1011params = [w1,b1,w2,b2,w3,b3]12nr_params = sum([np.prod(p.get_shape().as_list()) for p in params])13scaling = 2**1251415def get_logits(par):16 h1 = tf.nn.bias_add(tf.matmul(x , par[0]), par[1]) / scaling17 h2 = tf.nn.bias_add(tf.matmul(h1, par[2]) , par[3] / scaling) 18 o = tf.nn.bias_add(tf.matmul(h2, par[4]), par[5]/ scaling)*scaling19return o ```

Similar Articles

Understanding neural networks through sparse circuits

OpenAI Blog

OpenAI researchers present methods for training sparse neural networks that are easier to interpret by forcing most weights to zero, enabling the discovery of small, disentangled circuits that can explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and support AI safety goals.

The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

Techniques for training large neural networks

OpenAI Blog

OpenAI presents comprehensive techniques for training large neural networks across distributed GPU clusters, covering data parallelism, pipeline parallelism, tensor parallelism, and mixture-of-experts approaches to overcome engineering and scalability challenges.