Nonlinear computation in deep linear networks

OpenAI Blog · Papers

Summary

OpenAI research explores how nonlinear computation can emerge in deep linear networks, presenting theoretical and empirical analysis with code examples using TensorFlow.


# Nonlinear computation in deep linear networks

Source: [https://openai.com/index/nonlinear-computation-in-deep-linear-networks/](https://openai.com/index/nonlinear-computation-in-deep-linear-networks/)

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)

batch_size = 64  # assumed value; the cached snippet does not define it

# Flattened 28x28 MNIST images and one-hot labels.
x = tf.placeholder(dtype=tf.float32, shape=[batch_size, 784])
y = tf.placeholder(dtype=tf.float32, shape=[batch_size, 10])

# A 784-512-512-10 network with no activation functions.
w1 = tf.Variable(np.random.normal(scale=np.sqrt(2./784), size=[784, 512]).astype(np.float32))
b1 = tf.Variable(np.zeros(512, dtype=np.float32))
w2 = tf.Variable(np.random.normal(scale=np.sqrt(2./512), size=[512, 512]).astype(np.float32))
b2 = tf.Variable(np.zeros(512, dtype=np.float32))
w3 = tf.Variable(np.random.normal(scale=np.sqrt(2./512), size=[512, 10]).astype(np.float32))
b3 = tf.Variable(np.zeros(10, dtype=np.float32))

params = [w1, b1, w2, b2, w3, b3]
nr_params = sum([np.prod(p.get_shape().as_list()) for p in params])  # total parameter count
scaling = 2**125

def get_logits(par):
    # Dividing by 2**125 pushes activations down to ~1e-38, near the
    # bottom of the float32 range, where underflow makes the otherwise
    # linear layers behave nonlinearly; the output is scaled back up.
    h1 = tf.nn.bias_add(tf.matmul(x, par[0]), par[1]) / scaling
    h2 = tf.nn.bias_add(tf.matmul(h1, par[2]), par[3] / scaling)
    o = tf.nn.bias_add(tf.matmul(h2, par[4]), par[5] / scaling) * scaling
    return o
```
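The cached page preserves the code but not the article's prose. The mechanism the snippet relies on can still be demonstrated directly: near the bottom of the float32 range, underflow and rounding break linearity, so the down-scale/up-scale pattern in `get_logits` is neither the identity nor additive. A minimal NumPy sketch of this (my illustration, not from the original post; the printed values assume default IEEE-754 subnormal handling, i.e. no flush-to-zero):

```python
import numpy as np

# Not from the original post: a toy demonstration that the
# down-scale / up-scale pattern in get_logits is nonlinear.
scaling = np.float32(2.0 ** 125)

def roundtrip(v):
    # Same pattern as the network's layers: divide by 2**125,
    # multiply back. In exact arithmetic this is the identity.
    return (np.float32(v) / scaling) * scaling

a = np.float32(1e-7)
print(roundtrip(a))                 # ~1.1920929e-07: not the identity
print(roundtrip(a + a))             # ~1.7881393e-07 ...
print(roundtrip(a) + roundtrip(a))  # ~2.3841858e-07: f(a+a) != f(a)+f(a)
```

Because the distortion only appears at this tiny scale, the division by `scaling` places the network's activations exactly where float32 arithmetic stops being linear.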

Similar Articles

Understanding neural networks through sparse circuits

OpenAI Blog

OpenAI researchers present methods for training sparse neural networks, forcing most weights to zero so that the models are easier to interpret: small, disentangled circuits can be recovered that explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and to support AI safety goals.

Techniques for training large neural networks

OpenAI Blog

OpenAI presents comprehensive techniques for training large neural networks across distributed GPU clusters, covering data parallelism, pipeline parallelism, tensor parallelism, and mixture-of-experts approaches to overcome engineering and scalability challenges.

AI and compute

OpenAI Blog

OpenAI releases an analysis demonstrating that the compute used in the largest AI training runs has grown exponentially since 2012, with a 3.4-month doubling time, amounting to a 300,000x increase that vastly outpaces Moore's Law. The analysis suggests this trend will likely continue and calls for increased academic AI research funding to address rising computational costs.
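Those two figures are mutually consistent, as a quick back-of-the-envelope check shows (my arithmetic, not part of the article; the roughly five-year span matches the analysis's 2012 starting point and late-2017 endpoint):

```python
import math

doubling_months = 3.4

# How many doublings does a 300,000x increase imply,
# and how long do they take at one doubling per 3.4 months?
doublings = math.log2(300_000)        # ~18.2 doublings
months = doublings * doubling_months  # ~62 months
print(f"{doublings:.1f} doublings over ~{months / 12:.1f} years")
```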

AI and efficiency

OpenAI Blog

OpenAI analyzes trends in AI algorithmic efficiency, showing that compute required to reach AlexNet-level performance has halved roughly every 16 months since 2012, outpacing hardware gains. The study draws comparisons across domains like DNA sequencing and transistor density to contextualize AI progress.
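The same kind of sanity check works for the efficiency trend (again my arithmetic, not the article's; the 2012-2019 window is an assumption based on the summary's "since 2012"):

```python
# Compound a 16-month halving time over an assumed seven-year window.
halving_months = 16
months = 7 * 12

halvings = months / halving_months  # ~5.25 halvings
factor = 2 ** halvings              # ~38x less compute to match AlexNet
print(f"~{factor:.0f}x reduction in training compute")
```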

Trading inference-time compute for adversarial robustness

OpenAI Blog

OpenAI presents evidence that reasoning models like o1 become more robust to adversarial attacks when given more inference-time compute to think longer. The research demonstrates that increased computation reduces attack success rates across multiple task types including mathematics, factuality, and adversarial images, though significant exceptions remain.