Deep double descent
Summary
OpenAI research reveals the 'double descent' phenomenon, where test error follows a non-monotonic pattern as either model size or the number of training steps increases, challenging the traditional understanding of the bias-variance tradeoff in deep learning.
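The non-monotonic shape is easy to reproduce in a toy setting. The following is a minimal, hypothetical sketch (not OpenAI's experimental setup, which uses large CNNs, ResNets, and transformers): minimum-norm least squares on random ReLU features, where test error typically rises as the feature count approaches the number of training samples (the interpolation threshold) and falls again as the model grows well past it.

```python
# Illustrative sketch of model-wise double descent (an assumption-laden toy,
# not OpenAI's experiments): least-squares regression on random ReLU features.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20

# Ground-truth linear teacher with label noise on the training targets.
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def random_relu_features(X, W):
    """Project inputs through fixed random weights and a ReLU nonlinearity."""
    return np.maximum(X @ W, 0.0)

for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi_train = random_relu_features(X_train, W)
    Phi_test = random_relu_features(X_test, W)
    # Minimum-norm least-squares fit via the pseudo-inverse (no explicit regularizer).
    coef = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"features={n_features:5d}  test MSE={test_mse:8.3f}")
```

Under these assumptions the printed test MSE typically peaks near features=100 (where the model barely interpolates the noisy training labels) and then decreases again for much wider feature sets, mirroring the model-wise double descent curve described above.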
Similar Articles
How AI training scales
OpenAI researchers discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training across a wide range of tasks. They found that more complex tasks and more powerful models tolerate larger batch sizes, suggesting future AI systems can scale further through increased parallelization.
Variance reduction for policy gradient with action-dependent factorized baselines
OpenAI researchers derive a bias-free action-dependent baseline for variance reduction in policy gradient methods, demonstrating improved learning efficiency on high-dimensional control tasks as well as in multi-agent and partially observed environments.
Improved Techniques for Training Consistency Models
OpenAI presents improved techniques for training consistency models that enable high-quality single-step image generation without distillation, achieving significant FID improvements on CIFAR-10 and ImageNet 64×64 through novel loss functions and training strategies.
OpenAI Baselines: ACKTR & A2C
OpenAI releases ACKTR and A2C algorithms as part of its Baselines library, with ACKTR achieving improved sample complexity through Kronecker-factored approximate natural gradient updates while keeping per-update cost comparable to first-order methods.
Are Flat Minima an Illusion?
This paper challenges the common belief that flat minima cause better generalization in neural networks, arguing that 'weakness'—a reparameterization-invariant measure of function simplicity—is the true driver. Empirical results on MNIST and Fashion-MNIST show that weakness predicts generalization while sharpness anticorrelates, and the large-batch generalization advantage vanishes as training data increases.