Learning sparse neural networks through L₀ regularization
Summary
OpenAI proposes a practical L₀ regularization method for neural networks that encourages weights to become exactly zero during training, enabling network pruning for improved speed and generalization. The method attaches stochastic binary gates to the weights and introduces the hard concrete distribution, a continuous relaxation under which the expected L₀ norm becomes differentiable, making the otherwise intractable L₀ penalty optimizable by gradient descent.
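As a rough illustration of the gating mechanism, here is a minimal PyTorch sketch of a hard concrete gate. The hyperparameter defaults β = 2/3, γ = −0.1, ζ = 1.1 follow the values reported in the paper; the class and method names are ours, not the authors' reference code.

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """Stochastic gates z in [0, 1] whose expected L0 cost is differentiable."""

    def __init__(self, n_gates, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_gates))  # per-gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Reparameterized sample from the binary concrete distribution.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            # Deterministic test-time estimator.
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta) and clip to [0, 1], so a gate is exactly
        # 0 or 1 with nonzero probability -- the "hard" in hard concrete.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # P(z != 0): the differentiable surrogate for the L0 norm.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

In use, the weights are multiplied elementwise by the sampled gates during training and λ · `l0_penalty()` is added to the task loss; at test time, gates that clip to zero prune their weights outright.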
Similar Articles
Understanding neural networks through sparse circuits
OpenAI researchers present methods for training sparse neural networks that are easier to interpret by forcing most weights to zero, enabling the discovery of small, disentangled circuits that can explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and support AI safety goals.
JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
JumpLoRA introduces a sparse adapter framework for continual learning in LLMs that uses JumpReLU gating to dynamically isolate task-specific parameters and prevent catastrophic forgetting. The method builds on LoRA-based approaches and outperforms state-of-the-art continual learning methods such as ELLA.
Weight normalization: A simple reparameterization to accelerate training of deep neural networks
OpenAI presents weight normalization, a reparameterization that decouples the length of each weight vector from its direction, improving training convergence and computational efficiency. Because it introduces no minibatch dependencies, it is suitable for RNNs and noise-sensitive applications; a minimal sketch of the reparameterization appears after this list.
Estimating worst case frontier risks of open weight LLMs
OpenAI researchers study worst-case frontier risks of releasing open-weight LLMs through malicious fine-tuning (MFT) in biology and cybersecurity domains, finding that open-weight models underperform frontier closed-weight models and don't substantially advance harmful capabilities.
Accelerating LMO-Based Optimization via Implicit Gradient Transport
This paper proposes LMO-IGT, a new class of stochastic optimization methods that accelerates convergence using implicit gradient transport while maintaining a single-gradient-per-iteration structure. It introduces a unified theoretical framework and demonstrates improved performance over existing LMO-based optimizers like Muon.
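For the weight normalization entry above, the core reparameterization w = g · v / ‖v‖ is simple enough to sketch directly. This is a rough PyTorch illustration with a per-output-unit scale g, as in the paper; the class name and initialization constants are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNormLinear(nn.Module):
    """Linear layer with weights reparameterized as w = g * v / ||v||."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.05)
        self.g = nn.Parameter(torch.ones(out_features))  # per-unit length
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Direction v / ||v|| and length g are optimized independently,
        # with no dependence on minibatch statistics (unlike batch norm).
        w = self.g.unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return F.linear(x, w, self.b)
```

PyTorch also ships this reparameterization as a built-in wrapper, `torch.nn.utils.weight_norm`, which applies the same decomposition to an existing module's weight parameter.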