conditional-computation

#conditional-computation

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

arXiv cs.CL · 2026-06-10

This paper proposes a dense-to-sparse continual training method for LLMs that uses predictor-gated, bank-wise sparsity to reach 4x FFN sparsity, and demonstrates it on Qwen2.5-8B with long-context training.
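As a rough illustration of what predictor-gated, bank-wise FFN sparsity could look like, the sketch below splits the FFN hidden units into equal banks, scores each bank with a small linear predictor, and evaluates only the top-scoring quarter of the banks. This is a generic top-k bank selection written under assumptions, not the paper's recipe: the ReLU activation, the linear predictor, and every name (`select_banks`, `sparse_ffn`, `predictor`) are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def select_banks(scores, keep):
    """Return the sorted indices of the `keep` highest-scoring banks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


def sparse_ffn(x, w_in, w_out, predictor, num_banks, keep):
    """Evaluate an FFN using only the banks chosen by the predictor.

    x         : input vector of length d
    w_in      : one row of length d per hidden unit (input projection)
    w_out     : one row of length d per hidden unit (output projection)
    predictor : one row of length d per bank (assumed linear bank scorer)
    """
    bank_size = len(w_in) // num_banks
    scores = [dot(row, x) for row in predictor]
    active = select_banks(scores, keep)

    y = [0.0] * len(x)
    for b in active:
        # Only hidden units inside an active bank are ever computed.
        for j in range(b * bank_size, (b + 1) * bank_size):
            h = max(0.0, dot(w_in[j], x))  # ReLU, an assumption
            for k in range(len(y)):
                y[k] += h * w_out[j][k]
    return y, active


# Toy example: 8 hidden units in 4 banks, keeping 1 bank (one quarter).
x = [1.0, 0.0]
w_in = [[1.0, 0.0] for _ in range(8)]
w_out = [[1.0, 2.0] for _ in range(8)]
predictor = [[0.1, 0.0], [0.9, 0.0], [0.3, 0.0], [0.2, 0.0]]
y, active = sparse_ffn(x, w_in, w_out, predictor, num_banks=4, keep=1)
```

Keeping one bank in four means roughly a quarter of the FFN multiply-adds run per token, which is one plausible reading of "4x FFN sparsity"; the cost of the predictor itself is ignored here.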


#conditional-computation

Learning sparse neural networks through L₀ regularization

OpenAI Blog · 2017-12-04

OpenAI proposes a practical L₀ regularization method for neural networks that encourages weights to become exactly zero during training, enabling network pruning for improved speed and generalization. The method uses stochastic gates and introduces the hard concrete distribution to make the non-differentiable L₀ norm optimization tractable via gradient descent.
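The stochastic gates described above can be sketched in a few lines. The following is a minimal, standard-library illustration of hard concrete gate sampling, assuming the commonly used stretch limits of -0.1 and 1.1 and a temperature of 2/3. The function names are hypothetical and this is not the blog post's own code.

```python
import math
import random

# Assumed hyperparameters: stretch interval (GAMMA, ZETA) and temperature BETA.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gate(log_alpha, rng):
    """Draw one hard concrete gate value in [0, 1]."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA   # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))     # hard clip: exact 0s and 1s occur


def prob_nonzero(log_alpha):
    """Probability that the gate is non-zero (the differentiable L0 term)."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def deterministic_gate(log_alpha):
    """Gate value used at test time, with the sampling noise removed."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


rng = random.Random(0)
gates = [sample_gate(0.0, rng) for _ in range(2000)]
frac_nonzero = sum(g > 0.0 for g in gates) / len(gates)
```

Each weight is multiplied by its gate, and summing `prob_nonzero` over all gates gives an expected count of non-zero weights that can be added to the loss and minimized by gradient descent on `log_alpha`. Because the stretched sample is clipped, a gate can land on exactly zero, which is what allows the corresponding weight to be pruned.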

