dense-to-sparse

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

arXiv cs.CL · 2026-06-10

This paper proposes a continual training recipe for converting dense LLMs into sparse ones. It uses predictor-gated, bank-wise sparsity to reach 4x FFN sparsity and demonstrates the method on Qwen2.5-8B with long-context training.
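
The summary does not spell out the mechanism, so the following is only an illustrative sketch of what bank-wise gating at 4x FFN sparsity could look like, not the paper's implementation. It assumes the FFN hidden units are split into equal-sized banks, that a small linear predictor scores each bank from the input, and that only the top quarter of banks is evaluated. The names `select_banks`, `sparse_ffn`, `predictor`, and `keep_ratio`, and the use of ReLU, are all hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def select_banks(scores, keep_ratio=0.25):
    """Return the sorted indices of the highest-scoring banks.

    keep_ratio=0.25 keeps one bank in four, i.e. 4x sparsity (assumed reading).
    """
    n_keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def sparse_ffn(x, w_in, w_out, predictor, n_banks, keep_ratio=0.25):
    """Evaluate only the FFN banks chosen by the predictor (hypothetical layout).

    w_in, w_out: one row of length d_model per hidden unit.
    predictor:   one row of length d_model per bank.
    """
    bank_size = len(w_in) // n_banks
    scores = [dot(p, x) for p in predictor]        # one score per bank
    active = select_banks(scores, keep_ratio)
    y = [0.0] * len(x)
    for b in active:
        for j in range(b * bank_size, (b + 1) * bank_size):
            h = max(0.0, dot(w_in[j], x))          # ReLU, an assumption
            for k in range(len(y)):
                y[k] += h * w_out[j][k]
    return y, active


if __name__ == "__main__":
    random.seed(0)
    d_model, hidden, n_banks = 4, 32, 8

    def rand_matrix(rows, cols):
        return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    x = [random.uniform(-1, 1) for _ in range(d_model)]
    y, active = sparse_ffn(
        x,
        rand_matrix(hidden, d_model),
        rand_matrix(hidden, d_model),
        rand_matrix(n_banks, d_model),
        n_banks,
    )
    print("active banks:", active, "of", n_banks)
```

With 8 banks and `keep_ratio=0.25`, two banks are evaluated per input, so three quarters of the hidden units are skipped. Selecting whole banks rather than individual neurons keeps the active weights contiguous, which is one plausible reason to gate at bank granularity.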
