Tag
This paper proposes a structured neuron pruning framework for deep neural networks based on multi-armed bandit algorithms and reports that it is effective on various tasks.
Proposes KOFF, a framework that decomposes pretrained LLMs into a sparse shared backbone and domain-specific external memories using structured pruning and LoRA adapters, achieving 12% sparsity without significant performance loss.