Tag
Proposes KOFF, a framework that decomposes pretrained LLMs into a sparse shared backbone and domain-specific external memories using structured pruning and LoRA adapters, achieving 12% sparsity without significant performance loss.