structural-pruning

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv cs.LG · 3d ago

Proposes a structural pruning framework for Mixture-of-Experts (MoE) models that maximizes channel-score coverage using an attribution-based approximation. Combined with 4-bit quantization, it supports pruning ratios of 25% or 50% and reduces the memory footprint by 5.27x on Qwen3-30B-A3B.

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

arXiv cs.AI · 2026-06-09

Presents an end-to-end framework for LLM compression that jointly optimizes structural pruning and mixed-precision quantization, reporting lower perplexity and speedups over state-of-the-art methods, especially at ultra-low bit precisions.
