structural-pruning

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv cs.LG · 3d ago

Proposes a structural pruning framework for MoE models that maximizes channel-score coverage via attribution-based approximation. Combining 50% or 25% pruning with 4-bit quantization, it reduces the memory footprint by 5.27x on Qwen3-30B-A3B.

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

arXiv cs.AI · 2026-06-09

An end-to-end framework for LLM compression that jointly optimizes structural pruning and mixed-precision quantization. It reports significant perplexity reductions and speedups over state-of-the-art methods, especially at ultra-low bit precisions.
