Squeeze-Release: Iterative Pruning with Exact Structural Minimization
Summary
This paper introduces Squeeze-Release, an iterative pruning method that achieves exact structural minimization.
Cached at: 06/15/26, 04:59 PM
Source: https://huggingface.co/papers/2606.14346
Similar Articles
Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression
Proposes a structural pruning framework for MoE models that maximizes channel-score coverage via attribution-based approximation, achieving 50% or 25% pruning with 4-bit quantization and reducing memory footprint by 5.27x on Qwen3-30B-A3B.
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
A novel end-to-end framework for LLM compression that jointly optimizes structural pruning and mixed-precision quantization, achieving significant perplexity reductions and speedups over state-of-the-art methods, especially at ultra-low bit precisions.
SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs
SHAPE proposes a coalition-aware expert pruning framework for sparse MoE LLMs that uses Shapley-style attribution over routing traces to identify essential experts, achieving competitive accuracy under 20-40% pruning and reducing GPU memory footprint.
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
This paper explores structured pruning and knowledge distillation techniques for compressing large Mixture-of-Experts (MoE) models during pre-training. It demonstrates that progressive pruning and combined distillation strategies, such as multi-token prediction distillation, improve downstream performance, exemplified by compressing Qwen3-Next-80A3B to a more efficient 23A2B model.
Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling
SemiPrune is a label-efficient dataset pruning framework that uses semi-supervised learning to generate pseudo-labels from a small labeled subset, enabling existing supervised pruning methods to work with unlabeled data. It achieves state-of-the-art performance on domain-specific, image-corrupted, and long-tailed datasets.