llm-pruning

#llm-pruning

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

arXiv cs.CL · 2d ago

This paper presents a cascaded multi-granularity pruning framework for deploying LLMs on Industrial IoT edge devices. It achieves up to 13.8x compression with minimal accuracy loss on MHA+GELU architectures, but the same approach collapses on GQA+SwiGLU designs.

#llm-pruning

Small LLMs: Pruning vs. Training from Scratch

arXiv cs.LG · 2026-06-15

This paper empirically compares two ways of obtaining small language models: pruning and training from scratch. It finds that pruning gives a strong advantage under limited token budgets, but that this advantage shrinks as training scales, especially with coarse-grained pruning.
