transformer-architectures

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

arXiv cs.CL · 5h ago

This paper presents a cascaded multi-granularity pruning framework for deploying LLMs on Industrial IoT edge devices, achieving up to 13.8x compression with minimal accuracy loss on MHA+GELU architectures while exposing a collapse on GQA+SwiGLU designs.

On the Residual Scaling of Looped Transformers: Stability and Transferability

arXiv cs.LG · 2026-06-18

This paper analyzes residual scaling in looped (weight-tied) transformers, showing that weight sharing requires stronger scaling (1/N) than standard residual networks, and derives a factored parameterization that enables hyperparameter transfer across loop counts without retuning.
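
A toy illustration of why weight tying calls for the stronger 1/N scaling, under loud assumptions: N is taken to be the loop count, the residual branch is a single scalar linear map, and `looped_output` is a hypothetical helper, not anything from the paper. Because the same weight is reused every loop, the updates add coherently, so a 1/sqrt(N) factor is not enough to keep the output bounded as N grows.

```python
import math


def looped_output(x0, w, n_loops, alpha):
    """Apply one weight-tied linear residual update n_loops times.

    Toy model only: x <- x + alpha * w * x, with the same scalar w in every loop.
    """
    x = x0
    for _ in range(n_loops):
        x = x + alpha * w * x  # the residual branch reuses the same weight w
    return x


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        tied_1_over_n = looped_output(1.0, 1.0, n, 1.0 / n)
        tied_1_over_sqrt_n = looped_output(1.0, 1.0, n, 1.0 / math.sqrt(n))
        print(f"N={n:4d}  1/N: {tied_1_over_n:.3f}  1/sqrt(N): {tied_1_over_sqrt_n:.3e}")
```

In this scalar case the 1/N output equals (1 + 1/N)^N, which stays below e for every N, while the 1/sqrt(N) output equals (1 + 1/sqrt(N))^N and grows without bound. The sketch says nothing about the factored parameterization or the hyperparameter transfer result described in the summary.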
