This study identifies a 'Smart Pruning Paradox': activation-aware pruning methods such as Wanda preserve perplexity but significantly amplify social bias in large language models deployed on edge devices.
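For context, Wanda scores each weight by the product of its magnitude and the L2 norm of the corresponding input activation, then drops the lowest-scoring weights per output row. A minimal sketch of that pruning metric follows; the function name and toy layer are illustrative, not from the study:

```python
import torch

def wanda_prune_mask(weight: torch.Tensor, inputs: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Wanda-style mask: score = |W_ij| * ||X_j||_2, pruning the lowest scores per output row.

    weight: (out_features, in_features); inputs: (n_samples, in_features) calibration activations.
    """
    # Per-input-channel activation norm over the calibration set.
    act_norm = inputs.norm(p=2, dim=0)        # shape: (in_features,)
    scores = weight.abs() * act_norm          # broadcast across output rows
    k = int(weight.shape[1] * sparsity)       # number of weights to drop per row
    # Indices of the k lowest-scoring weights in each row.
    _, idx = torch.topk(scores, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return mask

# Toy usage: prune 50% of a random linear layer.
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_pruned = W * wanda_prune_mask(W, X, sparsity=0.5)
```

The paradox is that this metric is chosen purely to preserve perplexity on the calibration data, so any bias-relevant weights it removes go unmeasured.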
Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a 1.25-bit-quantized machine translation model that supports 33 languages and fits in 440MB for on-device use. It uses the Sherry quantization algorithm to reach translation quality comparable to much larger models.
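As a rough sanity check on the 440MB figure, 1.8B parameters at 1.25 bits per weight come to about 281MB of raw weights; the remainder would plausibly cover quantization scales, embeddings, and runtime metadata (an assumption, as the release does not document the breakdown):

```python
# Back-of-the-envelope size check: 1.8B parameters at 1.25 bits per weight.
params = 1.8e9
bits_per_weight = 1.25
weight_mb = params * bits_per_weight / 8 / 1e6
print(f"raw weights: ~{weight_mb:.0f} MB")  # ~281 MB
# The remaining ~160MB of the 440MB footprint would hold quantization
# scales, embeddings, and runtime metadata (assumed, not documented).
```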