transformer-optimization


Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P]

Reddit r/MachineLearning · 2026-04-23

The author reports diminishing returns from FP16 conversion, ONNX export, and pruning on a 162 MB transformer, and asks which option is the most promising next step: quantization, distillation, low-rank factorization, or hardware-specific optimizations.
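One of the options the post weighs, low-rank factorization, replaces a dense weight matrix with two thin factors. A minimal sketch of the parameter arithmetic, with illustrative sizes not taken from the post:

```python
# Low-rank factorization sketch: replace a dense weight W (d_out x d_in)
# with B @ A, where A is (rank x d_in) and B is (d_out x rank).
# All numbers below are illustrative, not from the original post.

def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense layer's weight matrix."""
    return d_in * d_out

def factorized_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters after factorizing that layer at the given rank."""
    return rank * (d_in + d_out)

d_in, d_out, rank = 768, 768, 64  # typical hidden size; rank is a free choice
dense = dense_params(d_in, d_out)               # 589824
lowrank = factorized_params(d_in, d_out, rank)  # 98304
print(f"dense: {dense}, rank-{rank}: {lowrank}, ratio: {lowrank / dense:.2f}")
```

The compression ratio is `rank * (d_in + d_out) / (d_in * d_out)`, so savings only materialize when the rank is well below `d_in * d_out / (d_in + d_out)`; the accuracy cost at a given rank has to be measured empirically.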
