weight-compression

Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference

arXiv cs.LG · 2d ago

This paper introduces Qift, a fixed two-bit weight quantization level set that contains no zero level, designed for Hadamard-rotated LLMs. It targets W2A4/KV4 inference and exploits the fact that rotated weights follow a near-zero-centered, Gaussian-like distribution. Experiments on LLaMA-2-7B and LLaMA-3.1-8B show consistent perplexity improvements over standard W2 quantization.
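
The idea of quantizing rotated weights to a fixed zero-free level set can be sketched roughly as follows. This is an illustrative sketch, not the paper's method: the summary does not state Qift's actual levels or how its scale is chosen, so the level set `(-2, -1, 1, 2)` (picked only because its magnitudes are powers of two), the standard-deviation scale, and the helper names `hadamard` and `quantize_to_levels` are all assumptions.

```python
import numpy as np


def hadamard(n):
    """Orthonormal n x n Hadamard matrix via Sylvester construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


def quantize_to_levels(w, levels, scale):
    """Round each weight to the nearest entry of scale * levels; return values and 2-bit codes."""
    grid = scale * np.asarray(levels, dtype=np.float64)
    idx = np.abs(w[..., None] - grid).argmin(axis=-1)
    return grid[idx], idx


# ASSUMPTION: hypothetical zero-free level set, not taken from the paper.
NO_ZERO_LEVELS = (-2.0, -1.0, 1.0, 2.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W[0, 0] = 25.0  # inject one outlier; the rotation spreads it across a row

H = hadamard(64)
W_rot = W @ H  # rotated weights are roughly zero-centered and Gaussian-like

# ASSUMPTION: a simple per-tensor scale; the paper may choose it differently.
scale = W_rot.std()
W_q, codes = quantize_to_levels(W_rot, NO_ZERO_LEVELS, scale)
```

Each entry of `codes` fits in two bits, and no weight is mapped to exactly zero, so all four codes carry a nonzero magnitude.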
