Tag
Introduces QAM-W, a joint 2D codebook quantization method for LLM weights that uses Hadamard rotation and activation-aware scaling, achieving near-BF16 perplexity at 5–6 bits per weight and matching SmoothQuant W8A8 quality with 32% fewer weight bits.