Tag
InfoQuant introduces a training-free method, Peak Suppression Orthogonal Transformation (PSOT), that reshapes activation distributions for low-bit LLM quantization. Under W4A4KV4 (4-bit weights, activations, and KV cache), it preserves 97% of floating-point accuracy and outperforms prior post-training quantization (PTQ) methods.
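The summary does not spell out how PSOT builds its transform. The sketch below therefore uses a fixed orthonormal Walsh-Hadamard rotation as a stand-in. This is a generic orthogonal transform, not PSOT. It illustrates why an orthogonal rotation can help low-bit quantization: it spreads a single activation peak across all channels and lowers the peak-to-RMS ratio that sets a per-tensor 4-bit scale. All function names and the toy data are hypothetical.

```python
import math


def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform (self-inverse).

    Stand-in rotation only: a plain Hadamard transform, NOT InfoQuant's PSOT.
    len(x) must be a positive power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    y = [float(v) for v in x]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)  # normalize so the transform is orthogonal
    return [v * scale for v in y]


def quantize_symmetric(x, bits=4):
    """Per-tensor symmetric round-to-nearest quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    peak = max(abs(v) for v in x)
    if peak == 0:
        return list(x)
    scale = peak / qmax  # one large peak inflates the step for every element
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in x]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def peak_to_rms(x):
    """Ratio of the largest magnitude to the root-mean-square value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


# Hypothetical activation row: 64 moderate values plus one outlier channel.
x = [0.5 * math.sin(i) for i in range(64)]
x[5] = 20.0

direct = quantize_symmetric(x)
rotated = hadamard_transform(x)
# Quantize in the rotated space, then rotate back (the transform is its own inverse).
via_rotation = hadamard_transform(quantize_symmetric(rotated))

print(f"peak/RMS before: {peak_to_rms(x):.2f}, after: {peak_to_rms(rotated):.2f}")
print(f"MSE direct: {mse(x, direct):.4f}, with rotation: {mse(x, via_rotation):.4f}")
```

Because the rotation is orthogonal, it preserves the vector's norm and can be undone exactly. In this toy, any reconstruction error therefore comes only from the quantizer. How PSOT chooses its transform to suppress peaks specifically is left to the original work.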
OSCAR is an offline, spectral, covariance-aware rotation method for 2-bit KV cache quantization. It aligns quantization with the covariance structure of attention and achieves high accuracy and efficiency for long-context LLM serving.
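The summary does not give OSCAR's exact objective for aligning with attention covariance. Under that caveat, the sketch below shows only the generic idea behind an offline spectral rotation. It uses plain PCA: the key covariance is estimated from calibration data, eigendecomposed once (here with a cyclic Jacobi solver), and used as a fixed rotation. Per-channel 2-bit min-max quantization is then applied in the rotated basis. This is not OSCAR's algorithm, and all names and the toy data are hypothetical.

```python
import math


def covariance(rows):
    """Population covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def jacobi_eigh(a, rel_tol=1e-24, max_sweeps=50):
    """Cyclic Jacobi eigendecomposition of a symmetric matrix.

    Returns (eigenvalues, V), where column j of V is the eigenvector for eigenvalue j.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def fit_rotation(calib_keys):
    """Offline step (generic PCA, NOT OSCAR): rows of R are covariance eigenvectors."""
    evals, v = jacobi_eigh(covariance(calib_keys))
    order = sorted(range(len(evals)), key=lambda j: evals[j], reverse=True)
    return [[v[i][j] for i in range(len(v))] for j in order]  # largest variance first


def rotate(r, x):
    """Matrix-vector product R @ x."""
    return [sum(rij * xj for rij, xj in zip(row, x)) for row in r]


def quantize_2bit_per_channel(rows):
    """Asymmetric min-max 2-bit (4-level) quantize-dequantize, one range per channel."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        lo = min(r[j] for r in rows)
        hi = max(r[j] for r in rows)
        step = (hi - lo) / 3 or 1.0  # 4 levels: lo, lo + step, lo + 2*step, hi
        for r in out:
            r[j] = lo + round((r[j] - lo) / step) * step
    return out


# Hypothetical calibration keys: 3 strongly correlated channels.
keys = [[math.sin(i), 0.9 * math.sin(i) + 0.1 * math.cos(3 * i),
         -0.5 * math.sin(i) + 0.05 * math.sin(7 * i)] for i in range(64)]

R = fit_rotation(keys)                        # computed once, offline
rotated = [rotate(R, k) for k in keys]        # keys in the principal-axis basis
dequant = quantize_2bit_per_channel(rotated)  # reconstruction after 2-bit storage
```

Because R is orthogonal, (Rq)·(Rk) = q·k. Rotating queries with the same R therefore leaves attention scores unchanged before quantization. This is what makes a fixed, offline rotation cheap to apply at serving time.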