kv-cache-quantization

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Hugging Face Daily Papers · 2026-05-18

OSCAR is an offline spectral covariance-aware rotation method for 2-bit KV cache quantization that aligns quantization with attention covariance structures, achieving high accuracy and efficiency for long-context LLM serving.

Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

arXiv cs.LG · 2026-05-12

This paper analyzes KV cache quantization schemes inspired by TurboQuant, applying statistical inference and a new 6D error framework to evaluate quality measures such as KL divergence and geometric error.
