attention-aware

#attention-aware

New KV Quants coming 😍 Welcome OSCAR kv quant open sourced by togetherAI

Reddit r/LocalLLaMA ↗ · 2026-05-26 Cached

Together AI open-sources OSCAR, an attention-aware 2-bit KV cache quantization system that enables efficient long-context LLM serving by redistributing quantization error according to attention importance.


#attention-aware

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Hugging Face Daily Papers ↗ · 2026-05-18 Cached

OSCAR is an offline spectral covariance-aware rotation method for 2-bit KV cache quantization that aligns quantization with attention covariance structures, achieving high accuracy and efficiency for long-context LLM serving.

