attention-aware


New KV Quants coming 😍 Welcome OSCAR kv quant open sourced by togetherAI

Reddit r/LocalLLaMA · 2026-05-26

Together AI open-sources OSCAR, an attention-aware 2-bit KV cache quantization system that enables efficient long-context LLM serving by redistributing quantization error according to attention importance.


OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Hugging Face Daily Papers · 2026-05-18

OSCAR is an offline spectral covariance-aware rotation method for 2-bit KV cache quantization that aligns quantization with attention covariance structures, achieving high accuracy and efficiency for long-context LLM serving.
