Tag
Together AI open-sources OSCAR, an attention-aware 2-bit KV cache quantization system that enables efficient long-context LLM serving by redistributing quantization error according to attention importance.