Tag
Shard is a drop-in Hugging Face Cache that achieves 10x KV cache compression for Llama-3.1-8B with no accuracy loss on benchmarks. Keys (K) are compressed with PCA followed by int4 quantization; values (V) are compressed with a Hadamard rotation followed by vector quantization.
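
To make the two compression paths concrete, here is a minimal NumPy sketch of the general techniques named above. It is not Shard's implementation: every function name, the PCA rank, the group size, the codebook size, and the per-token int4 scaling are assumptions chosen for illustration, and the integration with the Hugging Face Cache interface is not shown.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def compress_keys(K, rank):
    """Project keys onto the top `rank` principal directions, then quantize to int4."""
    mean = K.mean(axis=0)
    _, _, Vt = np.linalg.svd(K - mean, full_matrices=False)
    basis = Vt[:rank]                                  # (rank, d) orthonormal rows
    Z = (K - mean) @ basis.T                           # (n, rank) PCA coefficients
    # Assumption: symmetric per-token scaling; values land in the int4 range [-8, 7].
    scale = np.abs(Z).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(Z / scale), -8, 7).astype(np.int8)  # int4 held in int8, packing omitted
    return q, scale, basis, mean

def decompress_keys(q, scale, basis, mean):
    return (q * scale) @ basis + mean

def fit_codebook(X, k, iters=10, seed=0):
    """Plain k-means (Lloyd iterations) used here as a stand-in codebook trainer."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                C[j] = members.mean(axis=0)
    return C

def compress_values(V, codebook, H, group):
    """Rotate values with H, split each row into `group`-sized chunks, store nearest code index."""
    R = (V @ H).reshape(-1, group)
    d2 = ((R[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).astype(np.uint8)

def decompress_values(idx, codebook, H, shape):
    # H is orthonormal, so its transpose undoes the rotation.
    return codebook[idx].reshape(shape) @ H.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 256, 64                                     # toy sizes, not Llama-3.1-8B's
    K = rng.normal(size=(n, 8)) @ rng.normal(size=(8, d))
    V = rng.normal(size=(n, d))

    K_hat = decompress_keys(*compress_keys(K, rank=8))
    H = hadamard(d)
    codebook = fit_codebook((V @ H).reshape(-1, 4), k=16)
    idx = compress_values(V, codebook, H, group=4)
    V_hat = decompress_values(idx, codebook, H, V.shape)

    print("K relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
    print("V relative error:", np.linalg.norm(V - V_hat) / np.linalg.norm(V))
```

The sketch treats the two tensors differently for the reason the description implies: the key path reduces dimension before quantizing, while the value path keeps the dimension and uses the rotation to spread energy across coordinates before mapping chunks to codebook entries. The compression ratio of this toy depends entirely on the assumed rank, group size, and codebook size, so it should not be read as reproducing the 10x figure.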