Tag
Hard-KV introduces a Cascade Cache hierarchy and Logits Calibration mechanism to resolve the static-dynamic mismatch in head-adaptive KV cache compression, achieving up to 2x throughput improvement in long-context LLM inference.