Tag
This paper theoretically characterizes the minimax risk of KV cache compression in transformers, derives from it design principles for accurate compression under causal masking, and instantiates them in a practical algorithm that shows promising results on LongBench.