A tip on KV-cache compression for transformer models: try asymmetric compression before symmetric. Start with the keys left uncompressed and only the values compressed, then adjust the setting for each side according to how sensitive the model family is.
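A minimal sketch of that starting point, in plain Python so it runs without dependencies. Everything named here is hypothetical and not taken from any library: `CachePolicy`, `quantize`, `compress_entry`, and the choice of 8-bit absmax rounding for the values are assumptions made only for illustration. "Asymmetric" refers to treating keys and values differently, not to the quantization scheme itself.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class QuantizedVector:
    codes: List[int]  # integer codes in [-qmax, qmax]
    scale: float      # one scale factor per vector


def quantize(vec: Vector, bits: int) -> QuantizedVector:
    """Absmax rounding to signed integers (an illustrative compressor)."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max((abs(x) for x in vec), default=0.0)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in vec]
    return QuantizedVector(codes, scale)


def dequantize(q: QuantizedVector) -> Vector:
    return [c * q.scale for c in q.codes]


@dataclass
class CachePolicy:
    key_bits: Optional[int]    # None means "leave uncompressed"
    value_bits: Optional[int]  # None means "leave uncompressed"


# Starting point from the tip: keys untouched, values compressed.
# The 8-bit width is an assumption, not a recommendation from the source.
ASYMMETRIC_START = CachePolicy(key_bits=None, value_bits=8)


def compress_entry(key: Vector, value: Vector, policy: CachePolicy):
    """Apply the policy to one cached (key, value) pair."""
    k = key if policy.key_bits is None else quantize(key, policy.key_bits)
    v = value if policy.value_bits is None else quantize(value, policy.value_bits)
    return k, v


if __name__ == "__main__":
    key = [0.5, -1.0, 0.25]
    value = [1.0, -0.3, 0.0]
    k, v = compress_entry(key, value, ASYMMETRIC_START)
    print("key kept as-is:", k is key)
    print("value reconstructed:", dequantize(v))
```

Adjusting for a given model family then amounts to changing `key_bits` and `value_bits` independently. A symmetric setup would be a policy with both fields set, for example `CachePolicy(key_bits=8, value_bits=8)`.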