Tag
The paper introduces Tangram, a framework for multi-turn LLM serving that statically resolves non-uniform KV cache compression. By eliminating runtime overheads, it achieves up to 2.6x higher throughput than the full-KV baseline.