serving-framework

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Hugging Face Daily Papers · 2026-06-15

The paper introduces Tangram, a serving framework that statically resolves non-uniform KV cache compression for multi-turn LLM serving, achieving up to 2.6x throughput improvement over the full-KV baseline by eliminating runtime overheads.
