serving-framework


Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Hugging Face Daily Papers · 2026-06-15

The paper introduces Tangram, a serving framework that statically resolves non-uniform KV cache compression for multi-turn LLM serving, achieving up to 2.6x throughput improvement over the full-KV baseline by eliminating runtime overheads.
