efficient-serving


Semantic Cache Distillation: Efficient State Transfer via Reuse and Selective Patching

arXiv cs.LG · 3d ago

This paper proposes Semantic Cache Distillation (SCD), a loss-constrained framework that transmits compact semantic codes in place of the raw key-value (KV) cache. It reports up to a 2.65x speedup in time to first token (TTFT) while keeping generation quality within 5% F1 of the oracle.
