memory-hierarchy

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

arXiv cs.CL · 2026-04-23

TTKV introduces a temporal-tiered KV cache, modeled on human memory, that cuts LLM inference latency at 128K context by 76% and doubles throughput while reducing cross-tier traffic by 5.94×.
