token-level-reconstruction

#token-level-reconstruction

SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference

arXiv cs.CL ↗ · 17h ago Cached

SeKV is a resolution-adaptive KV cache method that organizes context into entropy-guided semantic spans stored across a GPU-CPU hierarchy, enabling selective token-level reconstruction during decoding while reducing GPU memory by 53.3% versus full caching at 128K context.

0 favorites 0 likes

token-level-reconstruction

SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference

Submit Feedback