token-level-reconstruction

Tag

Cards List
#token-level-reconstruction

SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference

arXiv cs.CL · 17h ago Cached

SeKV is a resolution-adaptive KV cache method that organizes context into entropy-guided semantic spans stored across a GPU-CPU hierarchy, enabling selective token-level reconstruction during decoding while reducing GPU memory by 53.3% versus full caching at 128K context.

0 favorites 0 likes
← Back to home

Submit Feedback