kv-cache-compression


OjaKV: Context-Aware Online Low-Rank KV Cache Compression

arXiv cs.CL · 2026-04-20

OjaKV is a context-aware, online low-rank KV cache compression framework. It combines hybrid storage with Oja's algorithm for incremental subspace adaptation to ease the GPU memory bottleneck in long-context LLM inference, and it requires no model fine-tuning.
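
To make "incremental subspace adaptation" concrete, here is a minimal sketch of the classic Oja's rule, which tracks a dominant direction of streaming vectors one sample at a time. It is not OjaKV's implementation: the function names (`oja_update`, `fit_direction`, `compress`, `reconstruct`), the rank-1 simplification, the learning rate, and the synthetic data are all assumptions chosen for illustration. A real low-rank KV cache would track a multi-dimensional basis and operate on key/value tensors.

```python
import math
import random


def oja_update(w, x, lr):
    """One step of Oja's rule for a single direction, followed by renormalization."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # projection of x onto w
    w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def fit_direction(vectors, lr=0.01, seed=0):
    """Stream over the vectors once, adapting a unit direction incrementally."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    w = [wi / norm for wi in w]
    for x in vectors:
        w = oja_update(w, x, lr)
    return w


def compress(w, x):
    """Store one coefficient instead of the full vector (rank-1 toy case)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def reconstruct(w, coeff):
    """Approximate the original vector from its stored coefficient."""
    return [coeff * wi for wi in w]


def make_stream(n=2000, seed=1):
    """Hypothetical data: vectors that lie mostly along one direction, plus small noise."""
    rng = random.Random(seed)
    direction = [0.6, 0.8, 0.0, 0.0]
    stream = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)
        stream.append([a * d + rng.gauss(0.0, 0.05) for d in direction])
    return direction, stream


if __name__ == "__main__":
    direction, stream = make_stream()
    w = fit_direction(stream)
    alignment = abs(sum(wi * di for wi, di in zip(w, direction)))
    print("alignment with true direction:", round(alignment, 3))
```

The update needs only the current sample and the current estimate, so the basis can adapt as new tokens arrive without recomputing a full decomposition over the whole cache. The sign of the learned direction is arbitrary, which is why the alignment is measured with an absolute value.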
