long-context-llms

OjaKV: Context-Aware Online Low-Rank KV Cache Compression

arXiv cs.CL · 2026-04-20

OjaKV is a context-aware framework for online low-rank compression of the KV cache. It combines hybrid storage with Oja's algorithm, which adapts the low-rank subspace incrementally, to relieve the GPU memory bottleneck of long-context LLM inference without fine-tuning the model.
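The summary names two ingredients: a low-rank projection of cached vectors and online subspace tracking with Oja's algorithm. The Python sketch below illustrates both under stated assumptions; it is not the paper's implementation. It applies Oja's subspace rule, W ← W + η(x − Wy)yᵀ with y = Wᵀx, followed by Gram-Schmidt re-orthonormalization, so that each cached vector of dimension d can be stored as k coefficients. The "hybrid" policy shown here (a window of the most recent tokens kept at full rank, older tokens compressed), the learning rate, and all names (`HybridLowRankCache`, `oja_update`, `recent`, `lr`) are hypothetical illustrations, not details taken from OjaKV.

```python
import math
import random

# Illustrative sketch only: the names, the recent-window "hybrid" policy, and the
# learning rate are assumptions, not OjaKV's actual code.


def project(W, x):
    """Coefficients y = W^T x; W is a d x k basis stored as d rows of length k."""
    k = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def reconstruct(W, y):
    """Low-rank reconstruction x_hat = W y."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in W]


def _norm(v):
    return math.sqrt(sum(t * t for t in v))


def _remove_components(v, basis):
    """Subtract from v its projection onto each orthonormal vector in basis."""
    for b in basis:
        dot = sum(p * q for p, q in zip(v, b))
        v = [p - dot * q for p, q in zip(v, b)]
    return v


def orthonormalize(W):
    """Gram-Schmidt on the columns of W (assumes k <= d)."""
    d, k = len(W), len(W[0])
    basis = []
    for j in range(k):
        v = _remove_components([W[i][j] for i in range(d)], basis)
        if _norm(v) < 1e-9:
            # Degenerate column: fall back to a standard basis vector outside span(basis).
            for i in range(d):
                v = _remove_components([1.0 if t == i else 0.0 for t in range(d)], basis)
                if _norm(v) > 1e-6:
                    break
        n = _norm(v)
        basis.append([p / n for p in v])
    return [[basis[j][i] for j in range(k)] for i in range(d)]


def oja_update(W, x, lr):
    """One step of Oja's subspace rule, W += lr * (x - W y) y^T, then re-orthonormalize."""
    y = project(W, x)
    r = [xi - hi for xi, hi in zip(x, reconstruct(W, y))]
    W = [[W[i][j] + lr * r[i] * y[j] for j in range(len(y))] for i in range(len(W))]
    return orthonormalize(W)


class HybridLowRankCache:
    """Hypothetical hybrid cache: recent tokens at full rank, older ones as k coefficients."""

    def __init__(self, d, k, recent=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.W = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)])
        self.recent, self.lr = recent, lr
        self.full = []    # most recent tokens, stored uncompressed (d floats each)
        self.coeffs = []  # older tokens, stored as coefficients (k floats each)

    def append(self, x):
        W_old = self.W
        self.W = oja_update(W_old, x, self.lr)  # adapt the subspace online
        # Re-express stored coefficients in the new basis: y_new = W_new^T (W_old y_old).
        self.coeffs = [project(self.W, reconstruct(W_old, y)) for y in self.coeffs]
        self.full.append(x)
        if len(self.full) > self.recent:
            # Evict the oldest full-rank token into the low-rank store.
            self.coeffs.append(project(self.W, self.full.pop(0)))

    def keys(self):
        """Vectors in token order: reconstructed older tokens, then exact recent ones."""
        return [reconstruct(self.W, y) for y in self.coeffs] + self.full
```

In this layout each compressed token costs k floats instead of d, plus one shared d×k basis. Rotating the stored coefficients after every basis update keeps them consistent with the current basis at O(n·d·k) cost per appended token; information discarded at eviction time is not recovered by later updates. A production system would presumably amortize or restrict these updates, a trade-off this sketch does not explore.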
