This paper proposes a training-inference-consistent segmented execution framework for long-context LLMs, addressing the mismatch between full-context training and memory-restricted inference; it achieves comparable performance with significantly reduced memory usage.
OjaKV introduces a context-aware online low-rank KV cache compression framework that uses hybrid storage and Oja's algorithm for incremental subspace adaptation to reduce GPU memory bottlenecks in long-context LLM inference without model fine-tuning.
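As a rough illustration of the incremental subspace adaptation mentioned above, the sketch below applies the classical Oja subspace rule to a stream of key vectors and then projects them onto the learned low-rank basis. This is a minimal, hypothetical example (function name, learning rate, and dimensions are illustrative assumptions), not OjaKV's actual implementation or its hybrid storage policy.

```python
import numpy as np

def oja_subspace_update(U, x, lr=1e-3):
    """One Oja's-rule step: nudge the orthonormal basis U (d x r) toward the
    top-r principal subspace of the streaming vectors x (shape (d,)).
    Hypothetical sketch, not the paper's code."""
    proj = U.T @ x                              # r-dim coordinates of x in the current subspace
    U = U + lr * np.outer(x - U @ proj, proj)   # Oja's subspace update
    U, _ = np.linalg.qr(U)                      # re-orthonormalize the basis
    return U

# Toy usage: track a rank-8 subspace of simulated 128-dim key vectors.
d, r = 128, 8
U = np.linalg.qr(np.random.randn(d, r))[0]      # random orthonormal initialization
for _ in range(1000):
    key = np.random.randn(d)                    # stand-in for an incoming key vector
    U = oja_subspace_update(U, key)
compressed_key = U.T @ key                      # r-dim low-rank representation of the latest key
```

The appeal of an Oja-style update in this setting is that the subspace can be adapted online as new tokens arrive, without storing the full key/value history or recomputing an SVD from scratch.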