Tag
This paper proposes a training-inference consistent segmented execution framework for long-context LLMs to address the mismatch between full-context training and restricted inference regimes, achieving comparable performance with significantly reduced memory usage.