Tag
This paper proposes a training-inference-consistent segmented execution framework for long-context LLMs that addresses the mismatch between full-context training and restricted inference regimes, achieving performance comparable to full-context training with significantly reduced memory usage.
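As an illustration only (the paper's actual segmentation scheme is not spelled out in this summary), here is a minimal sketch of what segmented execution can look like: the long sequence is split into fixed-length segments, each forward pass sees a single segment plus a small carried state, so activation memory scales with the segment length rather than the full context, and training uses the same restricted-context computation pattern as inference. The `toy_model`, `segmented_forward`, and mean-pooled carry-over state below are hypothetical placeholders, not the paper's design.

```python
import numpy as np

def toy_model(tokens, state, d_model=32, rng=np.random.default_rng(0)):
    """Stand-in for one forward pass over a single segment.

    Returns per-token features plus an updated carried state; a real model
    (and whatever recurrence the paper uses between segments) would go here.
    """
    emb = rng.normal(size=(tokens.shape[0], d_model))   # fake embeddings
    feats = emb + state                                  # segment sees only its own tokens + carried state
    return feats, feats.mean(axis=0)                     # hypothetical carry-over: mean-pooled features

def segmented_forward(tokens, segment_len):
    """Run a long sequence segment by segment.

    Peak activation size is O(segment_len), not O(len(tokens)), and the
    computation pattern matches a segment-limited inference deployment,
    which is the training-inference consistency the summary refers to.
    """
    state = np.zeros(32)
    outputs = []
    for start in range(0, len(tokens), segment_len):
        seg = tokens[start:start + segment_len]
        feats, state = toy_model(seg, state)
        outputs.append(feats)
    return np.concatenate(outputs, axis=0)

long_sequence = np.arange(4096)
features = segmented_forward(long_sequence, segment_len=512)
print(features.shape)   # (4096, 32), produced 512 tokens at a time
```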
Nous Research introduces Lighthouse Attention, a training-only subquadratic wrapper around scaled dot-product attention that accelerates long-context pretraining and can be removed before deployment, so inference retains the efficiency of vanilla attention.
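The mechanism behind Lighthouse Attention is not described in this summary, so the sketch below only illustrates the deployment pattern it implies: a wrapper that routes training through a cheaper token mixer and is stripped away before deployment, leaving plain scaled dot-product attention for inference. The windowed-attention stand-in and the `TrainingOnlyAttention`/`unwrap` names are assumptions for illustration, not the paper's method.

```python
import numpy as np

def sdpa(q, k, v):
    """Vanilla scaled dot-product attention, O(n^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def windowed_sdpa(q, k, v, window=64):
    """Placeholder subquadratic path: each query attends only to a local window.

    This is NOT Lighthouse Attention -- just a cheap stand-in so the wrapper
    below has a faster-than-quadratic path to call during training.
    """
    out = np.empty_like(v)
    for i in range(q.shape[0]):
        lo = max(0, i - window)
        out[i] = sdpa(q[i:i + 1], k[lo:i + 1], v[lo:i + 1])[0]
    return out

class TrainingOnlyAttention:
    """Wrapper used during pretraining; unwrap() removes it for deployment."""
    def __init__(self, fast_path=windowed_sdpa):
        self.fast_path = fast_path

    def __call__(self, q, k, v):
        return self.fast_path(q, k, v)   # cheap path during training

    def unwrap(self):
        return sdpa                      # deployed model keeps vanilla SDPA

attn = TrainingOnlyAttention()
q = k = v = np.random.default_rng(1).normal(size=(256, 32))
_ = attn(q, k, v)          # training-time call through the cheap path
deploy_attn = attn.unwrap()
_ = deploy_attn(q, k, v)   # inference runs unchanged scaled dot-product attention
```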
This paper introduces Toeplitz MLP Mixers (TMM), a novel architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
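The complexity claim can be made concrete: a Toeplitz mixing matrix T satisfies T[i, j] = t[i - j], so applying it to a sequence is a convolution and can be computed with FFTs in O(n log n) rather than the O(n^2) of dense attention. The sketch below is not the paper's TMM code (the function names are invented here); it simply checks that the FFT route matches an explicit causal Toeplitz product.

```python
import numpy as np

def toeplitz_mix_naive(x, coeffs):
    """Explicit causal Toeplitz mixing: T[i, j] = coeffs[i - j] for j <= i."""
    n = x.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            T[i, j] = coeffs[i - j]
    return T @ x                               # O(n^2 * d), same order as attention

def toeplitz_mix_fft(x, coeffs):
    """Same causal Toeplitz product computed as a convolution via FFT: O(n log n * d)."""
    n = x.shape[0]
    f_c = np.fft.rfft(coeffs, n=2 * n)         # zero-pad so circular conv equals linear conv
    f_x = np.fft.rfft(x, n=2 * n, axis=0)
    y = np.fft.irfft(f_c[:, None] * f_x, n=2 * n, axis=0)
    return y[:n]

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))                 # (seq_len, d_model)
coeffs = rng.normal(size=128)                  # relative-position weights (learned, in a real model)
assert np.allclose(toeplitz_mix_naive(x, coeffs), toeplitz_mix_fft(x, coeffs))
```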