efficient-attention

#efficient-attention

Training-Inference Consistent Segmented Execution for Long-Context LLMs

arXiv cs.CL · 12h ago

This paper proposes a segmented execution framework for long-context LLMs that is consistent between training and inference. It addresses the mismatch between full-context training and restricted inference regimes, and reports comparable performance with significantly reduced memory usage.

#efficient-attention

@omarsar0: Cool idea from Nous Research. What if you could speed up long-context pretraining with a subquadratic wrapper that you …

X AI KOLs Following · yesterday

Nous Research introduces Lighthouse Attention, a training-only subquadratic wrapper for scaled dot-product attention that accelerates long-context pretraining and can be removed before deployment to preserve vanilla inference efficiency.

#efficient-attention

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

arXiv cs.LG · 2d ago

This paper introduces Toeplitz MLP Mixers (TMM), a novel architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
