hierarchical-attention

@NousResearch: Today we release Lighthouse Attention, a selection-based hierarchical attention for long-context pre-training that deli…

X AI KOLs Following ↗ · 2026-05-15

NousResearch releases Lighthouse Attention, a selection-based hierarchical attention mechanism for long-context pre-training. It reports a 1.4-1.7x wall-clock speedup at 98K context and a ~17x faster forward/backward pass than standard attention at 512K context on a single B200, validated on 530M-parameter Llama-3 models across 50B tokens.

Long Context Pre-Training with Lighthouse Attention

Hugging Face Daily Papers ↗ · 2026-05-07

Lighthouse Attention is a training-only, hierarchical, selection-based attention algorithm that reduces the computational complexity of long-sequence training for causal transformers. It enables faster pre-training and reaches a competitive final loss after a recovery phase.
