Tag
Proposes ac-gpt, a simple modification to causal Transformers that enables evaluating and sampling from arbitrary conditionals (conditioning on past, future, or mixed context) in a single forward pass while preserving left-to-right ordering and next-token prediction, so that existing LLMs can be fine-tuned for arbitrary conditioning.
Lighthouse Attention is a training-only, hierarchical, selection-based attention algorithm that reduces the computational complexity of long-sequence training for causal Transformers, enabling faster pre-training that reaches a competitive final loss after a recovery phase.