long-sequence

#long-sequence

Generative modeling with sparse transformers

OpenAI Blog ↗ · 2019-04-23 Cached

OpenAI introduces the Sparse Transformer, a deep neural network that improves the attention mechanism from O(N²) to O(N√N) complexity, enabling modeling of sequences 30x longer than previously possible across text, images, and audio. The model uses sparse attention patterns and checkpoint-based memory optimization to train networks up to 128 layers deep, achieving state-of-the-art performance across multiple domains.

0 favorites 0 likes

long-sequence

Generative modeling with sparse transformers

Submit Feedback