Researchers introduce Raven, a sequence model that combines the efficiency of state space models with a selective slot-updating mechanism inspired by sliding window attention, aiming to improve long-context retrieval. The authors position the approach as a more principled alternative to existing linear-time models.
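The summary doesn't spell out Raven's update rule; below is a minimal NumPy sketch, under stated assumptions, of what a selective slot-updating recurrence could look like: a fixed bank of memory slots where each incoming token overwrites only the slot it scores highest against, keeping per-step cost constant like an SSM while retaining addressable memory reminiscent of a sliding window. All function names, weights, and shapes are hypothetical, not Raven's actual design.

```python
import numpy as np

def selective_slot_update(slots, x, W_key, W_val):
    """One step of a hypothetical selective slot update.

    `slots` is a (num_slots, d) memory bank. The input x scores each
    slot, and only the best-matching slot is rewritten, so per-step
    cost is O(num_slots * d) regardless of sequence length.
    """
    scores = slots @ (W_key @ x)       # relevance of each slot to x
    idx = int(np.argmax(scores))       # select exactly one slot
    slots = slots.copy()
    slots[idx] = np.tanh(W_val @ x)    # overwrite the selected slot
    return slots

rng = np.random.default_rng(0)
d, num_slots = 8, 4
slots = np.zeros((num_slots, d))
W_key = rng.normal(size=(d, d)) / np.sqrt(d)
W_val = rng.normal(size=(d, d)) / np.sqrt(d)
for _ in range(10):                    # stream of inputs
    slots = selective_slot_update(slots, rng.normal(size=d), W_key, W_val)
```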
CTNet is a novel neural architecture in which computation is framed as the evolution of a persistent state rather than a sequence of successive rewrites, incorporating re-entrant memory, multi-scale coherence, and projective output.
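The summary doesn't define CTNet's components, but "evolution of a persistent state rather than successive rewrites" is suggestive of residual, ODE-like dynamics. The sketch below contrasts the two styles under that assumption; `rewrite_step`, `evolve_step`, and the step size `dt` are illustrative names and choices, not CTNet's actual API.

```python
import numpy as np

def rewrite_step(h, x, W):
    """Rewrite-style layer: the new state fully replaces the old one."""
    return np.tanh(W @ np.concatenate([h, x]))

def evolve_step(h, x, W, dt=0.1):
    """Evolution-style layer: the persistent state is nudged rather
    than replaced, resembling one Euler step of an ODE. Illustrative
    only; CTNet's actual update rule is not given in the summary."""
    return h + dt * np.tanh(W @ np.concatenate([h, x]))

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)
h = np.zeros(d)
for _ in range(5):
    h = evolve_step(h, rng.normal(size=d), W)  # state persists across steps
```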
OpenAI introduces the Sparse Transformer, a deep neural network that reduces attention complexity from O(N²) to O(N√N), enabling modeling of sequences 30x longer than previously possible across text, images, and audio. The model uses sparse attention patterns and gradient checkpointing to train networks up to 128 layers deep, achieving state-of-the-art performance across multiple domains.
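The strided sparsity pattern from the Sparse Transformer paper is straightforward to illustrate: each position attends to a local window of the last l positions plus every l-th earlier position, so with l ≈ √N each row of the attention matrix has O(√N) nonzeros, giving the O(N√N) total cost. A minimal NumPy sketch of the mask follows; the function name and toy sizes are assumptions for illustration, not OpenAI's code.

```python
import numpy as np

def strided_sparse_mask(n: int, stride: int) -> np.ndarray:
    """Boolean mask for a strided sparse attention pattern.

    Position i may attend to position j (j <= i) if j is among the
    last `stride` positions (local window) or if (i - j) is a multiple
    of `stride` (strided connections). With stride ~ sqrt(n), each row
    has O(sqrt(n)) nonzeros instead of O(n).
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = (i - j) < stride           # recent positions
    strided = (i - j) % stride == 0    # every stride-th earlier position
    return causal & (local | strided)

mask = strided_sparse_mask(n=16, stride=4)
print(mask.astype(int))               # 1 = attend, 0 = masked out
```

In the paper this pattern is split across two attention heads (one local, one strided) rather than applied as a single mask, but the connectivity it encodes is the same.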