Tag
This paper proposes a sleep-like consolidation mechanism for transformer models that uses fast weights and recurrent passes to improve long-context processing while maintaining inference speed.