Tag
This paper proposes a sleep-like consolidation mechanism for transformer models that uses fast weights and recurrent passes to improve long-context processing while maintaining inference speed.
FAAST proposes a forward-only method that analytically compiles labeled examples into fast weights, enabling efficient test-time supervised adaptation without backpropagation. It reports over 90% speedup and 95% memory savings while maintaining performance.
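To illustrate what "compiling labeled examples into fast weights" without backpropagation can look like, here is a minimal sketch. It assumes the simplest forward-only write rule, a Hebbian outer-product update W = Σ onehot(y_i) x_iᵀ, which stands in for FAAST's analytic procedure and is not necessarily what the paper uses. The names `compile_fast_weights` and `predict` are hypothetical. No gradients are computed: adaptation is a single accumulation pass over the labeled support examples.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def compile_fast_weights(
    features: Sequence[Vector], labels: Sequence[int], num_classes: int
) -> Matrix:
    """Hypothetical forward-only write: W = sum_i onehot(y_i) x_i^T.

    Assumption: a Hebbian outer-product rule stands in for FAAST's
    analytic compilation. No backpropagation is involved.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for x, y in zip(features, labels):
        # The outer product of onehot(y) with x only touches row y,
        # so we add x into that row directly.
        for j, xj in enumerate(x):
            weights[y][j] += xj
    return weights


def predict(weights: Matrix, x: Vector) -> int:
    """Read out class scores as W @ x and return the best class index."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: four labeled support examples, three classes.
support = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
support_labels = [0, 0, 1, 2]
W = compile_fast_weights(support, support_labels, num_classes=3)
print(predict(W, [0.8, 0.2, 0.0]))  # class 0
print(predict(W, [0.1, 0.9, 0.0]))  # class 1
```

The write cost is one pass over the examples and the memory cost is a single `num_classes × dim` matrix, which is the kind of saving a forward-only scheme targets compared with storing activations for backpropagation.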