Tag
The paper introduces Mela, a memory-augmented transformer architecture inspired by human memory consolidation. Its Hierarchical Memory Module improves long-context language modeling performance.