chunk-wise-training


Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Hugging Face Daily Papers · 2026-05-08

Proposes the Memory-Efficient Looped Transformer (MELT), a recurrent LLM architecture that decouples reasoning depth from memory consumption by sharing a single KV cache across loops. Training is chunk-wise and uses interpolated transition and attention-aligned distillation.
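
To make the memory claim concrete, here is a rough back-of-the-envelope sketch in Python. It is not MELT's implementation: the function `kv_cache_bytes`, the configuration values, and the baseline in which every loop iteration keeps its own cache are all assumptions chosen for illustration. It only shows why a cache shared across loops stays constant in size as loop count grows.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   n_loops, shared_across_loops, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a looped transformer (illustrative only)."""
    # Keys and values: 2 tensors per layer, each seq_len x n_kv_heads x head_dim.
    per_pass = 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem
    # Assumption: without sharing, every loop iteration keeps its own cache.
    copies = 1 if shared_across_loops else n_loops
    return per_pass * copies


# Hypothetical configuration, not taken from the paper.
cfg = dict(seq_len=4096, n_layers=12, n_kv_heads=8, head_dim=64)

for loops in (1, 4, 16):
    per_loop = kv_cache_bytes(n_loops=loops, shared_across_loops=False, **cfg)
    shared = kv_cache_bytes(n_loops=loops, shared_across_loops=True, **cfg)
    print(f"loops={loops:2d}  per-loop cache={per_loop / 2**20:.0f} MiB  "
          f"shared cache={shared / 2**20:.0f} MiB")
```

Under these assumptions the per-loop estimate grows linearly with the number of loops, while the shared estimate does not change, which is the sense in which compute depth is decoupled from memory.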
