chunk-wise-training


Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Hugging Face Daily Papers · 2026-05-08

Proposes the Memory-Efficient Looped Transformer (MELT), a recurrent LLM architecture that decouples reasoning depth from memory consumption by sharing a single KV cache across loops. Training is chunk-wise, with interpolated transition and attention-aligned distillation.


