@askalphaxiv: Another cool research on Looped Transformers They ask the question: "Can we loop a frozen, off-the-shelf checkpoint dir…
Summary
This research introduces a technique to loop frozen, off-the-shelf transformer checkpoints at inference time by using damped Runge-Kutta substeps, treating transformer layers as Euler steps in a residual ODE. This allows extra latent compute without fine-tuning, architecture changes, or new weights, showing gains on knowledge tasks like MMLU-Pro, GPQA, and ARC.
Cached at: 05/27/26, 03:18 AM
Another cool piece of research on Looped Transformers
They ask the question: “Can we loop a frozen, off-the-shelf checkpoint directly at inference time without any modifications?”
Naive repetition pushes hidden states outside the distribution that later layers expect, so performance drops.
But if you treat transformer layers as Euler steps in a residual ODE and replace naive loops with damped Runge–Kutta substeps, it becomes possible.
This lets the frozen models get extra latent compute at test time with no fine-tuning, no new weights, and no architecture changes.
The best gains show up on hard knowledge multiple-choice tasks like MMLU-Pro, GPQA, and ARC.
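To make the Euler-step view concrete, here is a minimal toy sketch in pure Python. It is not the paper's implementation: the post does not say which Runge-Kutta order, damping value, or layer range the authors use, so the Heun (RK2) scheme, the `damping` parameter, and the `residual_branch` stand-in for a frozen layer are all assumptions made for illustration. The residual update `h + f(h)` is read as one Euler step of size 1, and the damped substep takes a smaller, averaged step with the same frozen function.

```python
import math

def residual_branch(h, w):
    """Hypothetical stand-in for a frozen layer's residual branch f(h)."""
    return [w * math.tanh(x) for x in h]

def euler_layer(h, w):
    """Ordinary residual update h + f(h): one Euler step with step size 1."""
    f = residual_branch(h, w)
    return [x + fx for x, fx in zip(h, f)]

def damped_heun_substep(h, w, damping):
    """One Heun (RK2) substep with step size `damping` (assumed scheme)."""
    k1 = residual_branch(h, w)
    # Predictor: a small Euler step of size `damping`.
    h_pred = [x + damping * a for x, a in zip(h, k1)]
    k2 = residual_branch(h_pred, w)
    # Corrector: average the two slopes, still scaled by `damping`.
    return [x + damping * 0.5 * (a + b) for x, a, b in zip(h, k1, k2)]

def naive_loop(h, w, n_loops):
    """Re-apply the same frozen layer with full Euler steps."""
    for _ in range(n_loops):
        h = euler_layer(h, w)
    return h

def damped_loop(h, w, n_loops, damping):
    """Re-apply the same frozen layer with damped Heun substeps."""
    for _ in range(n_loops):
        h = damped_heun_substep(h, w, damping)
    return h

def drift(a, b):
    """Euclidean distance between two hidden states."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

h0 = [0.5, -1.0, 2.0]
h_once = euler_layer(h0, 1.0)                # the normal single forward pass
h_naive = naive_loop(h_once, 1.0, 4)         # four extra full-size loops
h_damped = damped_loop(h_once, 1.0, 4, 0.25) # four extra damped substeps

print("drift after naive loops: ", drift(h_naive, h_once))
print("drift after damped loops:", drift(h_damped, h_once))
```

In this toy setting the damped loop moves the hidden state much less far from the single-pass result than the naive loop does, which is the intuition behind keeping states close to what later layers expect. Whether that carries over to a real checkpoint depends on details the post does not give.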
Similar Articles
Simply Stabilizing the Loop via Fully Looped Transformer
This paper identifies gradient oscillation and residual explosion as causes of training instability in Looped Transformers, and proposes Fully Looped Transformer with two parameter-free modifications (Fully Looped Architecture and Attention Injection) to stabilize training up to 12 loop iterations, achieving up to 13.2% improvement in downstream performance.
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
This paper introduces PoLar, a framework that learns input-specific execution programs for frozen transformer layers, allowing layers to be skipped, kept, or repeated. It improves accuracy and reduces inference overhead compared to fixed-depth methods.
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Proposes Memory-Efficient Looped Transformer (MELT), a novel recurrent LLM architecture that decouples reasoning depth from memory consumption by sharing a single KV cache across loops and using chunk-wise training with interpolated transition and attention-aligned distillation.
LoopQ: Quantization for Recursive Transformers
LoopQ is a post-training quantization framework for looped language models that addresses distribution shift, state reuse, and error accumulation. It achieves 68.8% average accuracy improvement under 4-bit weights and activations.
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
LoopUS is a post-training framework that converts pretrained LLMs into looped architectures for improved reasoning performance via latent-refinement and adaptive early exiting. It addresses computational costs and capability preservation issues found in existing looped computation methods.