@askalphaxiv: Another cool research on Looped Transformers They ask the question: "Can we loop a frozen, off-the-shelf checkpoint dir…

X AI KOLs Timeline 05/26/26, 11:38 PM Papers

Summary

This research introduces a technique to loop frozen, off-the-shelf transformer checkpoints at inference time by using damped Runge-Kutta substeps, treating transformer layers as Euler steps in a residual ODE. This allows extra latent compute without fine-tuning, architecture changes, or new weights, showing gains on knowledge tasks like MMLU-Pro, GPQA, and ARC.

Original Article

Cached at: 05/27/26, 03:18 AM

Another cool piece of research on Looped Transformers

They ask the question: “Can we loop a frozen, off-the-shelf checkpoint directly at inference time without any modifications?”

Naive repetition pushes hidden states outside the distribution that later layers expect, so performance drops.

But if you treat transformer layers as Euler steps in a residual ODE and replace the naive loops with damped Runge–Kutta substeps, looping becomes possible.
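To make the ODE view concrete, here is a minimal toy sketch in Python, not the paper's implementation. It assumes a residual layer has the form x + f(x), which is one explicit Euler step of dx/dt = f(x) with step size 1. The post does not say which Runge–Kutta scheme is used or what "damped" means exactly, so this sketch assumes Heun's second-order method and treats damping as a factor that scales the total step budget. The names `make_block`, `naive_loop`, `damped_heun_loop`, and `damping` are all hypothetical.

```python
import numpy as np

def make_block(dim, seed):
    """Toy stand-in for the residual branch f(x) of a frozen layer (hypothetical)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim))
    return lambda x: np.tanh(x @ W)  # frozen weights, bounded output

def euler_step(f, x, h=1.0):
    # A standard residual layer x + f(x) is one Euler step with h = 1.
    return x + h * f(x)

def naive_loop(f, x, repeats):
    # Naive looping: apply the same full-size residual update several times.
    # The hidden state moves up to `repeats` times further than one pass would.
    for _ in range(repeats):
        x = euler_step(f, x)
    return x

def damped_heun_loop(f, x, substeps, damping=1.0):
    # Assumed variant: spend extra evaluations of f on smaller substeps.
    # Total integration time is `damping` (1.0 matches a single layer pass).
    h = damping / substeps
    for _ in range(substeps):
        k1 = f(x)                    # slope at the current state
        k2 = f(x + h * k1)           # slope at the Euler-predicted state
        x = x + 0.5 * h * (k1 + k2)  # Heun (RK2) update
    return x

if __name__ == "__main__":
    dim = 16
    f = make_block(dim, seed=0)
    x0 = np.random.default_rng(1).normal(size=dim)
    print("naive drift :", np.linalg.norm(naive_loop(f, x0, 2) - x0))
    print("damped drift:", np.linalg.norm(damped_heun_loop(f, x0, 2) - x0))
```

Both loops call f more often than a single forward pass, so both spend extra compute on the same frozen weights. The difference in this sketch is that the substep version keeps the total step budget at `damping`, while the naive version doubles it. That is one plausible reading of why naive repetition drifts out of distribution.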

This lets the frozen models get extra latent compute at test time with no fine-tuning, no new weights, and no architecture changes.

The best gains show up on hard, knowledge-heavy multiple-choice (MC) tasks like MMLU-Pro, GPQA, and ARC.

Similar Articles

Simply Stabilizing the Loop via Fully Looped Transformer

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

LoopQ: Quantization for Recursive Transformers

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
