looped-transformers

On the Residual Scaling of Looped Transformers: Stability and Transferability

arXiv cs.LG · 2026-06-18

This paper analyzes residual scaling in looped (weight-tied) transformers, showing that weight sharing requires stronger scaling (1/N) than standard residual networks, and derives a factored parameterization that enables hyperparameter transfer across loop counts without retuning.
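As a rough illustration of why the scaling matters, the toy below applies one shared scalar "block" repeatedly with a residual multiplier alpha. It is only a sketch under stated assumptions, not the paper's analysis: N is assumed to be the loop count, the block is replaced by a hypothetical linear map w * x, and the function name run_loop is invented for this example. With alpha = 1/N the output equals (1 + 1/N)^N, which stays below e for every N, while alpha = 1/sqrt(N) grows without bound as N increases.

```python
import math

def run_loop(x0, w, n_loops, alpha):
    """Apply the same residual update x <- x + alpha * (w * x) n_loops times."""
    x = x0
    for _ in range(n_loops):
        # One shared (weight-tied) block, reused on every loop iteration.
        # The linear map w * x is a stand-in assumption, not a real transformer block.
        x = x + alpha * (w * x)
    return x

for n in (4, 16, 64, 256):
    inv_n = run_loop(1.0, 1.0, n, 1.0 / n)
    inv_sqrt = run_loop(1.0, 1.0, n, 1.0 / math.sqrt(n))
    print(f"N={n:4d}  alpha=1/N: {inv_n:.3f}   alpha=1/sqrt(N): {inv_sqrt:.3e}")
```

Because the same weights are reused on every pass, the per-loop contributions add up coherently instead of averaging out, which is the intuition this toy tries to capture.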

@askalphaxiv: Another cool research on Looped Transformers They ask the question: "Can we loop a frozen, off-the-shelf checkpoint dir…

X AI KOLs Timeline · 2026-05-26

This research introduces a technique to loop frozen, off-the-shelf transformer checkpoints at inference time by using damped Runge-Kutta substeps, treating transformer layers as Euler steps in a residual ODE. This allows extra latent compute without fine-tuning, architecture changes, or new weights, showing gains on knowledge tasks like MMLU-Pro, GPQA, and ARC.
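The ODE view can be made concrete with a small sketch. A residual layer x + f(x) is a forward Euler step of dx/dt = f(x) with step size 1, and a second-order Runge-Kutta (Heun) step evaluates the same f twice to get a better update. The summary does not specify how the damping is defined, so the blend factor gamma below is a hypothetical choice made for illustration, and f is a toy scalar function rather than a transformer layer.

```python
def euler_step(x, f, h=1.0):
    """Plain residual update: one forward Euler step of dx/dt = f(x)."""
    return x + h * f(x)

def damped_heun_step(x, f, h=1.0, gamma=0.5):
    """Blend an Euler update with a Heun (RK2) update.

    gamma is a hypothetical damping factor: 0.0 gives pure Euler,
    1.0 gives the full Heun step. The same f is reused for both evaluations,
    so no new weights are involved.
    """
    k1 = f(x)
    k2 = f(x + h * k1)                  # second evaluation of the same function
    euler_update = h * k1
    heun_update = 0.5 * h * (k1 + k2)
    return x + (1.0 - gamma) * euler_update + gamma * heun_update

def decay(x):
    """Toy dynamics dx/dt = -x, standing in for a frozen layer."""
    return -x

print(euler_step(1.0, decay))
print(damped_heun_step(1.0, decay, gamma=1.0))
print(damped_heun_step(1.0, decay, gamma=0.5))
```

The extra function evaluations are where the additional inference-time compute comes from in this picture; the weights themselves are never updated.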

@DimitrisPapail: The co-inventor of Looped Transformers defended her PhD thesis yesterday and is heading to an incredible new role soon …

X AI KOLs Timeline · 2026-05-08

Angeliki Giannou, co-inventor of Looped Transformers, has successfully defended her PhD thesis and is set to begin a new role. Congratulations were shared by Dimitris Papailiopoulos on social media.

