layerwise-optimizer


@timlautk: 1/4 New paper with @weijie444! We introduce a symmetry-compatible principle for LLM optimizer design and, as a byproduc…

X AI KOLs Following · 2026-05-19

The paper introduces a symmetry-compatible principle for LLM optimizer design. The principle yields a layerwise optimizer stack with principled updates for embeddings, LM heads, SwiGLU MLPs, and MoE routers, and the authors report lower validation loss than AdamW across multiple architectures.
