Introduces a symmetry-compatibility principle for LLM optimizer design and uses it to derive a layerwise optimizer stack with principled updates for embeddings, LM heads, SwiGLU MLPs, and MoE routers. The resulting stack achieves lower validation loss than AdamW across multiple architectures.