Tag
This paper identifies a Möbius attractor and Cascade Supervision as the key mechanisms behind the emergence of superposition reasoning in transformers, closing a theoretical gap in the convergence analysis of gradient descent on graph reachability tasks.