Tag
This paper shows that discrete Gradient Descent with large step sizes restores symmetry in multi-pathway Deep Linear Networks, countering the symmetry-breaking predicted by Gradient Flow, and leads to signal re-balancing across pathways. The authors theoretically prove that balanced solutions are flatter (less sharp) than sparse ones, and large learning rates drive the network toward stable, balanced configurations.
The paper introduces Diamond Attention, a method for multi-agent reinforcement learning that uses structured randomness to break symmetry and enable role differentiation among homogeneous agents, achieving perfect coordination in symmetric tasks like the XOR game.