Tag
This blog post derives the Singular Value Decomposition (SVD) from scratch, focusing on the intuition and motivation behind it. It argues that traditional math books often present formalized conclusions without showing the exploratory path that leads to them.
This paper identifies a spectral phenomenon called Stability of Singular Distribution (SoSD) in large language model pre-training, where the singular value spectrum stabilizes early while parameters continue to evolve. The authors prove that this stabilization marks the transition to the slow-descent phase of training, and they analyze how training strategies like WSD and Muon affect this behavior.
This paper introduces Rotation-Preserving Supervised Fine-Tuning (RPSFT), a method that improves out-of-domain generalization by preserving projected rotations in pretrained singular subspaces during fine-tuning.