The article explains that attention entropy collapse in deep transformer layers is a geometric consequence of training, not a bug, and proposes a three-line temperature schedule to prevent it.
This paper introduces Geometric Kolmogorov-Arnold Networks (GeoKAN), a family of geometry-aware models that learn Riemannian metrics to adapt coordinates for improved function approximation and physics-informed learning.