Tag
The paper introduces Ember, a lightweight optimizer for embedding and LM-head matrices that exploits gradient geometry to improve efficiency and performance across supervised finetuning, RL, and pretraining, while using far less optimizer state than Adam.
This paper presents a descent-free and alignment-free method to measure singular structure in trained neural networks. It recovers the order of dead directions from the directional Fisher rate, distinguishing genuine singularities from flat gauge symmetries, and demonstrates the technique on transformer and convolutional layers.
This paper introduces the degeneracy distillery, a method that automatically detects and resolves degenerate parameter combinations in physical models by estimating and flattening the Fisher information matrix, reducing the simulation budget required for neural posterior estimation while providing physical insight.
The paper introduces Fisher width, a Riemannian analogue of Gaussian width for statistical manifolds, which captures local statistical curvature and is invariant under reparameterization. It develops the theory, proves generalization bounds for Fisher-Lipschitz classes, and demonstrates computable estimators on MNIST.
The paper proposes an attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix, providing theoretical bounds and scalable evaluation methods for deep neural networks.
FoRA introduces a parameter-efficient fine-tuning method that selects task-informative layers via Fisher scores and trains LoRA down-projections on the Stiefel manifold, reducing parameters while preserving accuracy.
A developer built Arc Gate, a monitoring proxy for LLMs that uses Fisher information manifold geometry to detect session-level prompt injection attacks. It identifies Crescendo-style gradual manipulation by tracking t-values against a phase transition threshold t* = 1.2247 instead of relying on per-turn phrase detection.
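As a rough illustration of the quantity named in the robustness-metric summary above (the spectral norm of the Fisher Information Matrix), the sketch below estimates the largest eigenvalue of an empirical Fisher matrix by power iteration, without ever forming the matrix. It is not taken from that paper: the function name `fisher_spectral_norm`, the empirical estimate F = SᵀS / n built from per-sample score vectors, and the stopping rule are all assumptions chosen for this example.

```python
import numpy as np


def fisher_spectral_norm(scores, n_iter=200, tol=1e-10, seed=0):
    """Estimate the largest eigenvalue of F = scores.T @ scores / n.

    `scores` is an (n, d) array whose rows are per-sample score vectors
    (gradients of the log-likelihood). Treating F as the empirical Fisher
    matrix is an assumption of this sketch, not a detail from the paper.
    """
    n, d = scores.shape
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)

    lam = 0.0
    for _ in range(n_iter):
        # Matrix-free product F @ v: costs O(n * d) instead of O(d^2) memory.
        w = scores.T @ (scores @ v) / n
        new_lam = float(v @ w)  # Rayleigh quotient, since v has unit norm
        norm = np.linalg.norm(w)
        if norm == 0.0:
            return 0.0  # v lies in the null space, e.g. F is the zero matrix
        v = w / norm
        converged = abs(new_lam - lam) <= tol * max(1.0, abs(new_lam))
        lam = new_lam
        if converged:
            break
    return lam


# Hypothetical data: one direction carries much more Fisher information.
demo_rng = np.random.default_rng(1)
demo_scores = demo_rng.standard_normal((500, 8)) * np.array([3.0] + [1.0] * 7)
print(fisher_spectral_norm(demo_scores))
```

Because F is symmetric positive semidefinite, its largest eigenvalue equals its spectral norm. The matrix-free product is the design choice that lets this kind of estimate scale to large d, where storing a d × d matrix would be impractical.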