Tag
This paper introduces Multi-Adapter PPO, a reinforcement learning framework with cross-attention for wavelength selection in LIBS quantitative analysis, achieving 28.4% better composite scores and 45.2% improvement in prediction accuracy over traditional methods on steel and coal datasets.
An explanation of why diffusion models work well for images: low-frequency spectral components dominate, so denoising recovers coarse structure first and fine detail later, which makes the process analogous to spectral autoregression.
This paper introduces the Multi-Scale Feature Attention Network (MSFAN), a deep learning architecture for classifying 12 types of polymers using THz Dual-Comb Spectroscopy, achieving 85.2% accuracy and outperforming state-of-the-art models.
Resonate is a low-latency, low-memory algorithm for perceptually relevant spectral analysis of audio signals, using resonator models with exponentially weighted moving averages.
This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.
BitsMoE introduces a spectral-energy-guided bit allocation framework for quantizing Mixture-of-Experts LLMs, achieving substantial accuracy improvements and speedups under ultra-low-bit quantization.
This paper identifies a spectral phenomenon called Stability of Singular Distribution (SoSD) in large language model pre-training, where the singular value spectrum stabilizes early while parameters continue to evolve. The authors prove that this stabilization marks the transition to the slow-descent phase of training, and they analyze how training strategies like WSD and Muon affect this behavior.
Introduces a three-step recipe for identifying attention-head circuits in pretrained transformers using a spectral signal and task-pattern screen without requiring labels, validated across 51M to 1B parameter models and multiple architectures.
Applies graph spectral analysis (the Fiedler value) and Scheffer's critical-slowing-down indicators to predict grokking in neural networks, detecting it 21,000 steps before the loss changes, across five reproducible experiments.
This paper proposes Spectra, a method using spectral occupancy to analyze and control the realized capacity of latent graph models, arguing that rank is not equivalent to model capacity.
The article reports a spectral ratio between MLP and attention norms that predicts geometric stability in transformer models; it gives 0.5–2 as the optimal range for preventing rank collapse.
This paper proposes distributional spectral diagnostics to localize grokking transitions in Transformer models before test accuracy rises. It uses empirical distributions and Hankel dynamic mode decomposition to create a monitoring signal that discriminates between grokking and non-grokking runs.
A Python library for calculating ephemerides and spectral data, hosted on PyPI.
A comprehensive spectral analysis across 11 LLMs, reporting that transformers exhibit phase transitions in hidden activation spaces between reasoning and factual recall. It describes seven fundamental phenomena, including spectral compression, instruction-tuning reversal, and perfect correctness prediction (AUC = 1.0) based solely on spectral properties.
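
The Resonate entry above describes resonator models updated with exponentially weighted moving averages. The sketch below is one simple reading of that idea, not the Resonate implementation: each resonator keeps a single complex state, multiplies the incoming sample by a phasor at its own frequency, and smooths the result with an EWMA. The function name, the parameter alpha, and the test tone are assumptions made for illustration.

```python
import cmath
import math


def resonator_magnitudes(samples, sample_rate, frequencies, alpha=0.01):
    """Return one magnitude per frequency after processing all samples.

    Hypothetical helper, not taken from the Resonate code base.
    Each resonator stores one complex number, so memory is O(len(frequencies))
    and no buffer of past samples is needed.
    """
    states = [0j] * len(frequencies)
    for n, x in enumerate(samples):
        t = n / sample_rate
        for k, f in enumerate(frequencies):
            # Demodulate the sample at frequency f, then smooth with an EWMA.
            phasor = cmath.exp(-2j * math.pi * f * t)
            states[k] = (1.0 - alpha) * states[k] + alpha * x * phasor
    return [abs(s) for s in states]


# Usage: a 440 Hz sine excites the 440 Hz resonator far more than the 880 Hz one.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4000)]
mags = resonator_magnitudes(tone, sample_rate, [440.0, 880.0])
```

In this sketch a smaller alpha gives a longer averaging window: frequency selectivity improves, but the resonator responds more slowly to changes in the signal.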