This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
This paper introduces FragileFlow, a plug-in regularizer that improves the robustness of LLMs and VLMs by controlling 'correct-but-fragile' predictions through spectral analysis and PAC-Bayes bounds.
This paper introduces a theoretical framework for geometric factual recall in transformers, demonstrating that embeddings can encode relational structure via linear superpositions while MLPs act as selectors. It provides empirical and theoretical evidence that this mechanism allows for efficient memorization of facts and multi-hop queries.
This paper addresses an open problem in reinforcement learning by providing a counterexample showing that, in average-reward settings, differential temporal difference learning can diverge when using a global clock despite converging with a local clock.
This paper presents a finite-iteration theory for asynchronous categorical distributional temporal-difference learning, bridging the gap between existing theoretical frameworks and practical online implementations.
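The online update analyzed here is the standard categorical TD(0) step: shift the fixed support by the Bellman operator, project the shifted distribution back onto the support, and take a stochastic-approximation step toward the projection. A minimal sketch (the single-state, self-bootstrapping setup and all parameter values are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def categorical_td_update(p, p_next, r, gamma, z, alpha):
    """One asynchronous categorical TD(0) step on a fixed support z.

    p      : current categorical probabilities over the atoms z
    p_next : next-state categorical probabilities (bootstrap target)
    """
    n = len(z)
    dz = z[1] - z[0]
    # Bellman-shifted atoms, clipped to the support [z[0], z[-1]].
    tz = np.clip(r + gamma * z, z[0], z[-1])
    b = (tz - z[0]) / dz                 # fractional bin positions
    lo = np.floor(b).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    # Project: split each shifted atom's mass between its two neighbors.
    m = np.zeros(n)
    np.add.at(m, lo, p_next * (1.0 - (b - lo)))
    np.add.at(m, hi, p_next * (b - lo))
    # Robbins-Monro step toward the projected target distribution.
    return (1 - alpha) * p + alpha * m

# Illustrative single-state update with a uniform initial distribution.
z = np.linspace(-10.0, 10.0, 51)
p = np.full(51, 1.0 / 51)
q = categorical_td_update(p, p, r=1.0, gamma=0.9, z=z, alpha=0.1)
```

Because the projection splits mass between adjacent atoms with weights summing to one, each update preserves a valid probability vector, which is what makes the asynchronous online iteration analyzable.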
This paper introduces a 'rod flow' model for Adam and other adaptive optimizers to better analyze their behavior at the edge of stability. It extends continuous-time modeling to momentum methods, showing improved accuracy in tracking discrete iterates compared to stable flow models.
This academic paper develops a theoretical framework for online learning with autoregressive chain-of-thought reasoning, analyzing mistake bounds under end-to-end and trajectory supervision models.
This paper derives a closed-form upper bound for admissible learning-rate steps in belief-space dynamics using KL divergence and Bregman geometry, focusing on cross-entropy classification.
This post highlights the Johnson–Lindenstrauss Lemma, explaining its importance for ML engineers in understanding dimensionality reduction, random projections, and embedding efficiency.
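The random-projection mechanism behind the lemma fits in a few lines: multiply by a suitably scaled Gaussian matrix and pairwise distances are approximately preserved. A minimal sketch (the dimensions and tolerance below are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k):
    """Project rows of X from d dims down to k dims with a scaled
    Gaussian matrix; by the Johnson-Lindenstrauss lemma, pairwise
    distances are preserved up to small distortion for k ~ log(n)."""
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ R

# 100 points in 10,000 dims, projected down to 500 dims.
X = rng.normal(size=(100, 10_000))
Y = random_projection(X, 500)

# The distance between two points is approximately preserved.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # close to 1
```

Note that the projection matrix is data-independent, which is why the same trick underlies fast sketching and embedding-compression pipelines.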
This paper introduces a hybrid Track-and-Stop algorithm for best arm identification in generalized linear bandits that unifies absolute and relative feedback. The authors propose a likelihood-ratio-based confidence sequence to adaptively allocate queries, demonstrating improved sample efficiency over baseline methods.
This article analyzes a recent research paper that provides a taxonomical framework for imitation learning algorithms, categorizing them by moment matching techniques and analyzing their theoretical imitation gap bounds.