Tag
This paper proves a finite-sample bound on the approximate max-information of DP-SGD that grows at most linearly in the dataset size, yielding PAC-Bayes generalization bounds for models trained with differential privacy.
This paper frames neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences, including minimax-optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.
This paper presents the first quantitative prediction of the grokking delay under AdamW, deriving a closed-form law and validating it on algorithmic tasks with high accuracy.
This paper provides a unified theoretical framework for pseudo-observation batch Bayesian optimization, proving that Gaussian processes produce distinct batch points and that common methods such as Constant Liar and Kriging Believer are instances of a single conditioning mechanism. It introduces the Structural Diversity Diagnostic (SDD) for testing surrogate compatibility and validates its predictions on multiple benchmark functions and on hyperparameter tuning.
This position paper argues that machine learning research should prioritize ideas over both benchmarks and theoretical guarantees, proposing an 'Ideas First' framework that values behavioral signatures and tailored experiments to promote equity and scientific understanding.
This paper proves that task-relevant latent representations can be identified from generalist models in a fully nonparametric setting without interventions or parametric constraints, achieving a hierarchical identifiability guarantee across time steps and within each step.
CTNet introduces a neural architecture in which computation is framed as the evolution of a persistent state rather than as successive rewrites, incorporating re-entrant memory, multi-scale coherence, and projective output.
OpenAI research explores how nonlinear computation can emerge in deep linear networks, presenting theoretical and empirical analysis with code examples using TensorFlow.