theory

#theory

From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD

arXiv cs.LG ↗ · 2026-05-27

This paper proves a finite-sample bound on the approximate max-information of DP-SGD that is at most linear in dataset size, yielding PAC-Bayes generalization bounds for models trained with differential privacy.

#theory

The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers ↗ · 2026-05-27

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

#theory

First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation

arXiv cs.LG ↗ · 2026-05-20

This paper presents the first quantitative prediction of the grokking delay under AdamW, deriving a closed-form law and validating it on algorithmic tasks with high accuracy.

#theory

Efficient Conditioning Why Pseudo Observation Batch Bayesian Optimization Works When It Does not

arXiv cs.LG ↗ · 2026-05-20

This paper provides a unified theoretical framework for pseudo observation batch Bayesian optimization, proving that Gaussian processes produce distinct batch points and that common methods like Constant Liar and Kriging Believer are instances of a single conditioning mechanism. It introduces the Structural Diversity Diagnostic (SDD) for testing surrogate compatibility and validates predictions across multiple benchmark functions and hyperparameter tuning.

#theory

Position: Ideas Should be the Center of Machine Learning Research

arXiv cs.LG ↗ · 2026-05-18

This position paper argues that machine learning research should prioritize ideas over benchmarks and theoretical guarantees, proposing an 'Ideas First' framework that values behavioral signatures and tailored experiments to promote equity and scientific understanding.

#theory

From Generalist to Specialist Representation

arXiv cs.LG ↗ · 2026-05-14

This paper proves that task-relevant latent representations can be identified from generalist models in a fully nonparametric setting without interventions or parametric constraints, achieving a hierarchical identifiability guarantee across time steps and within each step.

#theory

I've introduced CTNet: an architecture where computation happens as the evolution of a persistent state [D]

Reddit r/MachineLearning ↗ · 2026-04-23

CTNet introduces a novel neural architecture where computation is framed as the evolution of a persistent state rather than successive rewrites, incorporating re-entrant memory, multi-scale coherence, and projective output.

#theory

Nonlinear computation in deep linear networks

OpenAI Blog ↗ · 2017-09-29

OpenAI research explores how nonlinear computation can emerge in deep linear networks, presenting theoretical and empirical analysis with code examples using TensorFlow.
