Tag
This paper investigates how training dynamics of neural networks for software defect prediction are affected by coupled data-quality issues such as class imbalance and overlap, proposing an interaction-aware empirical protocol.
This paper analyzes why machine learning, particularly neural networks, remains opaque in its learning process by framing it as a complex dynamical system, identifying three key properties that contribute to learning opacity, and arguing that some sources may be irreducible.
This book develops an effective theory for deep neural networks, showing that their predictions follow nearly Gaussian distributions, with deviations governed by the depth-to-width ratio, and introduces representation group flow to analyze signal propagation and learning dynamics.
A paper investigating the reasons behind the success of overparameterization in neural networks, comparing the lottery ticket hypothesis with escape dimensions.
A curated list of 10 free AI learning resources including courses, newsletters, podcasts, and interactive books from experts like 3Blue1Brown, Andrej Karpathy, and Andrew Ng.
This paper introduces the apothem measure for computing trustworthy robustness certifications in neural networks, proves the intractability of volume-optimal certifications, and presents the ParallelepipedoNN system, which achieves a twofold improvement in minimum edge length on MNIST and Fashion MNIST.
Announces an arXiv note on a mathematical symmetry connecting the classic MLP to the Gated MLP, going beyond empirical performance.
Explores how large language models compress vast knowledge into finite space using feature superposition, explaining the distinction between dimensions and features with biological analogies.
This paper introduces 'Rosetta Neurons'—universal neurons across diverse neural networks—and shows they scale as a sublinear power law, becoming more selective and monosemantic with scale, enabling data filtering that nearly matches oracle performance.
A tweet promotes Stanford's free CS324 course on large language models, which uses a simple example of a mouse eating cheese to explain how LLMs work, and includes interactive demos.
This paper provides a mathematical analysis of superposition in neural networks, deriving upper and lower bounds on L2 reconstruction loss for simple autoencoders with power activation functions, corroborating empirical findings by Elhage et al.
This paper establishes a mathematically rigorous connection between shock-wave theory and symmetry-quotiented learning dynamics of stochastic gradient descent, showing that after symmetry reduction and coarse-graining, the dynamics satisfy viscous Hamilton-Jacobi and Burgers-type equations with shock formation times controlled by loss curvature.
MIT researchers co-authored a paper showing that general-purpose policy gradient algorithms can outperform specialized game-theoretic algorithms in imperfect-information games, challenging long-held assumptions in the field.
This paper introduces CARLOS, a deep reinforcement learning algorithm that learns continuous-time optimal stopping rules for American-style options using an aggregate deep neural network, effectively closing the Bermudan-American value gap with high computational efficiency.
An explanation of why diffusion models work well for images: low-frequency spectral components dominate, so denoising recovers coarse structure first, then fine detail — analogous to spectral autoregression.
Introduces a geometric framework to identify 'AI engrams' – memory traces in deep neural networks – formalizing neuroscientific criteria into a closed-form estimator, enabling surgical memory manipulation in models from MLPs to LLMs.
This paper demonstrates that two-layer neural networks trained with gradient-based methods can achieve the optimal computational-statistical tradeoff for learning Gaussian single-index models, matching the SQ lower bound up to polylogarithmic factors for all generative exponents and extending to sparse settings with a novel weight perturbation technique.
GRAPE is a training framework that progressively exposes parameter space during adversarial training, achieving higher robust accuracy with fewer parameters than fixed-structure methods on CIFAR-10.
This thread argues that standard transformers have a topological flaw: once a state representation reaches the top layer, the model cannot update its beliefs over time, causing collapse as depth increases.
This thread discusses the concept of 'Jagged Intelligence' in AI, framing it as a consequence of AI learning being an ill-posed inverse problem, and argues that external stabilizers like scaffolding and verification are essential.