neural-networks

#neural-networks

A Training-Time Diagnostic for Generalization via the Log-Alignment Ratio

arXiv cs.LG · 2026-05-29

This paper introduces the log-alignment ratio (LAR), a training-time metric that measures parameter-activation alignment and predicts generalization by capturing the spread of weight and activation spectra. Experiments on grokking and a 3B-parameter language model show LAR tracks the transition from memorization to generalization and flags overfitting without held-out data.

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Hugging Face Daily Papers · 2026-05-28

This paper investigates why larger models outperform smaller ones, attributing the gap to reduced gradient interference and better allocation of capacity, which lets them learn rare and complex tasks even in the infinite-data regime. Experiments on synthetic data and OLMo models confirm that larger models avoid overwriting rare-task features because their gradient updates for common tasks are weaker.

A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning

arXiv cs.LG · 2026-05-27

This paper develops a PAC-Bayesian framework for physics-informed machine learning, providing high-probability generalization guarantees for unbounded losses. It proposes a multi-task perspective that jointly handles data fidelity, PDE residuals, and boundary conditions, and introduces a self-bounding learning algorithm.

The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers · 2026-05-27

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

It Takes Two Neurons to Ride a Bicycle

Hacker News Top · 2026-05-26

An annotated version of a paper showing that a simple neural network with just two neurons can control a bicycle, highlighting minimal requirements for stable locomotion.

Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

arXiv cs.LG · 2026-05-26

Proposes a verification-based algorithm to compute provable bounds on exact SHAP values for neural networks, scaling to much larger search spaces than prior exact methods.

Feature Lottery? A Bifurcation Theory of Concept Emergence

arXiv cs.LG · 2026-05-26

This paper introduces a bifurcation theory of representation dynamics to detect when neural networks acquire structured representations during training, using a Hessian analysis of a GMM probe. The resulting ratio β/β_c serves as a label-free phase coordinate that predicts the onset of usable structure and can forecast feature interpretability in sparse autoencoders early in training.

@DanKornas: Neural nets are easier to understand when you can see the math cell by cell. ai-by-hand-excel is a collection of Excel …

X AI KOLs Timeline · 2026-05-25

ai-by-hand-excel is an open-source collection of Excel workbooks that teach AI concepts like neural networks, backpropagation, and transformers by letting users inspect the math cell by cell, making model internals more intuitive.

A mathematical theory of balancing relational generalization and memorization

arXiv cs.LG · 2026-05-25

This paper introduces a novel task, transitive inference with exceptions, and analytically characterizes how neural network models (kernel ridge regression) balance relational generalization and memorization. The theory is validated on pretrained language models, which make the systematic mistakes it predicts.

@techwith_ram: What if I told you a neural network understands local change before it understands the full picture? That idea is deepl…

X AI KOLs Timeline · 2026-05-25

This thread explains the intuition behind the Jacobian matrix and its widespread applications in AI and machine learning, including backpropagation, normalizing flows, computer vision, and robotics.

🤖 Figure AI just ran a 200-hour test where their robots sorted 250k packages

Reddit r/ArtificialInteligence · 2026-05-24

Figure AI's F.03 humanoid robots, powered by the Helix-02 neural network, autonomously sorted 249,560 packages over 200 hours without a hardware failure, approaching human-level efficiency.

@GoSailGlobal: https://x.com/GoSailGlobal/status/2058405413737857497

X AI KOLs Timeline · 2026-05-24

A Chinese-language article that collects and translates 20 hand-drawn AI illustrations by @sairahul1, covering core concepts from neural networks to agents. It is aimed at beginners who want a systematic overview of the AI technology stack.

Hot but correct take - deterministic processes will ALWAYS beat AI/neural networks

Reddit r/ArtificialInteligence · 2026-05-23

The author argues that deterministic decision trees will always outperform neural networks, claiming that AI's successes are only due to computational limits on building such trees.

Position: The Time for Sampling Is Now! Charting a New Course for Bayesian Deep Learning

arXiv cs.LG · 2026-05-22

This position paper argues that sampling-based inference in Bayesian neural networks has reached computational parity with optimization-based methods and is poised to supersede them, offering superior uncertainty quantification and predictive performance.

Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective

arXiv cs.LG · 2026-05-22

This paper introduces the Representation Gap, a metric for neural network generalization error with better asymptotic dynamics. Using a geometric perspective and optimal quantization theory, the authors show it is governed by the intrinsic dimension of the task, and verify this empirically on synthetic and realistic datasets.

Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos

arXiv cs.LG · 2026-05-22

This paper develops a mean-field theory of dropout as a perturbation at the edge of chaos in neural networks, deriving scaling laws for correlation decay and establishing distinct universality classes for smooth and ReLU-like activations. It also yields optimal dropout scheduling that reduces test loss with no extra computational cost.

Equilibrium Propagation and Hamiltonian Inference in the Diffusive Fitzhugh-Nagumo Model

arXiv cs.LG · 2026-05-22

This paper extends Equilibrium Propagation to skew-gradient systems and demonstrates an equivalence between deep Energy-Based Models and Hamiltonian neural networks, focusing on diffusively coupled Fitzhugh-Nagumo neurons. It derives a layer-wise Hamiltonian recurrence relation for inference in such networks.

High Quality Embeddings for Horn Logic Reasoning

arXiv cs.AI · 2026-05-22

This paper introduces novel methods for generating high-quality embeddings for Horn logic reasoning using triplet loss, including techniques for balanced training example generation and hard example emphasis, which improve the efficiency of downstream logical reasoning.

Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

arXiv cs.CL · 2026-05-21

This paper proposes collocational bootstrapping, a mechanism by which statistical word co-occurrence cues can aid the acquisition of English subject-verb agreement, supported by neural network simulations and analysis of child-directed speech.

Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels

arXiv cs.LG · 2026-05-21

This paper studies symmetrization of loss functions for robust training under label noise, introducing SGCE and alpha-MAE loss functions that interpolate between multi-class unhinged loss and Mean Absolute Error, with theoretical guarantees and competitive empirical performance.
