neural-networks

#neural-networks

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

arXiv cs.AI ↗ · 2026-06-08 Cached

This paper analyzes precision loss in FP8 attention due to the attention sink phenomenon when casting the softmax output to FP8 (E4M3). It shows that forward KV iteration causes underflow of non-sink attention values, and proposes reverse iteration and a static scaling factor S=256 to eliminate underflow, achieving 3-10x MSE improvement.
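The underflow described in this summary can be reproduced with a small simulation. The sketch below is not the paper's kernel: it assumes the common E4M3 layout (3 mantissa bits, smallest subnormal 2^-9, largest finite value 448), uses made-up attention probabilities, and ignores the iteration-order part of the proposal. The helper names `quantize_e4m3` and `cast_with_scale` are hypothetical.

```python
import math

E4M3_MAX = 448.0        # largest finite E4M3 value (assumed format variant)
MIN_NORMAL_EXP = -6     # exponent of the smallest normal value, 2**-6
MANTISSA_BITS = 3


def quantize_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest E4M3 value (simulation)."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)                  # saturate instead of overflowing
    _, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)      # clamp into the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing of representable values
    return round(x / step) * step         # round() breaks ties to even


def cast_with_scale(p: float, scale: float) -> float:
    """Quantize p * scale to E4M3, then undo the scale in higher precision."""
    return quantize_e4m3(p * scale) / scale


# Made-up illustration: one sink token takes 0.99 of the mass and
# 1000 other tokens share the remaining 0.01 equally.
p_sink, p_other = 0.99, 0.01 / 1000

print(cast_with_scale(p_other, 1.0))    # flushed to zero without scaling
print(cast_with_scale(p_other, 256.0))  # survives as a small non-zero value
print(cast_with_scale(p_sink, 256.0))   # sink weight still fits below 448
```

Under these assumptions, softmax outputs lie in [0, 1], so multiplying by 256 keeps every value below the 448 limit while lowering the effective underflow threshold by a factor of 256.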

@jakevin7: Everyone is talking about AI now, but few know that the founder of this field was once dismissed as a madman by the world. Geoffrey Hinton won the Nobel Prize in Physics in 2024. A reporter asked him: How many years did you wait? He said: About forty. In 1969, a book killed neural networks...

X AI KOLs Following ↗ · 2026-06-08 Cached

This article recounts how Geoffrey Hinton persisted in his research for three decades during the AI winter, when neural networks were abandoned by academia. He eventually gained fame with AlexNet in the 2012 ImageNet competition and won the Nobel Prize in Physics in 2024.

@zhaisf: These were some magical results from distillation by @geoffreyhinton that really shocked me when I first saw them, and …

X AI KOLs Following ↗ · 2026-06-07 Cached

The post discusses the surprising robustness of model distillation to the training distribution, even when that distribution has little overlap with the target distribution, and the implications for on-policy and off-policy distillation.

@incrementaliser: Just finished watching a gem by @ChrisGPotts , "Finding linguistic structure in large language models", and I'm now pro…

X AI KOLs Following ↗ · 2026-06-06

A tweet highlights Chris Potts' talk on how large language models learn linguistic structures, reinforcing the view that LLMs capture syntax and semantics.

Transformers Are Inherently Succinct

Hacker News Top ↗ · 2026-06-05 Cached

This paper argues that transformer architectures are inherently succinct, meaning they can represent certain functions more efficiently than other models. It presents theoretical analysis and proofs.

Playing with Vision Embeddings

Hacker News Top ↗ · 2026-06-05 Cached

This post explores DINOv3 vision embeddings by generating images that correspond to specific embedding directions, using gradient optimization and augmentation strategies to invert the model.

Derivative Informed Learning of Exchange-Correlation Functionals

arXiv cs.LG ↗ · 2026-06-04 Cached

This ICML 2026 paper introduces Derivative Informed XC-Loss (DI-Loss), a training approach for machine-learned exchange-correlation functionals that incorporates first and second derivative supervision on the Grassmannian of density matrices. Across four architectures, DI-Loss reduces total-energy MAE by 66% compared to energy and density supervision alone, and improves excited-state predictions in TDDFT calculations.

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

arXiv cs.LG ↗ · 2026-06-04 Cached

This paper presents a theoretical framework for deep reinforcement learning in continuous environments, modeling it as a continuous-time stochastic process using stochastic control theory. The authors characterize an actor-critic algorithm's dynamics in the infinite width limit of two-layer networks, deriving an equation for infinitesimal changes in state distribution under a vanishingly small learning rate.

AI from concrete to abstract: demystifying artificial intelligence to the general public

arXiv cs.AI ↗ · 2026-06-04 Cached

This paper presents AIcon2abs, a methodology combining visual programming and WiSARD weightless neural networks to help general audiences, including children, understand AI concepts through hands-on learning activities. The approach integrates training and classification as first-class programming constructs to make the distinction between learning machines and conventional programs more intuitive.

"They're made out of weights"

Hacker News Top ↗ · 2026-06-03 Cached

A creative dialogue explores the idea that large language models are fundamentally just matrices of weights, challenging notions of understanding and sentience.

Curatube: a distraction free interface for YT playlists to focus on learning

Lobsters Hottest ↗ · 2026-06-03 Cached

Curatube is a distraction-free interface for YouTube playlists, designed to help focus on learning. It currently features the Neural Networks: Zero to Hero course by Andrej Karpathy.

Neural Networks Provably Learn Spectral Representations for Group Composition

arXiv cs.LG ↗ · 2026-06-03 Cached

This paper theoretically demonstrates that two-layer neural networks trained on group composition tasks learn spectral representations, with neurons converging to irreducible representations and achieving rotational rank-one alignment, providing a representation-theoretic account of feature learning.

Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent

arXiv cs.LG ↗ · 2026-06-03 Cached

This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.

Neural Networks Provably Learn Spectral Representations for Group Composition

Hugging Face Daily Papers ↗ · 2026-06-02

This paper provides a theoretical analysis of how neural networks learn structured representations during group composition tasks, proving that training dynamics drive neurons to converge to irreducible group representations with exponential convergence rates. The work establishes a representation-theoretic account of feature learning and characterizes a low-rank compression phenomenon for matrix-valued group representations.

@ChrisGPotts: We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Hua…

X AI KOLs Following ↗ · 2026-06-01 Cached

This paper investigates why larger models outperform smaller ones, attributing it to data-induced competition for neural resources through formal analysis and experiments.

@antoniolupetti: "Computing Neural Network Gradients" is a clear introduction to the mathematics behind backpropagation and gradient com…

X AI KOLs Timeline ↗ · 2026-06-01 Cached

These Stanford CS224N course notes give a clear introduction to the mathematics of backpropagation and gradient computation in neural networks, covering the chain rule, computational graphs, and vectorized derivatives.
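As a minimal illustration of the chain-rule bookkeeping such notes describe, the sketch below backpropagates through a scalar two-layer network and checks the result against finite differences. The network, the values, and the function names are invented for this example and are not taken from the course notes.

```python
import math


def loss_and_grads(w1, w2, x, target):
    """Squared-error loss of y = w2 * tanh(w1 * x) and its gradients."""
    h = math.tanh(w1 * x)            # hidden activation
    y = w2 * h                       # network output
    loss = 0.5 * (y - target) ** 2
    dy = y - target                  # dL/dy
    dw2 = dy * h                     # dL/dw2 = dL/dy * dy/dw2
    dh = dy * w2                     # dL/dh  = dL/dy * dy/dh
    dz = dh * (1.0 - h * h)          # tanh'(z) = 1 - tanh(z)**2
    dw1 = dz * x                     # dL/dw1 = dL/dz * dz/dw1
    return loss, dw1, dw2


def numeric_grad(f, v, eps=1e-6):
    """Central finite-difference estimate of df/dv."""
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)


w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
loss, dw1, dw2 = loss_and_grads(w1, w2, x, target)
num_dw1 = numeric_grad(lambda v: loss_and_grads(v, w2, x, target)[0], w1)
num_dw2 = numeric_grad(lambda v: loss_and_grads(w1, v, x, target)[0], w2)
print(abs(dw1 - num_dw1) < 1e-6, abs(dw2 - num_dw2) < 1e-6)
```

Each backward line reuses a quantity computed one step closer to the loss, which is the same reuse a computational graph makes explicit.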

Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Temperature Degradation

arXiv cs.LG ↗ · 2026-06-01 Cached

This paper benchmarks five uncertainty quantification methods for neural network predictions of turbine gas temperature, evaluating trade-offs in coverage, width, and stability to guide prognostics and health management in engines.
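Coverage and width are commonly computed from prediction intervals as in the sketch below. The paper's exact metric definitions are not given in this summary, so the function `coverage_and_width` and the sample numbers are assumptions made for illustration only.

```python
def coverage_and_width(y_true, lower, upper):
    """Return (fraction of targets inside their interval, mean interval width)."""
    n = len(y_true)
    covered = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


# Made-up targets and interval bounds, not data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 4.0, 5.0]

coverage, width = coverage_and_width(y_true, lower, upper)
print(coverage, width)
```

The two numbers pull against each other: wider intervals raise coverage trivially, so a method is usually judged on whether it reaches the nominal coverage with the narrowest intervals.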

Bit-Mass Theory – The Container Principle

Reddit r/artificial ↗ · 2026-05-31

The Bit-Mass Theory proposes that the total number of weight bits determines model accuracy, not the computation format, with experiments on MNIST showing equivalent performance between binary and floating-point networks at the same bit-mass.

The Hamilton-Jacobi Theory of Deep Learning

arXiv cs.LG ↗ · 2026-05-29 Cached

This paper establishes an exact correspondence between neural network training and Hamilton-Jacobi initial-value problems, unifying deep learning architectures through a deformation parameter.

A Training-Time Diagnostic for Generalization via the Log-Alignment Ratio

arXiv cs.LG ↗ · 2026-05-29 Cached

This paper introduces the log-alignment ratio (LAR), a training-time metric that measures parameter-activation alignment and predicts generalization by capturing the spread of weight and activation spectra. Experiments on grokking and a 3B-parameter language model show LAR tracks the transition from memorization to generalization and flags overfitting without held-out data.
