neural-networks

#neural-networks

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

arXiv cs.LG · 2026-06-16

This paper demonstrates that two-layer neural networks trained with gradient-based methods can achieve the optimal computational-statistical tradeoff for learning Gaussian single-index models. The result matches the statistical query (SQ) lower bound up to polylogarithmic factors for all generative exponents and extends to sparse settings through a novel weight perturbation technique.

GRAPE: Guided Parameter-Space Evolution for Compact Adversarial Robustness

arXiv cs.LG · 2026-06-16

GRAPE is a training framework that progressively exposes parameter space during adversarial training, achieving higher robust accuracy with fewer parameters compared to fixed-structure methods on CIFAR-10.

@che_shr_cat: 1/ Standard transformers have a fundamental topological flaw: they cannot track dynamic states over time without runnin…

X AI KOLs Timeline · 2026-06-15

This thread argues that standard transformers have a topological flaw: once a state representation reaches the top layer, they cannot update beliefs over time, causing collapse as depth increases.

@BetaTomorrow: https://x.com/BetaTomorrow/status/2066435380623385000

X AI KOLs Timeline · 2026-06-15

This thread discusses the concept of 'Jagged Intelligence' in AI, framing it as a consequence of AI learning being an ill-posed inverse problem, and argues that external stabilizers like scaffolding and verification are essential.

Implicit Variational Rejection Sampling

arXiv cs.LG · 2026-06-15

The article proposes Implicit Variational Rejection Sampling (IVRS), which integrates implicit distributions with rejection sampling to improve posterior approximation in variational inference, and introduces the Implicit Resampling Evidence Lower Bound (IR-ELBO) as a tighter variational lower bound.

Neural Slack Variables for Shape Constraints

arXiv cs.LG · 2026-06-15

This paper introduces neural slack variables, a primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network, achieving zero violations on monotonicity and convexity tests and enabling arbitrage-free learning of volatility surfaces.

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

arXiv cs.LG · 2026-06-15

This paper demonstrates that the weight norm causally controls the timescale of grokking in neural networks, reconciling conflicting accounts. Through interventions, it shows that grokking follows an exponential delay law and that norm magnitude dominates grokking time over learning rate across architectures.

Fodor and Pylyshyn's Systematicity Challenge Still Stands

arXiv cs.CL · 2026-06-15

This paper argues that recent claims that neural networks have solved Fodor and Pylyshyn's systematicity challenge are premature. The authors show that the meta-learning for compositionality model fails to generalize out-of-distribution and behaves unsystematically even on in-distribution problems, concluding the challenge remains unmet.

@che_shr_cat: 1/ What if you could train a model on totally benign-looking Wikipedia articles, but secretly force its internal weight…

X AI KOLs Following · 2026-06-14

This thread presents a technique to encode a functional QR code into neural network weights using natural language text during training, enabling hidden information embedding in models trained on benign data.

Singular Learning Theory: AI learns like ice melts

Reddit r/artificial · 2026-06-12

Singular Learning Theory (SLT) uses algebraic geometry to explain why neural networks generalize well despite their degeneracies, introducing the real log canonical threshold (RLCT) as a measure of model complexity.

@PandaTalk8: 1/ Recently read a book that is perfect for systematically learning LLM basics: 《Foundations of Large Language Models》 by Tong Xiao and Jingbo Zhu, from China's Northeastern University NLP Lab and NiuTrans…

X AI KOLs Timeline · 2026-06-12

This thread recommends "Foundations of Large Language Models", by Tong Xiao and Jingbo Zhu of the Northeastern University NLP Lab and NiuTrans Research, as a book for systematically learning the basics of large language models.

Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices

arXiv cs.AI · 2026-06-12

This paper investigates reducing the computational complexity of deep neural networks for EEG analysis on wearable devices by applying parameter quantization and electrode reduction techniques, demonstrating significant complexity reduction with minimal accuracy loss for epileptic seizure detection.

@hayatasuuu: Tokyo Institute of Technology's machine learning course materials are being made publicly available completely for free…

X AI KOLs Following · 2026-06-11

Tokyo Institute of Technology has released free machine learning course materials covering topics like regression, neural networks, SVM, clustering, and PCA, with hands-on code using NumPy, scikit-learn, and PyTorch.

Learning from almost nothing: How neural networks survive heavy input corruption

arXiv cs.LG · 2026-06-11

This paper investigates how neural networks maintain high accuracy even when over 90% of input features are corrupted, deriving a centroid-based decision rule in the high-noise limit using a mean-field approach.

Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems

arXiv cs.LG · 2026-06-11

This paper introduces MF-Net, a recurrent dynamical model that represents multivariate systems through a shared field state and learns a mechanical transition for joint evolution. It achieves competitive forecasting while enabling interpretable structural readout of learned relations.

Deficient executive control in transformer attention

Hacker News Top · 2026-06-10

The article discusses a deficiency in executive control within transformer attention mechanisms, highlighting limitations in how transformers manage sequential dependencies.

@kmeanskaran: Best way to balance both ML and AI today is: > Python (specially Pydantic) > Neural Networks fundamentals > RNN, LSTM, …

X AI KOLs Timeline · 2026-06-10

In this tweet, Karan (@kmeanskaran) outlines a learning roadmap for balancing ML and AI, covering Python, neural networks, NLP, LLMs, deployment, and agentic AI. A reply from Amit asks for beginner guidance.

Emergence via Phase Transitions: Mechanism Landscapes and Universal Convergence Across Complex Systems

arXiv cs.LG · 2026-06-09

This paper introduces the Hierarchical Emergence Framework (HEF), which explains how diverse systems such as neural networks and biological evolution converge to similar internal representations through phase transitions in mechanism landscapes under physical and informational constraints. The framework is validated empirically with 111 grokking experiments that confirm universal convergence and identify a critical energy threshold.

Flatland: The Adventures of Gradient Descent with Large Step Sizes

arXiv cs.LG · 2026-06-08

This paper addresses the open question of maximum step size for gradient descent convergence on non-L-smooth objectives, introducing adaptive methods that operate at the edge of stability and can minimize sharpness globally.

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

arXiv cs.AI · 2026-06-08

This paper analyzes precision loss in FP8 attention due to the attention sink phenomenon when casting the softmax output to FP8 (E4M3). It shows that forward KV iteration causes underflow of non-sink attention values, and proposes reverse iteration and a static scaling factor S=256 to eliminate underflow, achieving 3-10x MSE improvement.
