neural-networks

#neural-networks

Closed-form predictive coding via hierarchical Gaussian filters

arXiv cs.LG ↗ · 2026-05-21

The paper introduces a closed-form predictive coding method built on hierarchical Gaussian filters that restore precision-weighted prediction errors. It trains faster and more efficiently without global error signals and outperforms backpropagation on certain tasks.
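
As a rough illustration of what "precision-weighted prediction error" means, here is a minimal sketch of a single Gaussian belief update in which the prediction error is scaled by the observation's share of the total precision. Hierarchical Gaussian filters stack updates of this kind level by level. The function name and arguments are hypothetical, and this is not the paper's closed-form method.

```python
def precision_weighted_update(mu, pi_prior, x, pi_obs):
    """Update a Gaussian belief N(mu, 1/pi_prior) after observing x with precision pi_obs.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    delta = x - mu                             # raw prediction error
    pi_post = pi_prior + pi_obs                # precisions add for Gaussian evidence
    mu_post = mu + (pi_obs / pi_post) * delta  # error weighted by relative precision
    return mu_post, pi_post


# A confident observation (pi_obs = 3) pulls the belief most of the way toward x.
mu, pi = precision_weighted_update(mu=0.0, pi_prior=1.0, x=2.0, pi_obs=3.0)
print(mu, pi)  # 1.5 4.0
```

The gain pi_obs / pi_post always lies between 0 and 1, so noisy observations (low pi_obs) barely move the belief, while precise ones dominate it.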

Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

arXiv cs.LG ↗ · 2026-05-20

Proposes Adaptive Multi-Scale Goodness Aggregation (AMSGA), an extension of the Forward-Forward algorithm that improves stability, robustness, and generalization via multi-scale goodness aggregation, adaptive hard negative mining, and layer-dependent thresholds, achieving modest accuracy gains on MNIST and Fashion-MNIST.
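
For context, in the original Forward-Forward algorithm a layer's "goodness" is the sum of its squared activations. Each layer is trained so that goodness exceeds a threshold for positive data and falls below it for negative data. The sketch below shows that score, a logistic positive/negative probability, and a weighted sum of per-layer margins using layer-dependent thresholds. The weighting in `aggregate_goodness` and all names are hypothetical stand-ins; the summary does not give AMSGA's exact aggregation rule.

```python
import math


def goodness(activations):
    # Forward-Forward goodness: sum of squared activations of one layer
    return sum(a * a for a in activations)


def p_positive(activations, threshold):
    # Logistic probability that the layer labels the input "positive"
    return 1.0 / (1.0 + math.exp(-(goodness(activations) - threshold)))


def aggregate_goodness(layer_activations, weights, thresholds):
    # Hypothetical multi-scale score: weighted sum of each layer's goodness
    # margin over its own (layer-dependent) threshold
    return sum(
        w * (goodness(acts) - t)
        for acts, w, t in zip(layer_activations, weights, thresholds)
    )


layers = [[1.0, 2.0], [0.5, 0.5, 1.0]]
print(p_positive(layers[0], threshold=5.0))  # 0.5
print(aggregate_goodness(layers, weights=[0.5, 0.5], thresholds=[4.0, 1.0]))  # 0.75
```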

Robust Basis Spline Decoupling for the Compression of Transformer Models

arXiv cs.LG ↗ · 2026-05-20

This paper introduces a B-spline-based decoupling framework for compressing transformer models, with a robust alternating least-squares algorithm (R-CMTF-BSD) that achieves substantial parameter reduction while maintaining competitive accuracy on Vision Transformer (ViT) and Swin Transformer architectures.

@SabrinaHalper: .@dwarkesh_sp's episode with @ericjang11 is awesome. Eric has a rare gift for making complicated ideas feel simple, whi…

X AI KOLs Timeline ↗ · 2026-05-19

Sabrina Halper recommends Dwarkesh Patel's podcast with Eric Jang, who discusses how deep learning progress has been driven more by compute than by biological inspiration.

Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before loss function - five reproducible experiments [R]

Reddit r/MachineLearning ↗ · 2026-05-19

Applies graph spectral analysis (the Fiedler value) and Scheffer's critical slowing down (CSD) indicators to predict grokking in neural networks, detecting it 21,000 steps before the loss function changes, across five reproducible experiments.
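
Both quantities have standard definitions: the Fiedler value is the second-smallest eigenvalue of the graph Laplacian L = D - A (the graph's algebraic connectivity), and a common CSD indicator is rising lag-1 autocorrelation of a monitored signal. The sketch below computes both with NumPy. How the post builds a graph from the network, and which time series it monitors, are not stated here, so the inputs are assumptions: any symmetric non-negative adjacency matrix and any scalar series.

```python
import numpy as np


def fiedler_value(adjacency):
    """Second-smallest eigenvalue of the unnormalized graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    A = (A + A.T) / 2.0                  # symmetrize so eigvalsh applies
    L = np.diag(A.sum(axis=1)) - A       # degree matrix minus adjacency
    eigenvalues = np.linalg.eigvalsh(L)  # returned in ascending order
    return float(eigenvalues[1])         # 0 if the graph is disconnected


def lag1_autocorrelation(series):
    """Simple lag-1 autocorrelation; a rise over time is a classic CSD warning sign."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # 3-node path graph
print(round(fiedler_value(path), 6))       # 1.0
print(lag1_autocorrelation([1, -1, 1, -1]))  # -0.75
```

In practice the indicators would be tracked over a rolling window of training checkpoints, with a falling Fiedler value or rising autocorrelation read as an early warning.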

SignMuon: Communication-Efficient Distributed Muon Optimization

arXiv cs.LG ↗ · 2026-05-19

SignMuon is a 1-bit, matrix-aware optimizer for distributed training that combines signSGD's majority-vote sign aggregation with Muon's polar-step framework, achieving 32x bandwidth reduction over float32 while maintaining strong convergence and performance on benchmarks like CIFAR-10/ResNet-50 and nanoGPT.
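
The signSGD half of that combination can be sketched directly: each worker sends only the sign of its update (1 bit per entry instead of 32 for float32, hence the 32x figure), and the server returns the elementwise majority. The sketch also applies an exact polar factor U V^T computed by SVD, which is the orthogonalization Muon approximates with Newton-Schulz iterations. Applying it to the voted signs is an assumption made for illustration, since the summary does not say where SignMuon places the sign and polar steps; the function names are hypothetical.

```python
import numpy as np


def majority_vote(worker_updates):
    # Each worker transmits sign(update): 1 bit per entry.
    signs = np.sign(np.stack(worker_updates))
    # Server: elementwise majority vote (ties give 0).
    return np.sign(signs.sum(axis=0))


def polar_factor(matrix):
    # Exact polar factor U V^T via SVD (Muon approximates this iteratively).
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def bandwidth_ratio(bits_full=32, bits_compressed=1):
    # Bits per entry before vs. after compression.
    return bits_full / bits_compressed


updates = [
    np.array([[1.0, -2.0], [0.5, 3.0]]),
    np.array([[2.0, -1.0], [-0.5, 1.0]]),
    np.array([[-1.0, -1.0], [1.0, 2.0]]),
]
voted = majority_vote(updates)  # [[1, -1], [1, 1]]
step = polar_factor(voted)      # orthogonal 2x2 matrix
print(bandwidth_ratio())        # 32.0
```

With an odd number of workers, as here, ties cannot occur, so every entry of the voted matrix is +1 or -1.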

From Imitation to Interaction: Mastering Game of Schnapsen with Shallow Reinforcement Learning

arXiv cs.AI ↗ · 2026-05-19

This paper investigates whether shallow neural network agents can master the card game Schnapsen using reinforcement learning, outperforming a supervised imitation baseline and achieving competitive results against a strong search-based opponent.

E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

arXiv cs.CL ↗ · 2026-05-19

This paper introduces E-PMQ, an expert-guided post-merge quantization framework that addresses the combined deviations from merging and quantization, achieving significant accuracy improvements on multi-task merged models like CLIP-ViT and FLAN-T5.

NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

arXiv cs.AI ↗ · 2026-05-19

NeuroMAS treats multi-agent language systems as trainable neural-network-like architectures with LLM agents as nodes, using reinforcement learning to learn communication and specialization. It shows improved performance and that progressive growth from smaller systems works better than training large systems from scratch.

Birth of AI/Steve

Reddit r/ArtificialInteligence ↗ · 2026-05-18

Dr. Steven Mascall recounts his path from neural network research in 1988 to building the AI Steve system, an app for commemorating loved ones and friends, and a food health app. He emphasizes dopamine-driven curiosity and argues that the era of the "super individual" has arrived.

Don't Stop Me Yet: Sampling Loss Minima via Dissipative Riemannian Mechanics

arXiv cs.LG ↗ · 2026-05-18

This paper introduces DiMS, a dynamical system sampler that guarantees exact sampling from the submanifold of minimum loss solutions in neural networks, enabling better uncertainty quantification in Bayesian inference.

On the Stability of Growth in Structural Plasticity

arXiv cs.LG ↗ · 2026-05-18

This paper investigates the asymmetry between pruning and growth in structural plasticity for neural networks. It shows that newborn units receive weaker gradient signals than incumbent units and proposes interventions to improve their integration.

From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks

arXiv cs.LG ↗ · 2026-05-18

Introduces weight-perturbation-based feature attribution methods (XWP and XWPc) for fully connected neural networks, achieving performance competitive with baselines on standard metrics.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Hugging Face Daily Papers ↗ · 2026-05-18

Researchers introduce symmetry-compatible optimizers that respect the equivariance structures of neural network parameters, improving training stability and performance over traditional methods like Adam. The approach is validated on various language model architectures including Qwen3-0.6B, Gemma 3 1B, and OLMoE-1B-7B.

Bug or Feature^2: Weight Drift, Activation Sparsity, and Spikes

Hugging Face Daily Papers ↗ · 2026-05-17

This paper formally proves that training neural networks with asymmetric activation functions such as ReLU, GELU, or SiLU causes weights to drift negative, producing up to 90% activation sparsity. It also shows that squared activations such as ReLU² improve performance but cause activation spikes, which clipping fixes; GELU² achieves the best validation loss.
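
To make the terms concrete, the sketch below defines ReLU and exact GELU, their squared variants, a clipped variant, and a sparsity measure (the fraction of exactly-zero outputs). The clip ceiling `c` and the helper names are assumptions for illustration; the summary does not say how or where the paper clips.

```python
import math


def relu(x):
    return max(0.0, x)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def squared(act):
    # squared(relu) is ReLU^2, squared(gelu) is GELU^2
    return lambda x: act(x) ** 2


def clipped(act, c):
    # Hypothetical spike fix: cap outputs at an assumed ceiling c
    return lambda x: min(act(x), c)


def sparsity(outputs):
    # Fraction of activations that are exactly zero
    return sum(1 for y in outputs if y == 0.0) / len(outputs)


relu_sq = squared(relu)
print(relu_sq(3.0))                # 9.0 (outputs grow quadratically with the input)
print(clipped(relu_sq, 4.0)(3.0))  # 4.0
print(sparsity([relu(v) for v in [-1.0, 2.0, -0.5, 0.0]]))  # 0.75
```

Note that GELU² is nonzero for negative inputs, unlike ReLU², so the two squared variants differ in how much exact sparsity they can produce.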

@0xCodez: SNAPCHAT PAID $150,000,000 FOR LOOKSERY - A STARTUP IN DEEP LEARNING COMPUTER VISION. This 1-hour MIT lecture on "Build…

X AI KOLs Timeline ↗ · 2026-05-16

Snapchat paid $150 million for Looksery, a deep learning computer vision startup. The post also recommends a free 1-hour MIT lecture on building neural networks from scratch.

@francoisfleuret: Awesome. Seriously, people are harsh with this platform, but if you are careful with whom you follow, it is a constant …

X AI KOLs Timeline ↗ · 2026-05-16

Eric Jang announces he has been working on a from-scratch implementation of AlphaGo, the 2016 AI breakthrough that inspired him to enter deep learning.

@Propriocetive: A week ago, a $50M+ interpretability lab published the exact research thesis I've been quietly building from my apartme…

X AI KOLs Following ↗ · 2026-05-15

An independent researcher recounts discovering that a $50M+ lab's paper on manifold steering converges with his own patented and published work on universal behavioral manifolds, highlighting the significance of independent scientific convergence.

TILT: Target-induced loss tilting under covariate shift

arXiv cs.LG ↗ · 2026-05-15

TILT introduces a novel objective for unsupervised domain adaptation under covariate shift that penalizes an auxiliary component on unlabeled target data, implicitly achieving self-localized importance weighting with bounded estimands. The paper provides theoretical guarantees, and experiments on shifted CIFAR-100 show improved target performance over baselines.

@GoodfireAI: Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used f…

X AI KOLs Following ↗ · 2026-05-14

GoodfireAI found that neural networks perform math by rotating shapes, uncovering a shape-rotating calculator inside an LLM that is used for more than just math.
