Tag
The paper introduces closed-form predictive coding via hierarchical Gaussian filters that restore precision-weighted prediction errors, yielding faster and more efficient training without global error signals, outperforming backpropagation on certain tasks.
Proposes Adaptive Multi-Scale Goodness Aggregation (AMSGA), an extension of the Forward-Forward algorithm that combines multi-scale goodness aggregation, adaptive hard negative mining, and layer-dependent thresholds to improve stability, robustness, and generalization, achieving modest accuracy gains on MNIST and Fashion-MNIST.
This paper introduces a B-spline-based decoupling framework for compressing transformer models, with a robust alternating least-squares algorithm (R-CMTF-BSD) that achieves substantial parameter reduction while maintaining competitive accuracy on Vision and Swin Transformer architectures.
Sabrina Halper recommends Dwarkesh Patel's podcast with Eric Jang, who discusses how deep learning progress has been driven more by compute than by biological inspiration.
Applies graph spectral analysis (the Fiedler value) and Scheffer critical-slowing-down indicators to predict grokking in neural networks, detecting it 21,000 steps before the loss function changes, across five reproducible experiments.
SignMuon is a 1-bit, matrix-aware optimizer for distributed training that combines signSGD's majority-vote sign aggregation with Muon's polar-step framework, achieving 32x bandwidth reduction over float32 while maintaining strong convergence and performance on benchmarks like CIFAR-10/ResNet-50 and nanoGPT.
This paper investigates whether shallow neural network agents can master the card game Schnapsen using reinforcement learning, outperforming a supervised imitation baseline and achieving competitive results against a strong search-based opponent.
This paper introduces E-PMQ, an expert-guided post-merge quantization framework that addresses the combined deviations from merging and quantization, achieving significant accuracy improvements on multi-task merged models like CLIP-ViT and FLAN-T5.
NeuroMAS treats multi-agent language systems as trainable neural-network-like architectures with LLM agents as nodes, using reinforcement learning to learn communication and specialization. It reports improved performance and finds that growing progressively from smaller systems works better than training large systems from scratch.
Dr. Steven Mascall shares his personal story, from neural network research in 1988 to building the AI Steve system and developing two apps, one for commemorating loved ones and friends and one for food health. He emphasizes dopamine-driven curiosity and the arrival of the era of the super individual.
This paper introduces DiMS, a dynamical system sampler that guarantees exact sampling from the submanifold of minimum loss solutions in neural networks, enabling better uncertainty quantification in Bayesian inference.
This academic paper investigates the asymmetry between pruning and growth in structural plasticity for neural networks, showing that newborn units suffer from weaker gradient signals than incumbent units, and proposes interventions to improve integration.
Introduces a weight perturbation-based feature attribution method (XWP and XWPc) for fully connected neural networks, achieving competitive performance on standard baseline metrics.
Researchers introduce symmetry-compatible optimizers that respect the equivariance structures of neural network parameters, improving training stability and performance over traditional methods like Adam. The approach is validated on various language model architectures including Qwen3-0.6B, Gemma 3 1B, and OLMoE-1B-7B.
This paper formally proves that training neural networks with asymmetric activation functions like ReLU, GELU, or SiLU causes weights to drift negative, leading to up to 90% activation sparsity. It also shows that squared activations like ReLU² improve performance but cause activation spikes, which can be fixed by clipping, with GELU² achieving the best validation loss.
Snapchat paid $150 million for Looksery, a deep learning computer vision startup. A free MIT lecture teaches building neural networks from scratch.
Eric Jang announces he has been working on a from-scratch implementation of AlphaGo, the 2016 AI breakthrough that inspired him to enter deep learning.
An independent researcher recounts discovering that a $50M+ lab's paper on manifold steering converges with his own patented and published work on universal behavioral manifolds, highlighting the significance of independent scientific convergence.
TILT introduces a novel objective for unsupervised domain adaptation under covariate shift that penalizes an auxiliary component on unlabeled target data, implicitly achieving self-localized importance weighting with bounded estimands. Theoretical guarantees and experiments on shifted CIFAR-100 show improved target performance over baselines.
GoodfireAI found that neural networks perform math by rotating shapes, uncovering a shape-rotating calculator inside an LLM that is used for more than just math.