User asks whether purchasing an RTX 5090 and high-end PC for ~$5500 is worth it for LLM experimentation and learning, compared to cloud compute alternatives.
DeepSpeed is an open-source deep learning optimization library from Microsoft that enables efficient distributed training and inference of large-scale models with features like ZeRO, 3D parallelism, and Mixture-of-Experts.
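ZeRO is typically enabled through DeepSpeed's JSON config file. A minimal sketch of a ZeRO stage-2 configuration with CPU optimizer offload (the batch size and learning rate are illustrative placeholders, not recommendations):

```python
import json

# Illustrative DeepSpeed config enabling ZeRO stage 2 with CPU optimizer offload.
# Hyperparameter values here are placeholders, not tuned recommendations.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-4},
    },
    "zero_optimization": {
        "stage": 2,  # partition optimizer states and gradients across ranks
        "offload_optimizer": {"device": "cpu"},
    },
}

print(json.dumps(ds_config, indent=2))
```

In practice this dict (or an equivalent JSON file) is handed to `deepspeed.initialize(model=..., config=ds_config)` at training setup.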
A curated list of 15 AI-related Twitter accounts to follow, featuring prominent figures like Andrej Karpathy, François Chollet, Yann LeCun, Andrew Ng, and others known for research, education, and commentary.
This paper investigates the geometric structure of intermediate feature representations in deep neural networks by analyzing how various image manipulations act in feature space. Using generative image editing models to probe these representations, it suggests that feature spaces are, to a first approximation, organized along linear structures.
This PhD thesis introduces deep learning methods for protein complex prediction and design, including GLINTER for contact prediction, ESMPair for homolog pairing, and RedNet for binder design.
This paper challenges the geometric justification for the Muon optimizer, arguing that precise structure is less important than step-size optimality. It introduces Freon and Kaon optimizers to demonstrate that random or inverted spectra can perform as well as Muon.
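For context, Muon's defining step orthogonalizes the momentum matrix, conventionally via a Newton-Schulz iteration; the paper's argument is over whether this precise spectral shaping is what actually matters. A minimal sketch of the textbook cubic Newton-Schulz variant (the iteration count is illustrative, and Muon's reference implementation uses a tuned quintic polynomial rather than this cubic):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=25):
    """Approximately map g to the nearest semi-orthogonal matrix.

    Uses the cubic iteration X <- 1.5 X - 0.5 X X^T X, which drives every
    singular value toward 1 provided it starts in (0, sqrt(3)). Dividing by
    the Frobenius norm guarantees that starting condition.
    """
    x = g / np.linalg.norm(g)  # Frobenius norm: all singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))   # stand-in for a momentum/gradient matrix
ortho = newton_schulz_orthogonalize(grad)
# Columns are now approximately orthonormal: ortho.T @ ortho is close to I
```

Freon and Kaon can be read as ablations of exactly this step: they replace the orthogonalized spectrum with a random or inverted one while keeping the step size, and still match Muon.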
This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
This paper introduces steerable neural ordinary differential equations on homogeneous spaces, providing a geometric framework for learning continuous-time equivariant dynamics.
This paper introduces HMH, a hierarchical multi-scale Graph Neural Network framework designed to address oversmoothing and oversquashing in heterophilous graphs. It utilizes spectral filters with Haar bases to achieve scalable learning and improved performance on node and graph classification tasks.
This paper presents Conv-VaDE, a variational deep embedding model for interpretable EEG microstate discovery that jointly learns topographic reconstruction and probabilistic soft clustering. It includes a systematic architecture search evaluated on resting-state EEG data to determine optimal model configurations for stability and interpretability.
Stanford's CS336 course on modern neural language models, covering topics like MoEs and RLHF, is being released on YouTube with a two-week delay.
This post shares a curated GitHub repository containing over 30 practical AI projects, covering domains from regression to generative AI, with many end-to-end examples, suitable for learners and developers.
This paper introduces C2L-Net, a data-driven model for efficient and accurate state-of-charge estimation of lithium-ion batteries using short historical windows.
This paper evaluates the biological plausibility and representational alignment of feedback alignment algorithms in convolutional networks, comparing them to standard backpropagation on CIFAR-10. The authors find that modified feedback alignment methods converge on internal representations similar to those produced by backpropagation, suggesting functional success through mimicking representational geometry.
This paper introduces LSAMD, a method for extracting 'learngenes' across multiple datasets to initialize variable-sized Vision Transformer models, significantly reducing training costs and storage while maintaining performance comparable to pretrain-finetune methods.
This paper introduces Dynamical Physics-Modeled Neural Networks (DynPMNNs), a continuous-time deep learning architecture where hidden layers are defined by ordinary differential equations. It presents a biologically inspired approach grounded in Reproducing Kernel Banach Spaces, demonstrating competitive performance on the California Housing dataset with fewer parameters than standard Neural ODEs.
This paper investigates integrating dendritic neural networks with equilibrium propagation, showing that this biologically plausible approach improves performance on challenging datasets compared to standard equilibrium propagation.
This empirical study validates theoretical findings on feature repulsion and spectral lock-in during the grokking phenomenon in two-layer neural networks, demonstrating how activation functions influence the transition from memorization to generalization.
This paper presents a novel deep learning approach to predict inertial lift forces in microfluidic devices without explicit geometric parameters, enabling better generalization to unseen channel cross-sections compared to previous models.
This paper introduces Pion, a novel spectrum-preserving optimizer for large language model training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers.