User asks whether purchasing an RTX 5090 and high-end PC for ~$5500 is worth it for LLM experimentation and learning, compared to cloud compute alternatives.
DeepSpeed is an open-source deep learning optimization library from Microsoft that enables efficient distributed training and inference of large-scale models with features like ZeRO, 3D parallelism, and Mixture-of-Experts.
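ZeRO is typically enabled through DeepSpeed's JSON config file. A minimal sketch of a ZeRO stage-2 configuration with CPU optimizer offload (the batch size and learning rate are illustrative placeholders, not recommendations):

```python
import json

# Illustrative DeepSpeed config enabling ZeRO stage 2 with CPU optimizer offload.
# Hyperparameter values here are placeholders, not tuned recommendations.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-4},
    },
    "zero_optimization": {
        "stage": 2,  # partition optimizer states and gradients across ranks
        "offload_optimizer": {"device": "cpu"},
    },
}

print(json.dumps(ds_config, indent=2))
```

In practice this dict (or an equivalent JSON file) is handed to `deepspeed.initialize(model=..., config=ds_config)` at training setup.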
A curated list of 15 AI-related Twitter accounts to follow, featuring prominent figures like Andrej Karpathy, François Chollet, Yann LeCun, Andrew Ng, and others known for research, education, and commentary.
This paper investigates the geometric structure of intermediate feature representations in deep neural networks by analyzing how various image manipulations act in feature space. Using generative image editing models to probe these representations, it suggests that feature spaces are, to a first approximation, organized along linear structures.
This PhD thesis introduces deep learning methods for protein complex prediction and design, including GLINTER for contact prediction, ESMPair for homolog pairing, and RedNet for binder design.
This paper challenges the geometric justification for the Muon optimizer, arguing that precise structure is less important than step-size optimality. It introduces Freon and Kaon optimizers to demonstrate that random or inverted spectra can perform as well as Muon.
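For context, Muon's defining step orthogonalizes the momentum matrix, conventionally via a Newton-Schulz iteration; the paper's argument is over whether this precise spectral shaping is what actually matters. A minimal sketch of the textbook cubic Newton-Schulz variant (the iteration count is illustrative, and Muon's reference implementation uses a tuned quintic polynomial rather than this cubic):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=25):
    """Approximately map g to the nearest semi-orthogonal matrix.

    Uses the cubic iteration X <- 1.5 X - 0.5 X X^T X, which drives every
    singular value toward 1 provided it starts in (0, sqrt(3)). Dividing by
    the Frobenius norm guarantees that starting condition.
    """
    x = g / np.linalg.norm(g)  # Frobenius norm: all singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))   # stand-in for a momentum/gradient matrix
ortho = newton_schulz_orthogonalize(grad)
# Columns are now approximately orthonormal: ortho.T @ ortho is close to I
```

Freon and Kaon can be read as ablations of exactly this step: they replace the orthogonalized spectrum with a random or inverted one while keeping the step size, and still match Muon.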
This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
This paper introduces steerable neural ordinary differential equations on homogeneous spaces, providing a geometric framework for learning continuous-time equivariant dynamics.
This paper introduces HMH, a hierarchical multi-scale Graph Neural Network framework designed to address oversmoothing and oversquashing in heterophilous graphs. It utilizes spectral filters with Haar bases to achieve scalable learning and improved performance on node and graph classification tasks.
This paper presents Conv-VaDE, a variational deep embedding model for interpretable EEG microstate discovery that jointly learns topographic reconstruction and probabilistic soft clustering. It includes a systematic architecture search evaluated on resting-state EEG data to determine optimal model configurations for stability and interpretability.
Stanford's CS336 course on modern neural language models, covering topics like MoEs and RLHF, is being released on YouTube with a two-week delay.
This post shares a curated GitHub repository containing over 30 practical AI projects, covering domains from regression to generative AI, with many end-to-end examples, suitable for learners and developers.
This paper introduces C2L-Net, a data-driven model for efficient and accurate state-of-charge estimation of lithium-ion batteries using short historical windows.
This paper evaluates the biological plausibility and representational alignment of feedback alignment algorithms in convolutional networks, comparing them to standard backpropagation on CIFAR-10. The authors find that modified feedback alignment methods converge on internal representations similar to those produced by backpropagation, suggesting functional success through mimicking representational geometry.
This paper introduces LSAMD, a method for extracting 'learngenes' across multiple datasets to initialize variable-sized Vision Transformer models, significantly reducing training costs and storage while maintaining performance comparable to pretrain-finetune methods.
This paper introduces Dynamical Physics-Modeled Neural Networks (DynPMNNs), a continuous-time deep learning architecture where hidden layers are defined by ordinary differential equations. It presents a biologically inspired approach grounded in Reproducing Kernel Banach Spaces, demonstrating competitive performance on the California Housing dataset with fewer parameters than standard Neural ODEs.
This paper investigates integrating dendritic neural networks with equilibrium propagation, showing that this biologically plausible approach improves performance on challenging datasets compared to standard equilibrium propagation.
This empirical study validates theoretical findings on feature repulsion and spectral lock-in during the grokking phenomenon in two-layer neural networks, demonstrating how activation functions influence the transition from memorization to generalization.
This paper presents a novel deep learning approach to predict inertial lift forces in microfluidic devices without explicit geometric parameters, enabling better generalization to unseen channel cross-sections compared to previous models.
This paper introduces Pion, a novel spectrum-preserving optimizer for large language model training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers.