Tag
This paper presents a sentiment analysis and spam detection system for Arabic tweets using the MARBERT model, trained on a dataset of 24,513 tweets to improve customer service for Saudi Telecom Company.
Proposes the Multi-Stream Fraud Transformer (MSFT) for financial fraud detection, which encodes transaction, login, and risk-event streams independently with Transformers and combines them using time-aware positional encoding and gated fusion, achieving 0.9961 AUROC on a large dataset.
This paper introduces a model-free deep learning method for solving high-dimensional nonlinear partial differential equations with unknown coefficients, using zeroth-order derivative estimators derived from perturbed Monte Carlo trajectories. The approach avoids automatic differentiation, provides theoretical error bounds, and demonstrates competitive performance in numerical experiments.
The paper proposes using spectral entropy as a metric to quantify noise introduced by explainability techniques in ECG arrhythmia classification, helping to distinguish true model signal from XAI-generated artifacts.
Recommends that computer science students take Stanford's CS336 course (Language Modeling from Scratch) to deepen their understanding of LLMs and improve their English.
UC Berkeley researchers trained an AI model on hundreds of thousands of EKGs to detect a previously unrecognized signal that predicts sudden cardiac death risk more accurately than current methods, potentially saving thousands of lives annually.
Promotes a structured MIT deep learning course that covers foundations, generative models, agents, and sequence problems. The course aims to build practical understanding before advanced topics.
Discusses why GLM-5.2 moved away from GRPO, suggesting that GRPO's assumptions may not hold for long-horizon agentic tasks.
A curated list of 10 free AI learning resources including courses, newsletters, podcasts, and interactive books from experts like 3Blue1Brown, Andrej Karpathy, and Andrew Ng.
The paper proposes RAVEN, a Mixture-of-Experts framework that adaptively determines temporal context windows for each input sample to handle non-stationary financial time series. It achieves state-of-the-art performance on financial and traffic benchmarks.
This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.
This paper presents a large-scale empirical study of the Derivative Regularization (DREG) penalty, showing it achieves high accuracy and noise robustness, particularly with GELU activations and in data-scarce regimes, positioning it as a general-purpose plug-and-play regularizer for neural networks.
This paper introduces ARIA, a framework that adaptively allocates training effort across regions of the conditioning space for distilling conditional diffusion models, improving performance on unseen and underrepresented conditions.
This paper presents a deep learning approach using a spatio-temporal graph neural network (MTGNN) to reconstruct GRACE terrestrial water storage anomalies back to 1940 for South America, achieving high accuracy and outperforming previous methods with fewer predictors.
Proposes MEDIC, a meta-learning strategy for open set domain generalization that uses implicit gradient matching across domain and class splits to learn more generalizable decision boundaries. Experiments show state-of-the-art performance.
This paper proposes a probabilistic framework for Alzheimer's disease progression forecasting that combines ordinal diagnosis prediction, multi-horizon trajectory generation, and decomposed uncertainty estimation using a Temporal Fusion Transformer encoder and an autoregressive Mixture Density Network. The model outperforms baselines on ADNI data, achieving near-nominal 90% credible interval coverage with clinically meaningful uncertainty signals.
This paper proposes MVG-KAN, a multi-view PM2.5 forecasting model that integrates periodic-residual decomposition, a Geo-Wind Graph for wind-aware spatial dependencies, and a temporal KAN head, achieving an MAE of 14.09 on Beijing data.
This paper investigates the distribution and evolution of aspect-level sentiments in multi-round peer reviews from Nature Communications, using a deep learning model (LCF-BERT-CDM) that achieves 82.65% Macro-F1, and finds that positive sentiment increases and negative sentiment decreases as review rounds progress.
A set of four cards covering the core concepts of neural networks: neuron, forward pass, activations, and backpropagation, aimed at helping learners understand how models from perceptrons to transformers work.
Release of free workshop recordings and materials (23 videos, 250 slides, 50 exercises) for building your own LLM from fundamentals to transformer architecture, with no math or ML prerequisites.