deep-learning

#deep-learning

RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting

arXiv cs.LG ↗ · 6h ago Cached

The paper proposes RAVEN, a Mixture-of-Experts framework that adaptively determines temporal context windows for each input sample to handle non-stationary financial time series. It achieves state-of-the-art performance on financial and traffic benchmarks.

Fast and Slow Variational Continual Learning

arXiv cs.LG ↗ · 6h ago Cached

This paper introduces the Continual IVON (CoVON) optimizer, which integrates fast and slow adaptation into variational continual learning to balance stability and plasticity, outperforming existing methods in domain-incremental learning, continual pre-training, and fine-tuning of large language models.

DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty

arXiv cs.LG ↗ · 6h ago Cached

This paper presents a large-scale empirical study of the Derivative Regularization (DREG) penalty, showing that it achieves high accuracy and noise robustness, particularly with GELU activations and in data-scarce regimes, and positioning it as a general-purpose, plug-and-play regularizer for neural networks.

ARIA: Adaptive Region-Based Importance Allocation for Conditional Diffusion Distillation

arXiv cs.LG ↗ · 6h ago Cached

This paper introduces ARIA, a framework that adaptively allocates training effort across regions of the conditioning space for distilling conditional diffusion models, improving performance on unseen and underrepresented conditions.

Reconstructing GRACE Terrestrial Water Storage with Spatio-Temporal Graph Neural Networks: An Application to South America

arXiv cs.LG ↗ · 6h ago Cached

This paper presents a deep learning approach using a spatio-temporal graph neural network (MTGNN) to reconstruct GRACE terrestrial water storage anomalies back to 1940 for South America, achieving high accuracy and outperforming previous methods with fewer predictors.

Exploring Dualistic Meta-Learning to Enhance Domain Generalization in Open Set Scenarios

arXiv cs.LG ↗ · 6h ago Cached

This paper proposes MEDIC, a meta-learning strategy for open set domain generalization that uses implicit gradient matching across domain and class splits to learn better decision boundaries. Its experiments show state-of-the-art performance.

Uncertainty-Aware Longitudinal Forecasting of Alzheimer's Disease Progression Using Deep Learning

arXiv cs.AI ↗ · 6h ago Cached

This paper proposes a probabilistic framework for Alzheimer's disease progression forecasting that combines ordinal diagnosis prediction, multi-horizon trajectory generation, and decomposed uncertainty estimation using a Temporal Fusion Transformer encoder and an autoregressive Mixture Density Network. The model outperforms baselines on ADNI data, achieving near-nominal 90% credible interval coverage with clinically meaningful uncertainty signals.

MVG-KAN: Multi-View Geo-Wind Guided KAN for PM$_{2.5}$ Forecasting

arXiv cs.AI ↗ · 6h ago Cached

This paper proposes MVG-KAN, a multi-view model integrating periodic-residual decomposition, a Geo-Wind Graph for wind-aware spatial dependencies, and a temporal KAN head for PM2.5 forecasting, achieving MAE 14.09 on Beijing data.

Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach

arXiv cs.CL ↗ · 6h ago Cached

This paper investigates the distribution and evolution of aspect-level sentiments in multi-round peer reviews from Nature Communications, using a deep learning approach (LCF-BERT-CDM) to achieve 82.65% Macro-F1, and finds that positive sentiment increases while negative sentiment decreases with more review rounds.

@JustinAngel: https://x.com/JustinAngel/status/2069482255312195980

X AI KOLs Timeline ↗ · 16h ago Cached

Release of free workshop recordings and materials (23 videos, 250 slides, 50 exercises) for building your own LLM from fundamentals to transformer architecture, with no math or ML prerequisites.

Unlimited OCR: One-Shot Long-Horizon Parsing

Hacker News Top ↗ · 22h ago Cached

Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing, built on DeepSeek-OCR with support for single images, multi-page documents, and PDFs.

@liuren: https://x.com/liuren/status/2069266318747165146

X AI KOLs Timeline ↗ · yesterday Cached

This article recounts in detail how Jia Yangqing built the deep learning framework Caffe (originally named Decaf) from scratch while at Berkeley and chose to open-source it, and traces his growth from student to technical leader.

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

X AI KOLs Timeline ↗ · yesterday Cached

Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.

@0xSero: Highly recommended educational content. LoRA is one of the coolest things to dabble in, lets anyone fine tune models re…

X AI KOLs Timeline ↗ · yesterday Cached

This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
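
To make the parameter saving concrete, here is a minimal sketch of the arithmetic behind low-rank adaptation. It is not taken from the linked article: the function name `lora_param_counts` and the layer sizes are illustrative assumptions. It assumes the usual LoRA formulation, in which a frozen weight of shape `d_out x d_in` is adapted by two trainable factors of shape `d_out x rank` and `rank x d_in`.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix.

    Hypothetical helper for illustration only.
    """
    full = d_in * d_out               # every entry of the weight is trainable
    low_rank = rank * (d_in + d_out)  # factor B (d_out x rank) plus factor A (rank x d_in)
    return full, low_rank


# Assumed example sizes: a square 4096 x 4096 projection adapted at rank 8.
full, low_rank = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(full, low_rank, full // low_rank)  # 16777216 65536 256
```

With these assumed sizes the adapter trains 256 times fewer parameters than full fine-tuning of the same matrix; the ratio shrinks as the rank grows.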

Attention Is All You Need

Reddit r/ArtificialInteligence ↗ · yesterday

A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.

Tapered Language Models

Hugging Face Daily Papers ↗ · 2d ago Cached

This paper introduces Tapered Language Models (TLMs), an architecture principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.

@PandaTalk8: The Most Worth-Following YouTube Channels for Learning AI in 2026, No-Nonsense Edition. Bookmark them, study in this order: 1. 3Blue1Brown AI / Math Foundation. Uses visualizations to clearly explain linear algebra, neural networks, and underlying mathematical intuition. https://youtube.c…

X AI KOLs Timeline ↗ · 2d ago Cached

Recommends 15 YouTube channels for learning AI in 2026, categorized by learning stage, with study path advice for beginners, engineering projects, and cutting-edge trends.

An Update on Matrix Recurrent Units, an Attention Alternative [R]

Reddit r/MachineLearning ↗ · 2d ago

An update on Matrix Recurrent Units (MRU), a linear-time attention alternative. The author explores methods to stabilize training, finding that orthogonal matrices underperform while LDU factorization works best, and shows MRU underperforms transformers on larger datasets like TinyStories.

@TheTuringPost: https://x.com/TheTuringPost/status/2068474648925216861

X AI KOLs Timeline ↗ · 3d ago Cached

An educational overview of knowledge distillation, covering its history, core concepts like softmax and temperature, types, scaling laws, and practical examples including DeepSeek-R1.
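
The role of temperature in the softmax can be illustrated with a short sketch. This shows the general technique, not code from the linked overview: the helper `softmax_with_temperature` and the logit values are assumptions made up for illustration. Dividing the logits by a temperature above 1 flattens the teacher's distribution, which exposes more of the relative ranking among non-top classes to the student.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (illustrative helper, not a library API)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 1.0, 0.5]  # made-up values
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print(sharp)  # peaked: most of the mass sits on the first class
print(soft)   # flatter: same ranking, smaller gaps between classes
```

In a typical distillation setup the student is trained to match the softened distribution, often alongside the ordinary hard-label loss; the temperature used is a tunable choice.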

I wrote a free 15-part series on LLM internals — real math, real tensor shapes, real hardware constraints. All grounded in Gemma 4 12B's actual config.

Reddit r/LocalLLaMA ↗ · 3d ago

A comprehensive 15-part series covering LLM internals from tokenization to serving, grounded in Gemma 4 12B's actual config.
