adam

#adam

Why Muon Outperforms Adam: A Curvature Perspective

Hugging Face Daily Papers · 2026-06-03

This paper investigates why the Muon optimizer outperforms Adam in large language model training, showing from a curvature perspective that Muon incurs a smaller curvature penalty due to lower normalized directional sharpness, with advantages amplified by data imbalance.

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

arXiv cs.LG · 2026-06-01

This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.

A Rod Flow Model for Adam at the Edge of Stability

arXiv cs.LG · 2026-05-11

This paper introduces a 'rod flow' model for Adam and other adaptive optimizers to better analyze their behavior at the edge of stability. It extends continuous-time modeling to momentum methods, showing improved accuracy in tracking discrete iterates compared to stable flow models.

Revisiting Adam for Streaming Reinforcement Learning

arXiv cs.LG · 2026-05-11

This paper revisits the Adam optimizer for streaming reinforcement learning, demonstrating that established methods like DQN and C51 perform well when properly tuned. The authors propose Adaptive Q(lambda), which combines eligibility traces with Adam's variance adaptation to surpass existing streaming RL methods on 55 Atari games.

Can Muon Fine-tune Adam-Pretrained Models?

Hugging Face Daily Papers · 2026-05-11

This paper investigates the performance degradation that occurs when the Muon optimizer, rather than Adam, is used to fine-tune Adam-pretrained models, and shows that parameter-efficient methods such as LoRA effectively mitigate this optimizer mismatch across language and vision tasks.
