gradient-alignment

#gradient-alignment

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv cs.LG · 2026-06-17

Proposes MGUP, a momentum-gradient alignment update policy that selectively updates parameters within each layer during stochastic optimization. MGUP integrates with optimizers such as AdamW, Lion, and Muon; the paper provides theoretical convergence guarantees and reports superior performance on large-scale model training tasks.

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

arXiv cs.LG · 2026-06-16

GRASP is a multi-source transfer learning method that sequentially merges source models into a single target model using constant, O(1), memory, and uses gradient-based parameter alignment to avoid negative transfer. In the reported experiments it outperforms ensemble methods while being much more memory-efficient.

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

Hugging Face Daily Papers · 2026-05-11

Introduces a training-free diagnostic framework for analyzing per-token distillation signals in reasoning models. The analysis finds that guidance is more beneficial on incorrect rollouts and that its benefit depends on student capacity and task context.
