training-stability

#training-stability

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Reddit r/MachineLearning ↗ · 5h ago

DeepSeek released the full V4 paper, which details FP4 quantization-aware training (QAT), MoE training-stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF. It reports large efficiency gains: at 1M-token context length, V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache.
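The summary names FP4 quantization-aware training but gives no details. The sketch below illustrates what QAT generally does. It is not DeepSeek's recipe, which this summary does not describe. It fake-quantizes a block of values onto the FP4 E2M1 grid using one shared absmax scale per block. The names `round_to_e2m1` and `fake_quant_fp4` are hypothetical, and the sketch makes three simplifications: the scale stays in full precision, ties round toward the smaller magnitude, and there is no gradient path. Real FP4 formats also quantize the scale, and a QAT framework would pass gradients straight through the rounding step.

```python
import math
from typing import List

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, saturating at +/-6; ties go to the smaller magnitude."""
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # first minimum = smaller magnitude
    return math.copysign(nearest, x)


def fake_quant_fp4(block: List[float]) -> List[float]:
    """Quantize then dequantize a block using one shared absmax scale (hypothetical helper).

    A QAT forward pass would use these values. The backward pass would treat the
    rounding as identity (straight-through estimator), which needs an autograd
    framework and is not shown here.
    """
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return [0.0 for _ in block]
    scale = amax / E2M1_MAX  # maps the block's largest magnitude onto 6.0
    return [round_to_e2m1(v / scale) * scale for v in block]


print(fake_quant_fp4([1.0, -3.0, 0.7, 12.0]))  # [1.0, -3.0, 1.0, 12.0]
```

In this example, 0.7 is scaled to 0.35 and snaps to the grid point 0.5, so it comes back as 1.0. During training, the optimizer would keep updating a full-precision copy of the weights while the forward pass sees only these rounded values.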

#training-stability

A new generation of AI models and one of the most powerful research papers out there.

Reddit r/LocalLLaMA ↗ · yesterday

Token AI releases a research paper introducing STAM, a new adaptive momentum optimizer designed to improve training stability and reduce memory usage compared to standard optimizers like AdamW.

#training-stability

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

Hugging Face Daily Papers ↗ · 2026-04-14

This paper identifies and addresses aggregation bias in GRPO-style reinforcement learning for LLMs. It proposes Balanced Aggregation (BA), which improves training stability and final performance by computing token-level means separately for the positive and negative subsets.
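The following is a minimal sketch of the aggregation idea in the summary. It assumes that each response's per-token surrogate losses are already computed and that a response's subset is set by the sign of its GRPO advantage. `token_mean` and `balanced_aggregate` are hypothetical names. Averaging the two subset means with equal weight is also an assumption, and the paper's exact combination, clipping, and normalization may differ.

```python
from typing import Sequence


def token_mean(per_token_losses: Sequence[Sequence[float]]) -> float:
    """Plain token-level mean: every token in every response gets equal weight."""
    flat = [x for response in per_token_losses for x in response]
    return sum(flat) / len(flat) if flat else 0.0


def balanced_aggregate(per_token_losses: Sequence[Sequence[float]],
                       advantages: Sequence[float]) -> float:
    """Token-level mean computed separately over positive- and negative-advantage
    responses, then averaged (equal weighting is an assumption of this sketch)."""
    positive = [r for r, a in zip(per_token_losses, advantages) if a > 0]
    negative = [r for r, a in zip(per_token_losses, advantages) if a < 0]
    # Zero-advantage responses carry no policy-gradient signal, so they are dropped.
    subset_means = [token_mean(group) for group in (positive, negative)
                    if any(len(r) > 0 for r in group)]
    return sum(subset_means) / len(subset_means) if subset_means else 0.0


# Hypothetical group: one short answer with A = +1 and one long answer with A = -1.
# The per-token surrogate loss is -A * ratio, with ratio = 1 for simplicity.
losses = [[-1.0, -1.0], [1.0] * 6]
advantages = [1.0, -1.0]
print(token_mean(losses))                      # 0.5
print(balanced_aggregate(losses, advantages))  # 0.0
```

The plain token mean here is 0.5 because the 6-token negative response contributes three times as many tokens as the 2-token positive one. The balanced version returns 0.0 because each subset carries equal weight regardless of response length.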
