DeepSeek released the full V4 paper, detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF. The efficiency gains are dramatic: V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.
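The summary does not spell out what SwiGLU clamping looks like in practice. Below is a minimal, hypothetical sketch of activation clamping inside a SwiGLU feed-forward block, the kind of trick used to keep low-precision activations from overflowing; the clamp bound and its placement on the gated hidden state are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClampedSwiGLU(nn.Module):
    """Illustrative SwiGLU feed-forward block with activation clamping.

    Hypothetical sketch: `clamp_value` and where the clamp is applied are
    assumptions for illustration, not details taken from the V4 paper.
    """
    def __init__(self, d_model: int, d_hidden: int, clamp_value: float = 30.0):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)
        self.clamp_value = clamp_value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gated hidden state: swish(W_gate x) * (W_up x)
        hidden = F.silu(self.w_gate(x)) * self.w_up(x)
        # Clamp to a fixed range so rare large activations do not overflow
        # the narrow dynamic range of low-precision (e.g., FP4) formats.
        hidden = hidden.clamp(-self.clamp_value, self.clamp_value)
        return self.w_down(hidden)
```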
Token AI releases a research paper introducing STAM, a new adaptive momentum optimizer designed to improve training stability and reduce memory usage compared to standard optimizers like AdamW.
This paper identifies and addresses aggregation bias in GRPO-style reinforcement learning for LLMs, proposing Balanced Aggregation (BA), which improves training stability and final performance by computing token-level means separately for positive and negative subsets.
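To make the aggregation change concrete, here is a minimal sketch, assuming "positive and negative subsets" refers to tokens with positive versus negative advantages; the exact loss form, masking, and how the two subset means are combined are assumptions for illustration, not the paper's definition.

```python
import torch

def balanced_aggregation_loss(token_logratios: torch.Tensor,
                              advantages: torch.Tensor,
                              mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of Balanced Aggregation (BA) for a GRPO-style loss.

    token_logratios: (B, T) per-token policy log-ratio terms
    advantages:      (B, T) per-token advantages
    mask:            (B, T) 1 for valid response tokens, 0 for padding
    """
    per_token_loss = -advantages * token_logratios * mask

    pos = (advantages > 0) & mask.bool()   # tokens with positive advantage
    neg = (advantages < 0) & mask.bool()   # tokens with negative advantage

    # Standard aggregation takes one mean over all tokens, so whichever
    # subset has more tokens dominates the gradient scale (the bias).
    # BA instead averages each subset separately and then combines them,
    # so positive and negative tokens contribute on an equal footing.
    zero = per_token_loss.new_zeros(())
    pos_mean = per_token_loss[pos].mean() if pos.any() else zero
    neg_mean = per_token_loss[neg].mean() if neg.any() else zero
    return 0.5 * (pos_mean + neg_mean)   # equal weighting is an assumption
```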