fp4-quantization

#fp4-quantization

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Reddit r/MachineLearning ↗ · 7h ago

DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.

0 favorites 0 likes

fp4-quantization

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Submit Feedback