@nrehiew_: For the visual learners


Summary

A thread reviewing NVFP4 pre-training, drawing mainly on the paper 'Pretraining Large Language Models with NVFP4' and the Nemotron 3 Super paper, prompted by NVIDIA's push to support the format, especially on Blackwell GPUs.

For the visual learners https://t.co/rliyO8pOsL

Cached at: 06/05/26, 11:13 AM

wh (@nrehiew_): This paper prompted me to review NVFP4 pre-training, given that NVIDIA seems to be pushing support for it, especially on Blackwell GPUs.

Much of the content will come from “Pretraining Large Language Models with NVFP4” and the Nemotron 3 Super paper 🧵
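For readers new to the format, the sketch below fake-quantizes a list of floats in plain Python. It assumes the NVFP4 layout NVIDIA has published: 4-bit FP4 (E2M1) elements grouped into 16-element micro-blocks, each block sharing an FP8 (E4M3) scale, plus one per-tensor FP32 scale. The function names (round_to_e2m1, to_e4m3, nvfp4_fake_quantize) are hypothetical. Rounding is simplified to round-to-nearest-even with saturation. The sketch does not reproduce the training recipe from either paper.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
BLOCK = 16        # assumed NVFP4 micro-block size (elements per block scale)


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, ties to even mantissa, saturating at +/-6."""
    # Grid index parity equals the mantissa bit, so the key (distance, i % 2)
    # breaks ties toward the even mantissa (round-to-nearest-even).
    k = min(range(len(E2M1_GRID)), key=lambda i: (abs(abs(x) - E2M1_GRID[i]), i % 2))
    return math.copysign(E2M1_GRID[k], x)


def to_e4m3(v: float) -> float:
    """Round a non-negative value to the nearest FP8 E4M3 value, saturating at 448."""
    if v <= 0.0:
        return 0.0
    if v < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal range has a fixed spacing of 2**-9
    else:
        _, e = math.frexp(v)       # v = m * 2**e with 0.5 <= m < 1
        step = 2.0 ** (e - 1 - 3)  # 3 mantissa bits -> 8 steps per binade
    # Python's round() is round-half-to-even on the step count.
    return min(round(v / step) * step, E4M3_MAX)


def nvfp4_fake_quantize(x: list[float], block: int = BLOCK) -> list[float]:
    """Quantize x to NVFP4-style values and return the dequantized floats."""
    amax = max((abs(v) for v in x), default=0.0)
    if amax == 0.0:
        return [0.0 for _ in x]
    # Per-tensor scale (FP32 in the real format, a Python float here), chosen
    # so that every block scale below fits within E4M3's range.
    global_scale = amax / (E2M1_MAX * E4M3_MAX)
    out: list[float] = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        block_amax = max(abs(v) for v in chunk)
        # Per-block scale, stored in E4M3 relative to the per-tensor scale.
        block_scale = to_e4m3(block_amax / (E2M1_MAX * global_scale))
        scale = block_scale * global_scale
        if scale == 0.0:
            # The block scale underflowed in E4M3, so the whole block becomes zero.
            out.extend(0.0 for _ in chunk)
            continue
        out.extend(round_to_e2m1(v / scale) * scale for v in chunk)
    return out
```

With two levels of scaling, a block whose values are small relative to the tensor maximum still gets its own finer step size. Within a block, every value is snapped to one of at most 15 signed E2M1 levels multiplied by that block's scale.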
