Tag
A technical comparison finds that Google's Q4_0 quantized Gemma-4 models retain higher precision overall, and keep more tensors at high precision, than Unsloth's Q4_K_XL versions, which results in larger file sizes.
This paper analyzes precision loss in FP8 attention due to the attention sink phenomenon when casting the softmax output to FP8 (E4M3). It shows that forward KV iteration causes underflow of non-sink attention values, and proposes reverse iteration and a static scaling factor S=256 to eliminate underflow, achieving 3-10x MSE improvement.
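The underflow and the effect of a static scale can be illustrated with a small sketch. It is not the paper's kernel and does not cover the reverse-iteration part: `quantize_e4m3` is a hypothetical helper that rounds a non-negative value to the nearest E4M3 number (3 mantissa bits, subnormal step 2^-9, largest finite value 448), and the probability 0.0005 is an assumed non-sink attention weight chosen for illustration.

```python
import math

E4M3_MAX = 448.0                 # largest finite E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6      # smallest normal E4M3 value
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing of subnormal E4M3 values


def quantize_e4m3(x):
    """Round a non-negative float to the nearest E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    if x < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP
    else:
        _, e = math.frexp(x)        # x = m * 2**e with 0.5 <= m < 1
        step = 2.0 ** (e - 1 - 3)   # 3 mantissa bits per binade
    return min(round(x / step) * step, E4M3_MAX)


S = 256.0          # static scale from the summary; S * 1.0 still fits below 448
p_small = 0.0005   # assumed non-sink softmax weight (illustrative)

direct = quantize_e4m3(p_small)          # cast without scaling: underflows to 0.0
scaled = quantize_e4m3(p_small * S) / S  # scale, cast, then undo the scale

print(direct, scaled)
```

Because softmax outputs never exceed 1, multiplying by 256 keeps the largest value inside the E4M3 range while lifting small weights out of the region that rounds to zero.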
DeepSeek V4 Pro reportedly outperforms GPT-5.5 Pro on precision, suggesting a significant advancement in model accuracy.
This paper identifies a blind spot in reference-free faithfulness metrics: they only measure precision (whether claims are supported) but not recall (coverage of relevant facts). The authors introduce a complete-oracle evaluation using Formula 1 telemetry and weather data, showing that high-precision models often have poor coverage, and propose a combined metric.
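The precision/recall gap can be made concrete with a toy sketch. The summary does not say how the authors combine the two scores, so the harmonic mean (F1) below is an assumption standing in for their metric, and the claim and fact strings are hypothetical. Matching claims to oracle facts by exact string equality is also a simplification.

```python
def faithfulness_scores(claims, oracle_facts):
    """Return (precision, recall, f1) of generated claims against a complete oracle."""
    claims = set(claims)
    oracle_facts = set(oracle_facts)
    supported = claims & oracle_facts  # claims that the oracle confirms

    precision = len(supported) / len(claims) if claims else 0.0
    recall = len(supported) / len(oracle_facts) if oracle_facts else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean: an assumed combination, not necessarily the paper's.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical oracle with four relevant facts; the summary states only one.
oracle = ["rain_started_lap_12", "track_temp_dropped", "pit_stop_lap_13", "tyre_change_to_wets"]
summary_claims = ["rain_started_lap_12"]

precision, recall, f1 = faithfulness_scores(summary_claims, oracle)
print(precision, recall, f1)
```

A summary that states a single correct fact scores perfect precision here while covering only a quarter of the oracle, which is the blind spot a precision-only metric would miss.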
This article demonstrates that using stochastic rounding for BF16 optimizer state can match FP32 performance because unbiased errors cancel over time, whereas round-to-nearest stalls due to compounding bias. An experiment with an MLP shows BF16+SR achieves similar loss to FP32 while using less memory.
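The difference between the two rounding modes can be seen in a toy accumulation, sketched below. This is not the article's MLP experiment: it repeatedly adds an assumed update of 1e-3 to a weight of 1.0 stored in BF16, where the spacing between representable values near 1.0 is 2^-7. The helper names are hypothetical, and BF16 is emulated by keeping the top 16 bits of a float32 bit pattern.

```python
import random
import struct


def f32_bits(x):
    """Bit pattern of x rounded to float32, as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Float value of a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def bf16_round_nearest(x):
    """Round to BF16 with round-to-nearest-even on the dropped 16 bits."""
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)
    return bits_f32(b & 0xFFFF0000)


def bf16_round_stochastic(x, rng):
    """Round to BF16, rounding up with probability equal to the dropped fraction."""
    b = f32_bits(x)
    b += rng.getrandbits(16)  # uniform noise over the 16 bits that get truncated
    return bits_f32(b & 0xFFFF0000)


def accumulate(round_fn, steps=1000, update=1e-3):
    """Add a small update repeatedly, storing the weight in BF16 after each step."""
    w = 1.0
    for _ in range(steps):
        w = round_fn(w + update)
    return w


rng = random.Random(0)  # fixed seed so the run is repeatable
w_rtn = accumulate(bf16_round_nearest)
w_sr = accumulate(lambda x: bf16_round_stochastic(x, rng))

print(w_rtn, w_sr)
```

With round-to-nearest, each update is smaller than half the BF16 spacing and is rounded away, so the weight never leaves 1.0. With stochastic rounding, each step is unbiased in expectation, so the weight drifts toward the exact sum of about 2.0.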
A game developer describes fixing a GPU rendering bug in their game Blackshift: float precision issues when casting 8-bit adjacency integers to floats caused visual artifacts on certain NVIDIA GPUs, and the bug appeared in the main render but not in preview mode.