Tag
This article reverse-engineers the 69-bit adder at the core of Intel's 8087 floating-point coprocessor from 1980, explaining its architecture and carry-chain techniques.
dMX is a differentiable mixed-precision quantization framework that learns optimal floating-point bit-width assignments per layer for LLMs, targeting the MXFP family of formats defined by the OCP standard. It uses continuous optimization with temperature-based annealing and a budget-aware regularization term, consistently outperforming KL-divergence heuristics on Llama, Qwen3, and SmolLM2 models.
The IEEE P3109 draft standard defines a parameterized family of binary floating-point formats and operations tailored for machine learning, supporting configurable width, precision, signedness, and infinities, along with extensive rounding modes including stochastic rounding and a novel scale-invariant approximation measure called kappa-approximation.
Setting temperature to 0 does not guarantee deterministic tool calls in agents due to batched inference causing floating-point reduction order shifts, leading to token flips and different actions under load.
The article compares two methods for normalizing RGB values (dividing by 255 vs 256) and explains the consequences for floating-point conversion and rounding, including uneven bin widths at the extremes.
A detailed reverse-engineering analysis of the microcode inside the Intel 8087 floating-point coprocessor, focusing on the FXCH register exchange instruction and the chip's internal architecture.
Explores the behavior of floor and ceil functions when applied to denormalized floating-point numbers, highlighting differences between CPU and GPU implementations and potential pitfalls.
This paper introduces Digit Entropy Loss (DEL), a novel loss function for numerical learning in large language models that reformulates entropy optimization to improve digit-level prediction accuracy and handle floating-point numbers, consistently outperforming existing methods on mathematical reasoning benchmarks.
A report on the RISC-V architecture's floating point capabilities and updates.
FPSan is a Triton compiler pass that enables verification of algebraic equivalence of floating-point programs by replacing floating-point operations with integer operations, relying on Schanuel's conjecture for correctness.
The Engineering Director of Age of Empires provides an in-depth analysis of the technical debt in the series' pathfinding system over the past 25 years, pointing out that legacy code, dynamic map mechanics, and floating-point errors caused by SIMD instruction sets replacing x87 extended precision are the root causes of classic bugs such as units clipping through walls.
fc is an open-source lossless compressor for IEEE-754 64-bit double streams, offering superior compression ratios for structured data compared to zstd and fpzip, though with slower encoding speeds.
Developer releases `exact-poly`, a 2D geometry library using exact integer arithmetic instead of floats to eliminate cross-platform reproducibility issues caused by IEEE 754 implementation differences.
OpenBSD developer details the painful quirks of VAX floating-point exceptions and how they complicate kernel porting.
A detailed reverse-engineering analysis of how the Intel 8087 floating-point coprocessor decodes instructions, explaining the interplay between the main CPU and coprocessor, the use of microcode ROM, and the bus interface unit.
A detailed examination of the conditional tests used in the Intel 8087 floating-point coprocessor's microcode, part of a reverse-engineering effort to understand its algorithms.
The article details the reverse-engineering of the Intel 8087 floating-point coprocessor's stack circuitry, explaining how the chip's stack-based register architecture and microcode ROM enabled fast floating-point operations.