Tag
An interactive guide explaining speculative decoding and multi-token prediction in LLMs, covering techniques from rejection sampling to MTP used in Qwen 3.6 and Gemma 4, with live diagrams and sliders.
This article explores how intermediate floating-point precision in C++ code depends on compiler settings, CPU flags, and architecture, particularly on the x87 FPU, and how this affects performance and calculation results.
This article from The Old New Thing explains that Windows thread pools are optimized for throughput, not latency, and provides solutions for low-latency scheduling, such as creating a custom thread pool or using a dedicated worker thread, with code examples in C++ and C#.
A detailed blog post explaining how virtual tables (vtables) are implemented in the Itanium C++ ABI, covering vtable structure, mangled names, and virtual function dispatch.
A user recommends an article that delves into agent loops, memory mechanisms, harness engineering, and agent evaluation, noting that it is especially valuable for readers studying agents in depth.
The article explains how the SNES PPUs render sprites and backgrounds under tight VRAM bandwidth constraints, describing the hardware trade-offs in different video modes.