When compilers surprise you

Lobsters Hottest News

Summary

Matt Godbolt explores how compilers optimise a simple O(n) summation loop: GCC adds two numbers per iteration (and vectorises the loop at `-O3`), while Clang removes the loop entirely and replaces it with an O(1) closed-form expression.



# When compilers surprise you - Matt Godbolt’s blog

Source: [https://xania.org/202512/24-cunning-clang](https://xania.org/202512/24-cunning-clang)

Written by me, proof-read by an LLM. Details at end.

Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation I could hardly believe it. I was looking at loop optimisation, and wrote something like this simple function that sums all the numbers up to a given value:

*(Interactive code listing not captured in this copy.)*

So far so decent: GCC has done some preliminary checks, then fallen into a loop that efficiently sums numbers using `lea` (we’ve [seen this before](https://xania.org/202512/02-adding-integers)). But taking a closer look at the loop we see something unusual:

```
.L3:
  lea edx, [rdx+1+rax*2]   ; result = result + 1 + x*2
  add eax, 2               ; x += 2
  cmp edi, eax             ; x != value
  jne .L3                  ; keep looping
```

The compiler has cleverly realised it can do two numbers[1](https://xania.org/202512/24-cunning-clang#fn:check) at a time using the fact it can see we’re going to add `x` *and* `x + 1`, which is the same as adding `x*2 + 1`. Very cunning, I think you’ll agree!

If you turn the optimiser up to `-O3` you’ll see the compiler works even harder to vectorise the loop using parallel adds. All very clever.

This is all for GCC. Let’s see what clang does with our code:

*(Interactive code listing not captured in this copy.)*

This is where I nearly fell off my chair: **there is no loop**! Clang checks for positive `value`, and if so it does:

```
  lea eax, [rdi - 1]     ; eax = value - 1
  lea ecx, [rdi - 2]     ; ecx = value - 2
  imul rcx, rax          ; rcx = (value - 1) * (value - 2)
  shr rcx                ; rcx >>= 1
  lea eax, [rdi + rcx]   ; eax = value + rcx
  dec eax                ; --eax
  ret
```

It was not at all obvious to me what on earth was going on here.

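
To make the two assembly fragments above easier to follow, here is a minimal C sketch of what each appears to compute. It rests on assumptions: the post's source listing is not reproduced in this copy, so `sum_loop` is only a guess at its shape (a signed loop from 0 up to, but not including, `value`), and the names `sum_loop`, `sum_pairs` and `sum_closed` are hypothetical. The handling of odd counts in `sum_pairs` is likewise a guess at what GCC's "preliminary checks" do, not a transcription of them.

```c
#include <stdint.h>

/* Assumed shape of the function under discussion: sums 0 .. value-1. */
int sum_loop(int value) {
    int result = 0;
    for (int x = 0; x < value; ++x) {
        result += x;
    }
    return result;
}

/* Model of GCC's paired loop: each iteration adds x and x + 1 together
 * as x*2 + 1, then advances x by 2. Unsigned arithmetic is used so that
 * wrap-around is well defined. */
int sum_pairs(int value) {
    if (value <= 0) {
        return 0;
    }
    unsigned result = 0;
    unsigned x = 0;
    if (value & 1) {
        /* Assumption: with an odd count, element 0 (which adds nothing)
         * is peeled off so the remaining elements form whole pairs. */
        x = 1;
    }
    while (x != (unsigned)value) {    /* cmp edi, eax ; jne .L3      */
        result += x * 2 + 1;          /* lea edx, [rdx+1+rax*2]      */
        x += 2;                       /* add eax, 2                  */
    }
    return (int)result;
}

/* Model of Clang's loop-free sequence, one statement per instruction. */
int sum_closed(int value) {
    if (value <= 0) {
        return 0;                     /* the "positive value" check  */
    }
    uint32_t v = (uint32_t)value;
    uint64_t a = (uint32_t)(v - 1);   /* lea eax, [rdi - 1]          */
    uint64_t c = (uint32_t)(v - 2);   /* lea ecx, [rdi - 2]          */
    uint64_t half = (c * a) >> 1;     /* imul rcx, rax ; shr rcx     */
    uint32_t r = v + (uint32_t)half;  /* lea eax, [rdi + rcx]        */
    r -= 1;                           /* dec eax                     */
    return (int)r;
}
```

Under these assumptions all three functions agree: for `value = 5` each returns 10 (0 + 1 + 2 + 3 + 4). Note that the multiply in `sum_closed` is done in 64 bits, mirroring the 64-bit `imul` in the listing.
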
By backing out the maths a little, this is equivalent to:

```
v + ((v - 1)(v - 2) / 2) - 1;
```

Expanding the parentheses:

```
v + (v² - 2v - v + 2) / 2 - 1
```

Rearranging a bit:

```
(v² - 3v + 2) / 2 + (v - 1)
```

Multiplying the `(v - 1)` by 2 / 2:

```
(v² - 3v + 2) / 2 + (2v - 2)/2
```

Combining those and cancelling leaves `(v² - v) / 2`.

Simplifying and factoring gives us `v(v - 1) / 2` which is the closed-form solution to the “sum of integers”! Truly amazing[2](https://xania.org/202512/24-cunning-clang#fn:why) - we’ve gone from an O(n) algorithm as written, to an O(1) one!

I love that despite working with compilers for more than twenty years, they can still surprise and delight me. The years of experience and work that have been poured into making compilers great are truly humbling, and inspiring.

We’re nearly at the end of this series - there’s so much more to say but that will have to wait for another time. Tomorrow will be a little different: see you then!

*See [the video](https://youtu.be/V9dy34slaxA) that accompanies this post.*

---

*This post is day 24 of [Advent of Compiler Optimisations 2025](https://xania.org/AoCO2025-archive), a 25-day series exploring how compilers transform our code.*

*This post was written by a human ([Matt Godbolt](https://xania.org/MattGodbolt)) and reviewed and proof-read by LLMs and humans.*

*Support Compiler Explorer on [Patreon](https://patreon.com/c/mattgodbolt) or [GitHub](https://github.com/sponsors/compiler-explorer), or by buying CE products in the [Compiler Explorer Shop](https://shop.compiler-explorer.com/).*

Posted at 06:00:00 CST on 24th December 2025.
