Rotation revisited: Cycle decomposition in clang’s libcxx

The Old New Thing (Raymond Chen) News

Summary

The article delves into the cycle decomposition algorithm used in clang's libcxx for rotation, explaining how it achieves the minimum number of swaps by computing the greatest common divisor (gcd) to determine the number of cycles.

<p>We got distracted by the rotation algorithm in gcc&#8217;s libstdc++, <a title="Rotation revisited: Another unidirectional algorithm" href="https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376"> but let&#8217;s get back to the cycle decomposition algorithm in </a><a href="https://github.com/llvm/llvm-project/blob/682c8e22e61f50ce2d9a0c42475a3aa6d578a1ad/libcxx/include/__algorithm/rotate.h"> clang&#8217;s libcxx</a><a title="Rotation revisited: Another unidirectional algorithm" href="https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376">. </a></p> <p>The implementation in clang&#8217;s libcxx performs the minimum number of swaps, roughly <var>n</var>/2, where <var>n</var> is the total number of elements. It does so by viewing the rotation as a permutation and walking through each of the cycles.</p> <p>For notational convenience, let <var>a</var> be |A| and <var>n</var> be |A| + |B| (the total number of elements). The number of cycles is <a href="https://en.wikipedia.org/wiki/Greatest_common_divisor">gcd</a>(<var>a</var>, <var>b</var>), and the <var>k</var>&#8216;th cycle consists of the elements starting at <code>first</code> + <var>k</var>, and then stepping to the next element by moving forward another <var>a</var> elements, with wraparound, until you return back to the starting point.</p> <p>For example, if you have |A| = 4 and |B| = 6, then the cycle that starts at A1 takes 4 steps forward to continues to B1; takes another 4 steps forward to B5; then takes 2 steps forward, wraps around, and then two more steps forward, landing on A3; then takes 4 steps forward to B3; and then takes 4 steps forward and wraps around to A1, which is the starting point.</p> <table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1"> <tbody> <tr> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor;">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> <tr> <td>↳</td> <td>→</td> <td>→</td> <td>→</td> <td>↴</td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> </tr> <tr> <td style="width: 2em; border: solid 1px currentcolor;">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor;">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> <tr> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td>↳</td> <td>→</td> <td>→</td> <td>→</td> <td>↴</td> <td><!-- --></td> </tr> <tr> <td style="width: 2em; border: solid 1px currentcolor;">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> <tr> <td>→</td> <td>→</td> <td>↴</td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td>↳</td> <td>→</td> </tr> <tr> <td style="width: 2em; border: solid 1px currentcolor;">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor;">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> <tr> <td><!-- --></td> <td><!-- --></td> <td>↳</td> <td>→</td> <td>→</td> <td>→</td> <td>↴</td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> </tr> <tr> <td style="width: 2em; border: solid 1px currentcolor;">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor;">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> <tr> <td>↴</td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td><!-- --></td> <td>↳</td> <td>→</td> <td>→</td> <td>→</td> </tr> <tr> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td> <td style="width: 2em; border: solid 1px currentcolor;">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">A3</td> <td style="width: 2em; border: solid 1px currentcolor;">A4</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor;">B5</td> <td style="width: 2em; border: solid 1px currentcolor;">B6</td> </tr> </tbody> </table> <p>There&#8217;s another cycle that starts at A2 and continues to B2, B6, A4, B4, then back to A2.</p> <p>Now, we&#8217;ve been counting swaps, but a single-element rotation is not done as a sequence of swaps, but rather by picking up the first element, sliding all the other elements over, and then putting the original first element at the end. I&#8217;ve been informally calling an assignment &#8220;half of a swap&#8221;, though a swap is really a constructor, two assignments, and a destructor. But let&#8217;s stick with the &#8220;half a swap&#8221; accounting fiction.</p> <p>The rotation algorithm goes like this:</p> <pre>auto a = std::distance(first, mid); // number of "A" elements auto n = std::distance(first, last); // total elements auto g = gcd(a, n); // number of cycles for (auto k = 0; k &lt; g; ++k) { // Rotate the elements in the cycle starting at k auto save = std::move(first[k]); auto i, next = k; while (i = next, next = (i + a) % n, next != k) { first[i] = std::move(first[next]); } first[i] = std::move(save); } </pre> <p>For example, if rotating A1, A2, B1, B2, B3, B4, there are two cycles: A1, B1, B3; and A2, B2, B4. The elements within each cycle rotate one position.</p> <table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1"> <tbody> <tr> <td>⮣</td> <td>→</td> <td>→</td> <td>→</td> <td>→</td> <td>→</td> <td>⮧</td> </tr> <tr> <td>⮤</td> <td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td> <td>←</td> <td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td> <td>←</td> <td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td> <td>⮠</td> </tr> <tr style="height: 1ex;"> <td colspan="7"><!-- --></td> </tr> <tr> <td style="width: 2em;"> </td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> </tr> <tr style="height: 1ex;"> <td colspan="7"><!-- --></td> </tr> <tr> <td>⮦</td> <td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td> <td>←</td> <td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td> <td>←</td> <td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td> <td>⮢</td> </tr> <tr> <td>⮡</td> <td>→</td> <td>→</td> <td>→</td> <td>→</td> <td>→</td> <td>⮥</td> </tr> </tbody> </table> <p>And when you&#8217;re done with all the cycles, you&#8217;ve rotated the entire A and B blocks.</p> <table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1"> <tbody> <tr> <td style="width: 2em; border: solid 1px currentcolor;">B1</td> <td style="width: 2em; border: solid 1px currentcolor;">B2</td> <td style="width: 2em; border: solid 1px currentcolor;">B3</td> <td style="width: 2em; border: solid 1px currentcolor;">B4</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td> <td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td> </tr> </tbody> </table> <p>This performs <var>n</var>/2 swaps, which is the fewest swaps of all the algorithms we&#8217;ve looked at so far. However, it has terrible locality because the elements in the cycle are all spread out.</p> <p>Calculating the greated common divisor of two numbers can be done in <var>O</var>(log <var>n</var>) steps via Euclid&#8217;s algorithm.</p> <pre>int gcd(int a, int b) { do { auto r = a % b; a = b; b = r; } while (r); return a; } </pre> <p><a href="https://devblogs.microsoft.com/oldnewthing/20260101-00/?p=111955&amp;commentid=143690#comment-143690"> Commenter Brent thought that the cycle decomposition algorithm was obvious</a>. Of course, the trick is the step they called &#8220;Repeat&#8221;. How many times do you repeat?</p> <p>The clang libcxx algorithm calculates the number of repeats by taking the gcd. But there&#8217;s a trick so we don&#8217;t have to calculated it at all. We&#8217;ll look at that trick next time.</p> <p><b>Bonus chatter</b>: I think it&#8217;s interesting that of the three major implementations of the C++ standard library, each one uses a different rotation algorithm when given random-access iterators!</p> <p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384">Rotation revisited: Cycle decomposition in clang&#8217;s libcxx</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
Original Article
View Cached Full Text

Cached at: 06/05/26, 02:06 PM

# Rotation revisited: Cycle decomposition in clang's libcxx - The Old New Thing Source: [https://devblogs.microsoft.com/oldnewthing/20260604-00?p=112384](https://devblogs.microsoft.com/oldnewthing/20260604-00?p=112384) We got distracted by the rotation algorithm in gcc’s libstdc\+\+,[but let’s get back to the cycle decomposition algorithm in](https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376)[clang’s libcxx](https://github.com/llvm/llvm-project/blob/682c8e22e61f50ce2d9a0c42475a3aa6d578a1ad/libcxx/include/__algorithm/rotate.h)[\.](https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376) The implementation in clang’s libcxx performs the minimum number of swaps, roughlyn/2, wherenis the total number of elements\. It does so by viewing the rotation as a permutation and walking through each of the cycles\. For notational convenience, letabe \|A\| andnbe \|A\| \+ \|B\| \(the total number of elements\)\. The number of cycles is[gcd](https://en.wikipedia.org/wiki/Greatest_common_divisor)\(a,b\), and thek‘th cycle consists of the elements starting at`first`\+k, and then stepping to the next element by moving forward anotheraelements, with wraparound, until you return back to the starting point\. For example, if you have \|A\| = 4 and \|B\| = 6, then the cycle that starts at A1 takes 4 steps forward to continues to B1; takes another 4 steps forward to B5; then takes 2 steps forward, wraps around, and then two more steps forward, landing on A3; then takes 4 steps forward to B3; and then takes 4 steps forward and wraps around to A1, which is the starting point\. A1A2A3A4B1B2B3B4B5B6↳→→→↴A1A2A3A4B1B2B3B4B5B6↳→→→↴A1A2A3A4B1B2B3B4B5B6→→↴↳→A1A2A3A4B1B2B3B4B5B6↳→→→↴A1A2A3A4B1B2B3B4B5B6↴↳→→→A1A2A3A4B1B2B3B4B5B6There’s another cycle that starts at A2 and continues to B2, B6, A4, B4, then back to A2\. Now, we’ve been counting swaps, but a single\-element rotation is not done as a sequence of swaps, but rather by picking up the first element, sliding all the other elements over, and then putting the original first element at the end\. I’ve been informally calling an assignment “half of a swap”, though a swap is really a constructor, two assignments, and a destructor\. But let’s stick with the “half a swap” accounting fiction\. The rotation algorithm goes like this: ``` auto a = std::distance(first, mid); // number of "A" elements auto n = std::distance(first, last); // total elements auto g = gcd(a, n); // number of cycles for (auto k = 0; k < g; ++k) { // Rotate the elements in the cycle starting at k auto save = std::move(first[k]); auto i, next = k; while (i = next, next = (i + a) % n, next != k) { first[i] = std::move(first[next]); } first[i] = std::move(save); } ``` For example, if rotating A1, A2, B1, B2, B3, B4, there are two cycles: A1, B1, B3; and A2, B2, B4\. The elements within each cycle rotate one position\. ⮣→→→→→⮧⮤←←⮠A1A2B1B2B3B4⮦←←⮢⮡→→→→→⮥And when you’re done with all the cycles, you’ve rotated the entire A and B blocks\. B1B2B3B4A1A2This performsn/2 swaps, which is the fewest swaps of all the algorithms we’ve looked at so far\. However, it has terrible locality because the elements in the cycle are all spread out\. Calculating the greated common divisor of two numbers can be done inO\(logn\) steps via Euclid’s algorithm\. ``` int gcd(int a, int b) { do { auto r = a % b; a = b; b = r; } while (r); return a; } ``` [Commenter Brent thought that the cycle decomposition algorithm was obvious](https://devblogs.microsoft.com/oldnewthing/20260101-00/?p=111955&commentid=143690#comment-143690)\. Of course, the trick is the step they called “Repeat”\. How many times do you repeat? The clang libcxx algorithm calculates the number of repeats by taking the gcd\. But there’s a trick so we don’t have to calculated it at all\. We’ll look at that trick next time\. **Bonus chatter**: I think it’s interesting that of the three major implementations of the C\+\+ standard library, each one uses a different rotation algorithm when given random\-access iterators\! ### Category ### Topics ## Author ![Raymond Chen](https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2019/02/RaymondChen_5in-150x150.jpg) Raymond has been involved in the evolution of Windows for more than 30 years\. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie\-jeebies\. The Web site spawned a book, coincidentally also titled The Old New Thing \(Addison Wesley 2007\)\. He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information\.

Similar Articles

When compilers surprise you

Lobsters Hottest

Matt Godbolt explores compiler optimizations that convert an O(n) summation loop into an O(1) closed-form solution, highlighting how Clang and GCC employ sophisticated techniques like loop unrolling and mathematical simplification to dramatically improve code performance.