Catch Flakes On Main

matklad News

Summary

The article discusses the problem of flaky tests in software development and proposes a simple mechanical habit: when using a merge queue, continue to run the full test suite on main and maintain a visible list of recent main failures to help identify and eradicate flaky tests.

<header> <h1>Catch Flakes On Main</h1> <time class="meta" datetime="2026-05-14">May 14, 2026</time> </header>
<p>A small <a href="https://matklad.github.io/2025/12/06/mechanical-habits.html"><em>Mechanical Habit</em></a> today:</p>
<figure class="blockquote"> <blockquote><p>When using the not rocket science rule / a merge queue, continue to redundantly run the full test suite on main. Maintain an easily accessible list of recent main failures: these are the flaky tests to eradicate.</p> </blockquote> </figure>
<p>For an example, see the “Flakes” link on <a href="https://devhub.tigerbeetle.com" class="display url">https://devhub.tigerbeetle.com</a></p>
<figure> <img alt="" src="https://github.com/user-attachments/assets/09dffa9e-cdae-48e2-a4ed-80278da53bb2" width="2310" height="1746"> </figure>
<p>Flaky tests are tests that fail intermittently, say, once in a thousand runs. This might be due to a genuine bug (assumptions about scheduling that <em>mostly</em> hold) or due to instability of the underlying infrastructure (e.g., inability to download a release from GitHub, or to delete a folder on Windows). In either case, flaky tests are a huge productivity drain: as the size and complexity of the test suite grow, more and more CI runs fail spuriously, even as each individual test almost always passes.</p>
<p>Flaky tests are challenging to deal with. If you are working on landing a PR and your CI fails due to an obvious flake, the temptation to just re-run the test suite is enormous, especially if there’s a certain background dissatisfaction with infrastructure stability.</p>
<p>If you are of a mind to do some flake squashing, then your PRs will be green just to spite you! And working off of others’ PRs would first require separating flakes from genuine failures.</p>
<p>This is why the merge queue is powerful: if there’s a guarantee that every commit on the main branch has already passed the tests, then every failure on main is a flake, by definition. Collecting all such failures into a single list compresses time, makes it possible to prioritize the most impactful sources of instability, and reveals correlations between failures.</p>
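The habit can be sketched in a few lines of Python. This is a minimal illustration, not the tooling behind the devhub page: `MainRun`, `flake_list`, and `spurious_failure_rate` are hypothetical names, the run records are made up, and the failure-rate helper assumes tests flake independently of each other.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MainRun:
    """One redundant run of the full test suite on a commit already on main (hypothetical record)."""
    commit: str
    failed_tests: tuple[str, ...]  # empty when the run was green


def spurious_failure_rate(per_test_flake_rate: float, test_count: int) -> float:
    """Chance that at least one test flakes in a run, assuming independent tests."""
    return 1.0 - (1.0 - per_test_flake_rate) ** test_count


def flake_list(runs: list[MainRun]) -> list[tuple[str, int]]:
    """Rank tests by how often they failed on main, most frequent first.

    With a merge queue, every commit on main already passed once,
    so each failure counted here is treated as a flake.
    """
    counts = Counter(test for run in runs for test in run.failed_tests)
    # Descending by count, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up data for illustration only.
    runs = [
        MainRun("a1", ()),
        MainRun("b2", ("test_download_release",)),
        MainRun("c3", ("test_scheduler_order", "test_download_release")),
        MainRun("d4", ()),
    ]
    for test, failures in flake_list(runs):
        print(f"{failures:3d}  {test}")
    # 1000 tests that each flake once in a thousand runs.
    print(f"{spurious_failure_rate(0.001, 1000):.2f}")
```

Under the independence assumption, a suite of 1000 tests that each flake once in a thousand runs fails spuriously in roughly 63% of runs, which is why a ranked list of main failures is a useful place to start.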

Cached at: 05/16/26, 03:32 AM

Source: [https://matklad.github.io/2026/05/14/catch-flakes-on-main.html](https://matklad.github.io/2026/05/14/catch-flakes-on-main.html)
