Catch Flakes On Main

matklad 05/14/26, 12:00 AM News

Summary

The article discusses the problem of flaky tests in software development and proposes a simple mechanical habit: when using a merge queue, continue to run the full test suite on main and maintain a visible list of recent main failures to help identify and eradicate flaky tests.

<header> <h1>Catch Flakes On Main</h1> <time class="meta" datetime="2026-05-14">May 14, 2026</time> </header> <p>A small <a href="https://matklad.github.io/2025/12/06/mechanical-habits.html"><em>Mechanical Habit</em></a> today:</p> <figure class="blockquote"> <blockquote><p>When using not rocket science rule / merge queue, continue to redundantly run the full test suite on main. Maintain an easily accessible list of recent main failures: these are the flaky tests to eradicate.</p> </blockquote> </figure> <p>For an example, see the “Flakes” link on <a href="https://devhub.tigerbeetle.com" class="display url">https://devhub.tigerbeetle.com</a></p> <figure> <img alt="" src="https://github.com/user-attachments/assets/09dffa9e-cdae-48e2-a4ed-80278da53bb2" width="2310" height="1746"> </figure> <p>Flaky tests are tests that fail intermittently, say, once in a thousand runs. This might be due to a genuine bug (assumptions about scheduling that <em>mostly</em> hold) or due to instability of underlying infrastructure (e.g., inability to download a release from GitHub, or to delete a folder on Windows). In either case, flaky tests are a huge productivity drain: as the size and complexity of the test suite grow, more and more CI runs fail spuriously, even as each individual test almost always passes.</p> <p>Flaky tests are challenging to deal with: if you are working on landing a PR and your CI fails due to an obvious flake, the temptation to just re-run the test suite is enormous, especially if there’s a certain background dissatisfaction with infrastructure stability.</p> <p>If you are of a mind to do some flake squashing, then your PRs will be green just to spite you! And working off of others’ PRs would first require separating flakes from genuine failures.</p> <p>This is why the merge queue is powerful: if there’s a guarantee that every commit on the main branch passes the tests, then every failure on main is a flake, by definition. 
Collecting all such failures into a single list compresses time, makes it possible to prioritize the most impactful sources of instability, and reveals correlations between failures.</p>

Original Article

Cached at: 05/16/26, 03:32 AM

# Catch Flakes On Main

Source: [https://matklad.github.io/2026/05/14/catch-flakes-on-main.html](https://matklad.github.io/2026/05/14/catch-flakes-on-main.html)

May 14, 2026

A small [*Mechanical Habit*](https://matklad.github.io/2025/12/06/mechanical-habits.html) today:

> When using not rocket science rule / merge queue, continue to redundantly run the full test suite on main. Maintain an easily accessible list of recent main failures: these are the flaky tests to eradicate.

For an example, see the “Flakes” link on [https://devhub.tigerbeetle.com](https://devhub.tigerbeetle.com/)

![](https://github.com/user-attachments/assets/09dffa9e-cdae-48e2-a4ed-80278da53bb2)

Flaky tests are tests that fail intermittently, say, once in a thousand runs. This might be due to a genuine bug (assumptions about scheduling that *mostly* hold) or due to instability of underlying infrastructure (e.g., inability to download a release from GitHub, or to delete a folder on Windows). In either case, flaky tests are a huge productivity drain: as the size and complexity of the test suite grow, more and more CI runs fail spuriously, even as each individual test almost always passes.

Flaky tests are challenging to deal with: if you are working on landing a PR and your CI fails due to an obvious flake, the temptation to just re-run the test suite is enormous, especially if there’s a certain background dissatisfaction with infrastructure stability.

If you are of a mind to do some flake squashing, then your PRs will be green just to spite you! And working off of others’ PRs would first require separating flakes from genuine failures.

This is why the merge queue is powerful: if there’s a guarantee that every commit on the main branch passes the tests, then every failure on main is a flake, by definition. Collecting all such failures into a single list compresses time, makes it possible to prioritize the most impactful sources of instability, and reveals correlations between failures.
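
As a rough illustration of the "single list" idea, here is a minimal Python sketch. It is not how the TigerBeetle devhub is built; the `MainRun` record, the function names, and the test names in the demo are all hypothetical. It assumes you can already export, for each full-suite run on main, the commit and the names of the tests that failed. Given that, it ranks tests by how often they failed and counts which tests failed together in the same run.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MainRun:
    """One redundant full-suite CI run on a commit already merged to main.

    Hypothetical record: adapt the fields to whatever your CI can export.
    """
    commit: str
    failed_tests: tuple[str, ...]  # empty tuple means the run was green


def rank_flakes(runs: list[MainRun]) -> list[tuple[str, int]]:
    """Count failures per test across main runs, most frequent first."""
    # set() guards against a test being reported twice within one run.
    counts = Counter(test for run in runs for test in set(run.failed_tests))
    # Sort by descending count, then by name, so the list is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def co_failures(runs: list[MainRun]) -> dict[tuple[str, str], int]:
    """Count pairs of tests that failed in the same run."""
    pairs: Counter[tuple[str, str]] = Counter()
    for run in runs:
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(set(run.failed_tests)), 2):
            pairs[pair] += 1
    return dict(pairs)


if __name__ == "__main__":
    # Made-up data, for demonstration only.
    demo = [
        MainRun("a1", ()),
        MainRun("b2", ("test_download_release",)),
        MainRun("c3", ("test_download_release", "test_delete_dir_windows")),
        MainRun("d4", ("test_scheduler_order",)),
    ]
    for name, count in rank_flakes(demo):
        print(f"{count:3d}  {name}")
    print(co_failures(demo))
```

Because the merge queue already guarantees that each of these commits passed once, every entry in the ranked list is a flake candidate, and pairs that often fail together hint at a shared cause such as the same piece of unstable infrastructure.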

Similar Articles

@ericzakariasson: here are 3 loops you can run in cursor 1. Flaky-test exterminator /loop run my test suite 20 times, collect every inter…

@kettanaito: More and more people are asking me about testing resources so let's put everything I've written in one post. Bookmark, …

@RayFernando1337: The bugs that cause churn almost never show up in a diff, and you only really catch them when you stop reviewing code a…

@jarredsumner: my favorite test failure during bun’s rust rewrite: TOML & YAML parsers stack overflow tests failed because it could no…

Choose Boring Technology and Innovative Practices
