# Golden Testing a CAD Library
Source: [https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html)
As [I’ve](https://doscienceto.it/blog/posts/2024-01-23-ffi.markdown) [written](https://doscienceto.it/blog/posts/2024-06-30-things-ive-3d-printed-in-haskell.md) [about](https://doscienceto.it/blog/posts/2024-09-15-chess-set.markdown) [before](https://doscienceto.it/blog/posts/2025-04-14-waterfall-cad-svg.md), I’m the author/maintainer of [a Haskell library for programmable CAD, called Waterfall-CAD](https://hackage.haskell.org/package/waterfall-cad).
Ever since I released this in 2023, it’s bothered me that I don’t really have tests for it.
Testing a CAD library like Waterfall-CAD is difficult because the outputs of a Waterfall-CAD program are generally 3D models, which are hard to write good test assertions about.
In 2025, I added [SVG support](https://hackage.haskell.org/package/waterfall-cad-svg) to Waterfall-CAD, converting [the images in the README.md](https://github.com/joe-warren/opencascade-hs#examples) from screenshots of a mesh viewer to vector diagrams generated directly within Haskell code.
While testing solid models seems inherently tricky, there’s an established field of “Visual Regression Testing” tools, so having SVG output seemed vastly more testable than 3D model output[1](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fn1).
## [Visual Regression Testing](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#visual-regression-testing)
“Visual Regression Testing” is mostly used when testing UI code.
“Visual Regression Tools” work by storing a snapshot of an image generated by an application[2](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fn2), generating a visual diff of the current behaviour against that snapshot, and failing the test if the current behaviour looks significantly different. They also usually provide a mechanism to “accept” the current behaviour, overwriting the snapshots with new outputs, as well as a way to visualize the difference between the expected and actual behaviour, highlighting the parts of the snapshot that have changed.
### [A Note on Terminology](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#a-note-on-terminology)
“Golden Testing”[3](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fn3) is the name of a testing technique where the expected output of a program is stored in a file, and the program is tested by comparing the current output to that file. “Golden Testing” tools also generally provide a mechanism to “accept” the current program behaviour and overwrite the files with the current behaviour.
It seems useful to me to treat “Visual Regression Testing” as a special case of “Golden Testing” where the test files are images, and the “diff” is a visual diff, rather than a precise comparison of the binary structure of the files.
Going into this project, I thought that the term for the kind of test I wanted to write was “Snapshot Testing”.
I’m now pretty sure that’s incorrect: “Snapshot Testing” seems to be used to describe exactly the same technique as “Golden Testing” (although the different terms are common in different programming language communities).
I’m currently using the term “Visual Regression Testing” for what I’m doing; I think this is synonymous with the (also commonly used) term “Visual Snapshot Testing”.
I still think it’s useful to think of this as a special case of “Golden Testing” or “Snapshot Testing”. Indeed, I’m using the Haskell Golden Testing library [tasty-golden](https://hackage.haskell.org/package/tasty-golden)[4](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fn4) to implement these tests.
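For flavour, here’s roughly what a `tasty-golden` test looks like. This is a sketch, not the repo’s actual code: `renderDiagramTo` is a hypothetical helper standing in for the Waterfall-CAD rendering step, and the file paths are illustrative.

```haskell
import Test.Tasty (defaultMain, testGroup)
import Test.Tasty.Golden (goldenVsFile)

-- Hypothetical helper: render a diagram to an SVG file.
-- (Name assumed for illustration; not a real Waterfall-CAD API.)
renderDiagramTo :: FilePath -> IO ()
renderDiagramTo path = writeFile path "<svg></svg>"

main :: IO ()
main = defaultMain $ testGroup "diagram golden tests"
  [ goldenVsFile
      "csg example"     -- test name
      "golden/csg.svg"  -- the checked-in "golden" file
      "out/csg.svg"     -- where the test writes its output
      (renderDiagramTo "out/csg.svg")
  ]
```

Running the suite with `--accept` is what regenerates the golden files from the current output.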
## [What I’m Doing Specifically](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#what-im-doing-specifically)
I’ve added a series of tests based on the diagrams that had already been generated for the README.
The images are stored as SVG, with [some custom CSS](https://github.com/joe-warren/opencascade-hs/blob/f88a23e3d7196778e118a4c0916fedcd0c9b54fa/waterfall-cad-examples/src/DarkModeSVG.hs) so they render differently according to whether the README’s being viewed in dark mode or not.
I like having the images be dual purpose, because it’s one less thing to keep in sync.
The [Rasterific-SVG](https://hackage.haskell.org/package/rasterific-svg) library is used to convert both the SVG in the repo, and the one generated by the code under test, into a [JuicyPixels image](https://hackage.haskell.org/package/JuicyPixels-3.3.9/docs/Codec-Picture.html#t:Image).
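A minimal sketch of that conversion step, assuming the `svg-tree` and `rasterific-svg` packages; the font-cache path and DPI value here are illustrative choices, not taken from the repo:

```haskell
import Codec.Picture (Image, PixelRGBA8)
import Graphics.Svg (loadSvgFile)  -- from the svg-tree package
import Graphics.Rasterific.Svg (loadCreateFontCache, renderSvgDocument)

-- Rasterise an SVG file to a JuicyPixels RGBA image.
rasteriseSvg :: FilePath -> IO (Image PixelRGBA8)
rasteriseSvg path = do
  mDoc <- loadSvgFile path
  case mDoc of
    Nothing  -> fail ("failed to parse SVG: " ++ path)
    Just doc -> do
      -- Text rendering needs a font cache; the cache directory is arbitrary.
      cache <- loadCreateFontCache "font-cache"
      -- Nothing = use the document's own size; 96 = DPI.
      (img, _) <- renderSvgDocument cache Nothing 96 doc
      pure img
```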
Once we’ve got two images, the code asserts that they both have the same size. If they do, it compares each pixel in the images, and counts the number of pixels that differ by a [Manhattan distance](https://en.wikipedia.org/wiki/Taxicab_geometry) of more than a certain amount. If this count of mismatched pixels is higher than some tolerance value, the test fails.
On test failure, a “diff image” is written, which highlights the pixels that differed.
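The comparison described above can be sketched like this; the function names, tolerance parameters, and the red highlight colour are my own choices for illustration, not necessarily what the repo does:

```haskell
import Codec.Picture

-- Per-pixel Manhattan distance, summed over the RGBA channels.
manhattan :: PixelRGBA8 -> PixelRGBA8 -> Int
manhattan (PixelRGBA8 r1 g1 b1 a1) (PixelRGBA8 r2 g2 b2 a2) =
  sum [ abs (fromIntegral c1 - fromIntegral c2)
      | (c1, c2) <- [(r1, r2), (g1, g2), (b1, b2), (a1, a2)] ]

-- Count the pixels whose Manhattan distance exceeds perPixelTolerance;
-- the test would fail if this count exceeds some overall tolerance.
-- Assumes the images have already been checked to be the same size.
mismatchCount :: Int -> Image PixelRGBA8 -> Image PixelRGBA8 -> Int
mismatchCount perPixelTolerance a b =
  length
    [ ()
    | x <- [0 .. imageWidth a - 1]
    , y <- [0 .. imageHeight a - 1]
    , manhattan (pixelAt a x y) (pixelAt b x y) > perPixelTolerance
    ]

-- A diff image: mismatched pixels drawn in red, matching pixels kept as-is.
diffImage :: Int -> Image PixelRGBA8 -> Image PixelRGBA8 -> Image PixelRGBA8
diffImage perPixelTolerance a b =
  generateImage go (imageWidth a) (imageHeight a)
  where
    go x y
      | manhattan (pixelAt a x y) (pixelAt b x y) > perPixelTolerance =
          PixelRGBA8 255 0 0 255
      | otherwise = pixelAt a x y
```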
You can see the code doing all this [in the `DiagramGoldenTests.hs` file in the GitHub repo](https://github.com/joe-warren/opencascade-hs/blob/eebecad0ec7cee184bdd03f0f2fe7dbe19145024/waterfall-cad-examples/test/DiagramGoldenTests.hs#L4).

I made this image by deliberately failing the tests, changing the colour used for the hidden lines.
## [Is There an Open-Source Library In This?](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#is-there-an-open-source-library-in-this)
I think there *might* be.
But at the moment, I’m a small distance away from having written it.
A proper library would have to support more of JuicyPixels’ [Pixel formats](https://hackage.haskell.org/package/JuicyPixels-3.3.9/docs/Codec-Picture.html#i:Pixel).
It would need to make the diff visualization more customizable (you don’t want to show differences in red if the image already contains a whole lot of red).
It would also need to support loading images directly from disk, instead of the “via-svg” juggling that I’m doing.
At the moment, I’m leaning towards *not doing any of that*, but I’m torn, and if one person says “yeah, I’d use that”, there’s a good chance I can be persuaded.
---
1. Like “knowing you should eat vegetables”, aspiring to write tests is much easier than actually writing them, hence the nearly year-long gap between adding SVG support and testing it.[↩︎](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fnref1)
2. Usually a screenshot of an application, in the case of testing a User Interface; although not in this specific case.[↩︎](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fnref2)
3. Or [“Characterization Testing”](https://en.wikipedia.org/wiki/Characterization_test), “Golden Record Testing”, or “Golden Master Testing”.[↩︎](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fnref3)
4. [`tasty-silver`](https://hackage.haskell.org/package/tasty-silver) is a relatively commonly used fork of `tasty-golden`; I thought this was also worth mentioning.[↩︎](https://doscienceto.it/blog/posts/2026-04-27-golden-testing-cad.html#fnref4)