TL;DR: A hardware enthusiast built an immersion-cooled super cluster using 8192 RISC-V microcontrollers (CH570), with a goal of ultimately reaching 65536 cores, overcoming clock, SPI signal, and production bottlenecks along the way.
## From Crazy Idea to Practical Planning
The video's author had previously built an M.2 cluster. With sponsorship and a budget from Altium, the author decided to build a cluster of an entirely different scale. The inspiration was WCH's CH570 microcontroller: 13 cents each, running at 100MHz, with 12KB SRAM, a multiplier, native USB, and radio. The original idea was to have one MCU per pixel: 65536 MCUs would give 256×256 = 65536 pixels, close to QVGA resolution. 65536 × 13 cents ≈ $8,500, and with WCH CTO Patrick providing 10,000 MCUs for free, the plan seemed feasible.
But the number 65536 brings a serious problem: power consumption. Each MCU draws about 10mA at 3.3V, which adds up to roughly 650A, or about 2kW. European wall sockets max out around 3kW, and 99.9% of that electricity turns into heat. The author decided to use immersion cooling, with a half-meter diameter acrylic container holding the "blade" boards.
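The cost and power figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the text (13 cents, about 10mA, 3.3V); the function name `budget` is made up for illustration, and LEDs and row controllers are not included.

```python
# Back-of-the-envelope budget using only the figures quoted in the text.
PRICE_USD = 0.13    # price per CH570, as quoted
CURRENT_A = 0.010   # about 10 mA per MCU, as quoted
SUPPLY_V = 3.3      # supply voltage


def budget(mcu_count: int) -> dict:
    """Return cost, total current and total power for a number of MCUs."""
    total_current = mcu_count * CURRENT_A
    return {
        "cost_usd": mcu_count * PRICE_USD,
        "current_a": total_current,
        "power_w": total_current * SUPPLY_V,
    }


if __name__ == "__main__":
    for n in (8_192, 65_536):
        b = budget(n)
        print(f"{n:>6} MCUs: ${b['cost_usd']:,.0f}, "
              f"{b['current_a']:.0f} A, {b['power_w']:.0f} W")
```

For 65536 MCUs this gives about $8,520, 655A, and 2.16kW, in line with the rounded figures in the text.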
## Blade Design: 1K Core Module
Each blade contains 32 rows × 32 columns = 1024 small MCUs, plus 32 row controllers (larger cores with FPU). The author chose SPI as the bus, with 32 MCUs per SPI line, each row controller providing 32 independent chip select pins. For signal integrity, a six-layer PCB was used: layers 2 and 5 are continuous ground planes, with signal lines sandwiched between inner layers and isolated by ground traces. Power distribution uses the bottom layer (thicker copper) to supply power to each column's LEDs and MCUs via copper areas.
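To make the bus layout concrete, here is a small sketch of how one MCU on a blade could be addressed. The row-major numbering and the `locate` helper are assumptions made for illustration; the video does not describe the author's actual numbering scheme.

```python
ROWS = 32  # row controllers per blade, one SPI line each
COLS = 32  # MCUs per SPI line, one chip select pin each


def locate(mcu_index: int) -> tuple[int, int]:
    """Map a blade-local MCU index (0..1023) to (row_controller, chip_select).

    Assumes row-major numbering, which is a hypothetical choice.
    """
    if not 0 <= mcu_index < ROWS * COLS:
        raise ValueError(f"index {mcu_index} is outside the blade")
    return divmod(mcu_index, COLS)


if __name__ == "__main__":
    row, cs = locate(33)
    print(f"MCU 33 sits on row controller {row}, chip select {cs}")
```

With this numbering, a row controller only asserts one of its 32 chip select pins at a time, so at most one MCU per SPI line responds.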
Each MCU is also equipped with a 1×1mm common-anode RGB LED (red, green, and blue pins, with the MCU providing ground). The initial plan was to skip the crystal and have the row controller's PWM provide a 32MHz clock. However, the clock signal turned out to be severely distorted, because the CH570's clock pin has a maximum rated voltage of only 1.4V and there is no bypass mode option. The author was forced to add a crystal package next to each MCU, which required redesigning the test board.
## Mistakes and Fixes
### Clock and Crystal
During initial testing, some MCUs refused to start, but touching an MCU would get it running. Inspection revealed that the external clock signal was being chopped to pieces because the CH570's clock pin could not tolerate the voltage. The author had to switch to an independent crystal for each MCU and redesign the PCB.
### SPI Communication Errors
During the first full-blade test, SPI communication between layer 0 and layer 1 failed. On an oscilloscope, the MOSI signal dropped to half its voltage once chip select was asserted. With 50Ω series resistors between the MCUs, that is the signature of two MCUs driving the same line at once. The root cause turned out to be that MOSI and MISO were swapped. Because the lines had series resistors, the error could be fixed with patch wires. After the fix, SPI operated stably at about 15Mbps, with occasional errors at 7.5Mbps.
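The halved voltage follows from a simple voltage divider. The sketch below assumes a simplified model: one output drives the line high and the other drives it low, each through a 50Ω series resistor, and the internal resistance of the output drivers is ignored. The function names are illustrative only.

```python
def contention_voltage(vcc: float, r_high: float, r_low: float) -> float:
    """Voltage at the node between two outputs fighting over one line.

    One output pulls to vcc through r_high, the other pulls to ground
    through r_low (simplified model, driver resistance ignored).
    """
    return vcc * r_low / (r_high + r_low)


def contention_current(vcc: float, r_high: float, r_low: float) -> float:
    """Current flowing from the high driver into the low driver, in amps."""
    return vcc / (r_high + r_low)


if __name__ == "__main__":
    v = contention_voltage(3.3, 50.0, 50.0)
    i = contention_current(3.3, 50.0, 50.0)
    print(f"node voltage: {v:.2f} V, contention current: {i * 1000:.0f} mA")
```

With equal resistors the node sits at half the supply, 1.65V for a 3.3V rail, which matches the halved signal seen on the oscilloscope. The same resistors limit the contention current, which is also why the swapped lines could be patched afterwards.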
## Programming Automation
To program thousands of MCUs, the author converted a Bambu printer (no USB, controlled via Home Assistant) into a gantry programming machine. The Y-axis moved a spring-loaded pin arm into contact with the programming pads on the back of the board, and a Python script called OpenOCD as a subprocess to upload the firmware to one MCU at a time, scanning the output to confirm success.
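The flashing step described above might look roughly like the following. This is a sketch, not the author's script. The config and firmware file names are placeholders. The `program ... verify reset exit` command and the `** Verified OK **` message come from stock OpenOCD, and the WCH toolchain may behave differently, so the success marker is a parameter.

```python
import subprocess


def openocd_command(config: str, firmware: str) -> list[str]:
    """Build an OpenOCD command line (file names are placeholders)."""
    return ["openocd", "-f", config,
            "-c", f"program {firmware} verify reset exit"]


def flash_one(cmd: list[str], success_marker: str,
              timeout_s: float = 60.0) -> bool:
    """Run one programming command and scan its output for success."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # OpenOCD logs to stderr, so merge it
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing programmer counts as a failed upload.
        return False
    return result.returncode == 0 and success_marker in result.stdout
```

A gantry loop would move the pin arm to the next pad, call `flash_one(openocd_command("target.cfg", "firmware.elf"), "** Verified OK **")`, and record which positions failed so they can be retried.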
## Backplane and Final Assembly
The original design called for 64 blades (65536 cores), but the manufacturer could not handle over 10,000 vias and 5,000 components per board. Each blade was therefore split into two half-size blades placed side by side, which raises the full build to 128 blades. A backplane connecting 16 blades was made first, with the 128-blade version to follow later. The backplane uses XT60 connectors for power, drawing from a Corsair 3kW PC power supply and stepping down to 3.3V via Murata DC-DC converters (95% efficiency, 30A capability). The author made 20 such converters.
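A rough sizing check for the 3.3V rail, using the quoted converter ratings (30A, 95% efficiency) and the earlier estimate of about 10mA per MCU. It counts only the MCUs, so LEDs and row controllers would add to these numbers. The helper names are made up for this sketch.

```python
import math

SUPPLY_V = 3.3
MA_PER_MCU = 10.0          # quoted estimate per MCU
AMPS_PER_CONVERTER = 30.0  # quoted converter capability
EFFICIENCY = 0.95          # quoted converter efficiency


def converters_needed(mcu_count: int) -> int:
    """Minimum number of 30 A converters for the MCU load alone."""
    total_amps = mcu_count * MA_PER_MCU / 1000
    return math.ceil(total_amps / AMPS_PER_CONVERTER)


def input_power_w(mcu_count: int) -> float:
    """Power drawn from the upstream supply, including conversion loss."""
    output_w = mcu_count * MA_PER_MCU / 1000 * SUPPLY_V
    return output_w / EFFICIENCY


if __name__ == "__main__":
    for n in (8_192, 65_536):
        print(f"{n:>6} MCUs: {converters_needed(n)} converters, "
              f"{input_power_w(n):.0f} W from the supply")
```

Under these assumptions the 16-blade build needs 3 converters, and the full 65536-core build needs 22 converters and draws about 2.3kW from the supply, which is within the 3kW rating.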
Finally, 16 blades (the initial version with 8,192 cores) were assembled, manufactured with 0.1mm line width and spacing. There was one small mistake: the programming pads had been placed under a fourth-level MCU, so a patch wire was routed from the exposed pad on the bottom of the QFN package.
## Advantages of Altium Tools
The author emphasized several key features of Altium Designer:
- **Multi-channel design**: the `Repeat` keyword instantiates a sub-sheet many times, so sub-sheets do not have to be copied by hand.
- **Real-time DRC rule checking**: automatically highlights rule violations.
- **Impedance-controlled routing**: once the stackup is defined, the trace width for a target impedance is calculated automatically.
- **Polar grid**, **maximum rated current checking**, **filtering by attributes**, etc.
These tools were critical for managing such a complex project (thousands of components per board).
---
Source: YouTube video DIY RISC-V ultracluster (https://youtube.com/watch?v=qMR3IXF2sWw)