The author announces a new blog post on clustering three Jetson Orin Nano Supers for distributed training and inference, continuing a series to help people build small compute clusters with accessible hardware.
Hey everyone!

Recently, I released a blog post on how to set up a cluster out of your Raspberry Pi 4Bs and Mac minis for distributed training and inference. Now it's time to do the same with the Jetson Orin Nano Super!

Why?

- 1024 CUDA cores (Ampere)
- 8GB unified LPDDR5 memory
- 6x ARM Cortex-A78 @ 1728 MHz, 1024-core Ampere GPU @ 1020 MHz

This is part of my current series, where I’ll be releasing blogs and guides on learning distributed learning and building your own small compute clusters.

The goal is simple: help more people get started with running and training AI models using the hardware they already have lying around. Old laptops, mini PCs, Jetson Nanos, Raspberry Pis, even phones and tablets.

Distributed learning often feels intimidating from the outside, but it’s genuinely one of the coolest areas in systems and AI once you start playing with it yourself.

Before we get into the fun stuff like distributed inference and training, the first few posts will focus on setting up hardware properly and building a working cluster environment, which basically means a bit of cabling and networking!

The early guides will specifically cover setups around:

- MacBooks and Mac minis (Done!)
- Jetson devices (This one hehe)
- Raspberry Pis (Doneee)

After that, we’ll move into quick demos (smolcluster) and gradually learn the fundamentals side by side while actually running models across devices.

I’m building this alongside smolcluster, so a lot of the content will stay very hands-on and practical instead of purely theoretical.

Hopefully this helps more people realize that distributed AI systems are not something reserved only for giant datacenters anymore.

There is just one question I want to answer: are heterogeneous clusters, like the one I am trying to build above, even possible for running models? Well, we'll find out, and until then, do read my blog and let me know what you all think! Any comments, feedback, etc. are very welcome.

Hail LocalAI!

PS: For single-board benchmarks, you can check this [link](https://www.smolhub.com/posts/jetson-nano-super-benchmark-non-reasoning/)