Collected the infinity stones

Reddit r/LocalLLaMA 05/07/26, 10:39 PM News

hardware ai-infra nvidia-blackwell heterogeneous-computing rdma tinygrad

Summary

A user proposes building a heterogeneous AI cluster using Blackwell GPUs and high-memory servers connected via RDMA, seeking collaboration on Tinygrad driver development.

2.3 TB of RAM in here. 400+ vCores. All that’s left is plugging it into the Blackwell with the driver to do RDMA, and it’s over. Using Blackwells for prefill, RDMA to the studio mesh for decode. I think this would be the first heterogeneous cluster. I do, however, need help with the tinygrad driver to make this work. If anyone with knowledge of these domains would like to collaborate, let me know via PM. We are very close here.
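
In a prefill/decode split like the one described, the prefill GPUs have to ship the KV cache for each prompt to the decode nodes over the RDMA link. The post gives no model or link details, so the following is only a back-of-the-envelope sketch: the layer count, KV head count, head dimension, context length, and 100 Gbit/s link speed are all hypothetical values chosen for illustration, and the estimate ignores protocol overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache for one sequence.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    assumes 16-bit cache entries (an assumption, not from the post).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def transfer_seconds(num_bytes, link_gbit_per_s):
    """Ideal time to move num_bytes over a link, ignoring overhead."""
    return num_bytes * 8 / (link_gbit_per_s * 1e9)


# Hypothetical model shape and link speed, for illustration only.
size = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128, seq_len=32768)
seconds = transfer_seconds(size, link_gbit_per_s=100)

print(f"KV cache: {size / 2**30:.1f} GiB, ideal transfer: {seconds:.2f} s")
```

With these assumed numbers the cache for a single 32K-token prompt comes to 8 GiB and takes roughly 0.69 s to move at line rate, which is why the interconnect matters so much for this kind of split.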

Similar Articles

@leopardracer: https://x.com/leopardracer/status/2055341758523883631

X AI KOLs Timeline

A user shares their experience setting up a dual-GPU local AI lab with RTX 4080 Super and 5060 Ti, running Qwen 3.6 models via llama.cpp and llama-swap to reduce API costs and enable unrestricted experimentation.

@gippp69: THIS GUY SAW A $430 AI BILL AND BUILT HIS OWN AI LAB UNDER HIS DESK INSTEAD RTX 5090 + RTX 4090, 56GB VRAM, 128GB RAM, …

X AI KOLs Timeline

A user built a private AI lab under his desk using RTX 5090 and RTX 4090 GPUs, running local open-source models like Qwen, DeepSeek, and Llama to avoid API costs.

@andrewchen: finding the main downside with experimenting with local AI models is that you end up buying one GPU, then another, then…

X AI KOLs Following

Andrew Chen shares his experience of buying multiple GPUs for local AI experimentation, running Qwen3.6 27B dense at 100 tok/s on a 5090 eGPU, and compares it to Sonnet 4.6.

@guohao_li: yes, it is definitely time to seriously consider buying more GPUs and start building our own local ai stack. i’m curiou…

X AI KOLs Following

A researcher suggests it's time to buy more GPUs and build a local AI stack, referencing Qwen 3.5 27B and GLM 5.2 as models that cancel the threat of a permanent underclass.

we really all are going to make it, aren't we? 2x3090 setup.

Reddit r/LocalLLaMA

A user shares their experience setting up a dual 3090 GPU system to run the Qwen 3.6 27B model locally, achieving over 100 tokens/second after switching to Ubuntu and using the club-3090 tool with custom patches. They express excitement about the future of local AI.