Are the rich-RAM/poor-GPU people wrong here?

Reddit r/LocalLLaMA 05/15/26, 02:43 PM News

local-ai hardware moe dense-models gpu-memory ram model-offload

Summary

Discusses the trade-off between dense and Mixture-of-Experts (MoE) models for local AI, noting that high-RAM users have limited MoE options beyond Qwen 3.5 122B, and questioning if large GPU is the only viable path.

Hello guys, I know everyone has their own definition of "local models", but I see two "reasonable" types of frontier local model. One is a dense model that barely fits in 24GB or 32GB of VRAM, for the most "reasonably" GPU-wealthy people. The other is an MoE in the ~100B-parameter range, which can run at a decent speed with hybrid offload on 128GB of RAM, since 128GB is the most a standard motherboard can support. Again, it's not cheap, but ordinary people can still afford it; it's still cheaper than a car 😄.

We see a lot of dense models at that size limit, like Qwen 27B, but for the ~100B MoE class there was only Qwen 3.5 122B, and they didn't even release a 3.6 version of it. The best MoE models sit in the 30-35B range. Does that mean rich-RAM/poor-GPU people don't have much choice, and that a big GPU was the only good road? Of course you can cram in something like MiniMax at Q3 or DeepSeek V3 at Q1, but for tool calling, speed, and real usage it's barely usable.

I bought a Strix Halo before the RAM-pocalypse, but I see very few use cases for the 128GB except being able to load multiple models at once, which can also be done with llama-swap.
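The two setups in the post come down to simple memory arithmetic: whether the quantized weights fit in the memory budget, and how many bytes must be streamed per generated token. Below is a minimal back-of-envelope sketch, assuming weight size ≈ parameters × bits-per-weight / 8 (ignoring KV cache and runtime overhead) and that decode speed is bounded by memory bandwidth divided by the size of the *active* weights. The bits-per-weight values, the 256 GB/s bandwidth, and the 10B active-parameter figure are hypothetical placeholders, not specs of any particular machine or model.

```python
# Back-of-envelope sizing for the two "reasonable" local setups in the post.
# Assumptions (not from the post): weight size ~= params * bits_per_weight / 8,
# ignoring KV cache, context, and runtime overhead; decode speed is bounded by
# how fast the active weights can be streamed from memory once per token.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params times bytes per param."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_s(active_params_billion: float, bits_per_weight: float,
                     bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token must read all active weights once."""
    return bandwidth_gb_s / weight_gb(active_params_billion, bits_per_weight)


# Assumed average bits per weight for common quantization levels (approximate).
QUANTS = {"Q8": 8.5, "Q4": 4.5, "Q3": 3.5}

if __name__ == "__main__":
    # (label, total params in billions, memory budget in GB)
    setups = [("27B dense on a 24 GB GPU", 27, 24),
              ("122B MoE in 128 GB RAM", 122, 128)]
    for label, params, budget in setups:
        for quant, bits in QUANTS.items():
            size = weight_gb(params, bits)
            verdict = "fits" if size <= budget else "too big"
            print(f"{label}: {quant} ~{size:.1f} GB of weights -> {verdict}")

    # Hypothetical numbers: ~256 GB/s of memory bandwidth, and an MoE that
    # activates ~10B params per token versus a dense model of similar total size.
    bandwidth = 256.0
    print(f"MoE, 10B active at Q4:    <= {max_tokens_per_s(10, 4.5, bandwidth):.1f} tok/s")
    print(f"Dense, 120B active at Q4: <= {max_tokens_per_s(120, 4.5, bandwidth):.1f} tok/s")
```

Under these assumptions the MoE's speed ceiling is 12× the dense model's, because only the active experts are read per token; that is why a ~100B MoE is usable from system RAM while a dense model of the same size is not. Swap in your own bandwidth and the model card's total/active parameter counts; real speeds will be lower because of KV-cache reads, prompt processing, and offload overhead.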

Similar Articles

What is the point of MoE models, beyond being faster?

Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM

High VRAM local coding model — still Qwen 3.6 27B?

Performance When Offloading Large Models to System RAM?

@andrewchen: finding the main downside with experimenting with local AI models is that you end up buying one GPU, then another, then…