Best models in 3x3090 (72GB VRAM) in Q2 2026?

Reddit r/LocalLLaMA News

Summary

A user shares their experience running large LLMs on a 3x3090 (72GB VRAM) setup in Q2 2026, recommending models like GPT-OSS 120b, Qwen3.5 122b, and GLM Air 4.5 106B, and asking for newer alternatives.

Sometime around the beginning of the year I set up my LLM computer: 3x3090 in a very old DDR4 computer, so I only use the 72GB VRAM to load the models (for speed).

I’ve been mostly using these three models:
- GPT-OSS 120b: still pretty solid
- Qwen3.5 122b: very (very!!) good for one-shot coding, but it overthinks a lot in my opinion
- GLM Air 4.5 106B: in non-think mode by default, which I use a lot for quick replies

Occasionally I also use:
- Gemma 4 31B or Qwen3.6 27B, as they are quick to load and offload, and sometimes I need a video card for other tasks: I keep the LLM on 2x3090 and use 1x3090 for audio-image stuff. Because they also fit nicely in 48GB at Q8, I do trust them over the bigger models in some instances.

Honorable mentions I stopped using without any valid reason:
- Nemotron Nano Omni 30B A3B is very good, but I just never use it because I default to the big ones for most general tasks
- Devstral Small 2 24B used to be my favorite before Qwen 27B completely replaced it as my go-to dev-focused LLM, mixed with the big Qwen 122B for “architectural” decisions

Is there anything newer or better that would fit in 72GB?
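For anyone sanity-checking what fits in 48GB or 72GB, here is a rough back-of-the-envelope sketch in Python. It only estimates the size of the weights from parameter count and bits per weight. The `reserve_gb` headroom for KV cache and runtime buffers is a made-up placeholder, not a measured number, and the function names are hypothetical. The 8.5 bits per weight figure for Q8_0 comes from the llama.cpp block layout (32 int8 values plus one fp16 scale); the other values are illustrative assumptions, and real GGUF files will differ somewhat.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, reserve_gb: float = 6.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM.

    reserve_gb is a hypothetical allowance for KV cache and buffers;
    the real requirement depends on context length and backend.
    """
    return weight_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb


if __name__ == "__main__":
    # (label, parameters in billions, assumed bits per weight, VRAM in GB)
    candidates = [
        ("27B at Q8_0 on 2x3090", 27, 8.5, 48),
        ("31B at Q8_0 on 2x3090", 31, 8.5, 48),
        ("122B at Q8_0 on 3x3090", 122, 8.5, 72),
        ("122B at ~4.25 bpw on 3x3090", 122, 4.25, 72),
    ]
    for label, params, bpw, vram in candidates:
        size = weight_gb(params, bpw)
        verdict = "fits" if fits(params, bpw, vram) else "does not fit"
        print(f"{label}: ~{size:.1f} GB of weights, {verdict} in {vram} GB")
```

This matches the experience above in spirit: the ~30B models leave plenty of room at Q8 on two cards, while the ~120B models only make sense on 72GB at around 4 bits per weight, with little left over for long contexts.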
