Best models in 3x3090 (72GB VRAM) in Q2 2026?

Reddit r/LocalLLaMA 06/13/26, 08:07 PM News

llm hardware 3090 vram model-comparison qwen glm

Summary

A user shares their experience running large LLMs on a 3x3090 (72GB VRAM) setup in Q2 2026, recommending models like GPT-OSS 120b, Qwen3.5 122b, and GLM Air 4.5 106B, and asking for newer alternatives.

Sometime around the beginning of the year I set up my LLM computer: 3x3090 in a very old DDR4 machine, so I only use the 72GB of VRAM to load the models (for speed).

I've been mostly using these three models:

- GPT-OSS 120b: still pretty solid
- Qwen3.5 122b: very (very!!) good for one-shot coding, but it overthinks heavily in my opinion
- GLM Air 4.5 106B: in non-think mode by default, which I use a lot for quick replies

Occasionally I also use:

- Gemma 4 31B or Qwen3.6 27B, as they are quick to load and offload, and sometimes I need a video card for other tasks (I keep the LLM on 2x3090 and use 1x3090 for audio/image stuff). Because they also fit nicely in 48GB at Q8, I trust them over the bigger models in some instances.

Honorable mentions that I stopped using without any valid reason:

- Nemotron Nano Omni 30B A3B: very good, but I just never use it because I default to the big ones for most general tasks
- Devstral Small 2 24B: used to be my favorite before Qwen 27B completely replaced it as my go-to dev-focused LLM, mixed with the big Qwen 122B for "architectural" decisions

Is there anything newer or better that would fit in 72GB?
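For anyone sanity-checking the "fits in 48GB at Q8" and "fits in 72GB" sizing, here is a rough Python sketch of the arithmetic. Everything in it is an assumption rather than something stated in the post: the function names are made up for illustration, 8.5 bits per weight is the usual figure for GGUF Q8_0 (blocks of 32 int8 values plus one 16-bit scale), 4.5 bits per weight is just a stand-in for a generic 4-bit quant, and the 4 GiB headroom for KV cache and buffers is an arbitrary placeholder. It only counts weights, so real usage will be higher with long contexts.

```python
GIB = 1024 ** 3  # bytes per GiB; a 3090 is treated here as 24 GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, headroom_gib: float = 4.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM.

    headroom_gib is a placeholder for KV cache and runtime buffers,
    not a measured value.
    """
    return weight_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


if __name__ == "__main__":
    # (label, parameters in billions, assumed bits per weight, VRAM in GiB)
    cases = [
        ("27B at Q8_0 on 2x3090", 27, 8.5, 48),
        ("31B at Q8_0 on 2x3090", 31, 8.5, 48),
        ("122B at Q8_0 on 3x3090", 122, 8.5, 72),
        ("122B at ~4.5 bpw on 3x3090", 122, 4.5, 72),
    ]
    for label, params, bpw, vram in cases:
        size = weight_gib(params, bpw)
        print(f"{label}: ~{size:.1f} GiB weights, fits={fits(params, bpw, vram)}")
```

Under these assumptions the 27B and 31B models leave plenty of room on two cards at Q8, while a ~120B model only fits in 72GB at roughly 4-bit precision, and then with little room to spare.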

Similar Articles

Best Settings for 48GB VRAM + Qwen 3.6 27B

High VRAM local coding model — still Qwen 3.6 27B?

Qwen 35B-A3B is very usable with 12GB of VRAM

Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules

2 old RTX 2080 Ti with 22GB vram each Qwen3.6 27B at 38 token/s with f16 kv cache
