Tag: 3090

#3090

Best models in 3x3090 (72GB VRAM) in Q2 2026?

Reddit r/LocalLLaMA · 2026-06-13

A user describes running LLMs on a 3x3090 (72GB VRAM) setup in Q2 2026, recommends GPT-OSS 120B, Qwen3.5 122B, and GLM-4.5-Air 106B, and asks for newer alternatives.

Weird to get near linear scaling by adding another GPU?

Reddit r/LocalLLaMA · 2026-06-08

A user reports near-linear scaling after adding a second RTX 3090 for inference with a Qwen model: decode throughput in tokens per second (TPS) improved roughly 1.8x, without NVLink.

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Reddit r/artificial · 2026-06-02

A user reports running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, reaching 160 tokens per second of output once the model fit into VRAM, and demonstrates its vision capabilities with a video processing time of 75 seconds.

@malikwas1f: well well well, Beellama managed to merge Dflash+TurboQuant already. this unlocks Q5 quants. Things just keep getting b…

X AI KOLs Timeline · 2026-05-24

A GitHub repository called club-3090 provides recipes and configs for serving large language models locally on RTX 3090 GPUs, with support for multiple engines and quantization methods like Dflash and TurboQuant, including newly unlocked Q5 quants.

Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?

Reddit r/LocalLLaMA · 2026-05-16

Discussion of performance tradeoffs when using the new MTP merge in llama.cpp to run Qwen 3.6 35B on dual 3090s, with users sharing token speeds and seeking optimal configurations.
