Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?

Reddit r/LocalLLaMA 05/16/26, 10:19 PM Tools

llama-cpp mtp qwen model-inference 3090 gpu performance

Summary

Discussion of performance tradeoffs when using the new MTP merge in llama.cpp to run Qwen 3.6 35B on dual 3090s, with users sharing token speeds and seeking optimal configurations.

We've got great outputs for 27B via club 3090, but what about those of us who love the blazing speed of 35B on dual 3090s? With split layers I was getting 1500 p/p (prompt processing) and 120 t/g (token generation), but MTP slowed generation down to 80 t/g when I tested last week. I'm sticking with my CPU overflow fallback of 3500 p/p and 80 t/g until someone cooks up something à la the geniuses over at club 3090. What have you tried so far with the new llama.cpp MTP merge? Any big jump over your previous best build for 35B?
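For anyone comparing their own numbers against the ones in the post, here is a rough Python sketch of the arithmetic. It assumes p/p and t/g are both tokens per second. The 8000-token prompt and 500-token reply are a made-up workload for illustration, not something from the post, and the helper names are hypothetical.

```python
# Figures quoted in the post, assumed to be tokens per second.
configs = {
    "split layers, no MTP": {"pp": 1500, "tg": 120},
    "split layers, MTP": {"pp": None, "tg": 80},  # prompt speed not reported
    "CPU overflow fallback": {"pp": 3500, "tg": 80},
}


def relative_change(new, old):
    """Fractional change from old to new (negative means slower)."""
    return (new - old) / old


def total_seconds(prompt_tokens, output_tokens, pp, tg):
    """Rough end-to-end time: prompt processing plus token generation."""
    return prompt_tokens / pp + output_tokens / tg


baseline = configs["split layers, no MTP"]
mtp_drop = relative_change(configs["split layers, MTP"]["tg"], baseline["tg"])
print(f"Generation speed change with MTP: {mtp_drop:.0%}")

# Hypothetical workload (an assumption, not from the post).
PROMPT_TOKENS, OUTPUT_TOKENS = 8000, 500
for name, cfg in configs.items():
    if cfg["pp"] is None:
        continue  # cannot estimate total time without a prompt speed
    secs = total_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, cfg["pp"], cfg["tg"])
    print(f"{name}: {secs:.1f} s")
```

Which setup comes out ahead depends on how prompt-heavy the workload is, so swap in your own token counts before drawing conclusions.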

Similar Articles

More Qwen3.6-27B MTP success but on dual Mi50s

@Snixtp: https://x.com/Snixtp/status/2055734339346768225

Qwen 3.5 122B MoE OC on a single 3090 at 35 t/s — full local stack breakdown

Benchmark Qwen 3.6 27B MTP on 2x3090 NVLINK

Testing llama.cpp MTP support on Qwen3.6 - RTX 5090
