@no_stp_on_snek: Tested out MTP for the first time on my llamacpp fork last night with turbo4 sym. GX10 hardware. using MoE model: llmfa…


Summary

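Tested Multi-Token Prediction (MTP) on a llama.cpp fork with a Qwen-based MoE model on GX10 hardware, using turbo4 sym. Reported perplexity (PPL) was 0.41% higher than the fp16 baseline; since lower PPL is better, this is a small degradation rather than an improvement.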


Cached at: 05/23/26, 08:01 AM

Tested out MTP for the first time on my llamacpp fork last night with turbo4 sym.

GX10 hardware.

using MoE model: llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved

+0.41 % PPL vs fp16 baseline https://t.co/pwzhfphHCK
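For readers unfamiliar with the metric, here is a minimal sketch of how a figure like "+0.41% PPL vs fp16 baseline" is usually derived. It assumes the number is the relative change in perplexity, (PPL_test - PPL_baseline) / PPL_baseline. The function names and the PPL values below are hypothetical, chosen only to illustrate the arithmetic; they are not the author's measurements or code.

```python
import math


def perplexity(token_nlls):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_ppl_change(ppl_test, ppl_baseline):
    """Percent change of ppl_test relative to ppl_baseline.

    Positive means the test run has higher (worse) perplexity.
    """
    return (ppl_test - ppl_baseline) / ppl_baseline * 100.0


# Hypothetical values for illustration only (not from the post).
baseline_ppl = 6.0000   # assumed fp16 baseline perplexity
test_ppl = 6.0246       # assumed perplexity of the run being compared

change = relative_ppl_change(test_ppl, baseline_ppl)
print(f"{change:+.2f}% PPL vs fp16 baseline")
```

A positive sign means the compared run predicts the evaluation text slightly less well than the baseline, so a small positive delta is typically read as "close to lossless" rather than as a gain.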
