Llama.cpp B9406 MTP mmproj fix

Reddit r/LocalLLaMA 05/29/26, 01:14 PM Tools

llama-cpp multi-token-prediction bug-fix open-source vision moe-model

Summary

llama.cpp release b9406 fixes a GGML_ASSERT crash that occurred when MTP (multi-token prediction) was combined with a MoE model and vision input, as reported with Qwen3.6-35B-A3B.

[B9406](https://github.com/ggml-org/llama.cpp/releases/tag/b9406)

Been waiting for this one. Building now. Report your results if you test!

> `GGML_ASSERT(i01 >= 0 && i01 < ne01)` crash in `get_rows` / `mtmd_helper_decode_image_chunk` when using MTP + MoE model + vision (Qwen3.6-35B-A3B)
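For anyone wondering what that assertion actually guards: it is a bounds check on a row index during a row-gather (lookup) operation. Below is a minimal Python sketch of the idea, not ggml's actual code. The function and the `embeddings` table are hypothetical, and the post does not say what produced the bad index, so this only illustrates the condition itself.

```python
def get_rows(table, row_ids):
    """Gather rows of `table` by index (illustrative stand-in, not ggml's get_rows)."""
    ne01 = len(table)  # number of rows available in the source table
    out = []
    for i01 in row_ids:
        # Same condition as GGML_ASSERT(i01 >= 0 && i01 < ne01):
        # every requested row index must fall inside the table.
        assert 0 <= i01 < ne01, f"row index {i01} out of range [0, {ne01})"
        out.append(table[i01])
    return out


# Hypothetical 3-row table standing in for an embedding matrix.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

rows = get_rows(embeddings, [2, 0])  # valid indices: passes the check
```

Any index outside `[0, ne01)`, such as `3` or `-1` for this table, trips the assertion, which is the kind of failure the quoted crash message reports.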


Similar Articles

@victormustar: llama.cpp with MTP support makes local models fast enough to use as daily drivers Qwen3.6-27B dense generation (on A10G…

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

b9180 llama.ccp MTP landed

StepFun 3.5 MTP by pwilkin · Pull Request #23274 · ggml-org/llama.cpp

@ggerganov: llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance j…
