llama.cpp b9406 MTP mmproj fix
Summary
llama.cpp release b9406 fixes a crash (a failed GGML_ASSERT) when using Multi-Token Prediction (MTP) with MoE vision models such as Qwen3.6-35B-A3B.
Similar Articles
@victormustar: llama.cpp with MTP support makes local models fast enough to use as daily drivers. Qwen3.6-27B dense generation (on A10G…
llama.cpp adds MTP support for Qwen3.6 models, boosting generation speed by 78% on A10G hardware, making local models viable as daily drivers.
llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s
llama.cpp releases version b9495 with optimizations for Qwen3.6/3.5-MTP (Multi-Token Prediction) and asks users to share their benchmark results along with full command details.
b9180 llama.cpp MTP landed
llama.cpp version b9180 has been released, featuring Multi-Token Prediction (MTP). The release is marked by successful builds and developer relief.
StepFun 3.5 MTP by pwilkin · Pull Request #23274 · ggml-org/llama.cpp
Pull request adding support for StepFun 3.5 MTP model in llama.cpp.
@ggerganov: llama.cpp adds MTP for the Qwen3.6 family. This is a significant milestone for the local AI ecosystem. The performance j…
llama.cpp adds Multi-Token Prediction (MTP) support for the Qwen3.6 family, delivering massive performance improvements for local AI inference on commodity hardware.