b9109: preemptive fix for MTP & mmproj fix soon? It appears so
Summary
Upcoming updates address crashes that occur when multimodal projection (mmproj) is combined with Multi-Token Prediction (MTP), by enabling image processing through draft contexts. The changes also add parallel draft support to improve the scalability of speculative decoding.
Similar Articles
b9200 released - potential MTP pp increase
llama.cpp release b9200 improves prompt processing speed for Multi-Token Prediction by avoiding unnecessary logits copying, reducing memory traffic.
Llama.cpp B9406 MTP mmproj fix
Llama.cpp release B9406 fixes a crash (GGML_ASSERT) when using MTP with MoE vision models like Qwen3.6-35B-A3B.
b9180: llama.cpp MTP landed
llama.cpp version b9180 has been released with Multi-Token Prediction (MTP) support. The release came with successful builds and relief among developers.
@ivanfioravanti: llamacpp is gonna get MTP support soon!
llama.cpp will soon support Multi-Token Prediction (MTP), which improves inference efficiency.
LM Studio finally added support for MTP Speculative Decoding
LM Studio has added support for MTP speculative decoding in its latest beta update, improving inference speed for local LLMs.