B9109: preemptive fix for MTP & mmproj coming soon? It appears so

Reddit r/LocalLLaMA Tools

Summary

Upcoming changes target the crash that occurs when a multimodal projector (mmproj) is used together with multi-token prediction (MTP), by allowing images to be processed through the draft context. The changes also introduce parallel draft support, which should help speculative decoding scale across parallel slots.

Summary: three commits in this build stand out.

spec : process images through the draft context. This directly addresses the mmproj + MTP crash. Previously, images (mmproj) couldn't be processed through the speculative/draft context at all. This commit adds that capability, so it looks like the actual fix in progress.

server : fix mtmd draft processing. mtmd is the multimodal (mmproj) handler. Explicitly fixing draft processing for multimodal input suggests the developers know about the crash and are targeting it.

spec : support parallel drafts. This is infrastructure for running multiple drafts simultaneously, which is required for MTP to work properly at scale with parallel slots.

Having all three in one build (the multimodal draft fix, parallel draft support, and images through the draft context) suggests a focused push to get MTP + mmproj working together. PR #22673 might not be far behind.
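For readers unfamiliar with the draft/verify split the commits refer to, here is a minimal toy sketch of greedy speculative decoding in Python. It is not llama.cpp code: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names invented for illustration, and the "models" are plain functions over integer tokens. It only shows the general idea that a cheap draft proposes several tokens and the target model accepts the longest prefix it agrees with.

```python
from typing import Callable, List

# A "model" here is just a function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the tokens to append to prefix."""
    # 1. The draft proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposal in order. (A real implementation
    #    verifies all positions in one batched forward pass; this loop is
    #    the sequential equivalent.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal was accepted, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    # Hypothetical target: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical draft: agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [0], 4))
    print(speculative_step(toy_target, toy_draft, [5], 3))
```

Both the draft and the target are evaluated on the same prefix here. That is the general reason a draft context needs to be able to consume everything in the prompt, though whether this is the exact cause of the mmproj + MTP crash is not stated in the post.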
