LM Studio finally added support for MTP Speculative Decoding

Reddit r/LocalLLaMA 05/20/26, 03:10 AM Tools

lm-studio speculative-decoding mtp local-llm inference update

Summary

LM Studio 0.4.14 Build 2 (Beta) adds support for MTP (Multi-Token Prediction) speculative decoding, which can speed up inference for local LLMs. The option is off by default and must be enabled in the model load parameters.

https://preview.redd.it/1uuzjm0ll72h1.png?width=923&format=png&auto=webp&s=1af7d7594be1e08ff7ad6797e2bc53e9410769a3

Update to 0.4.14 Build 2 (Beta) and make sure your llama.cpp engine is 2.15.0.

https://preview.redd.it/x0vdwjb3n72h1.png?width=742&format=png&auto=webp&s=6367de44208004d2f50194d78a542c46b040dceb

You must also select "Manually choose model load parameters" and enable MTP there before loading the model. It is NOT on by default.
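For anyone wondering why turning this on can help: speculative decoding drafts several tokens cheaply, then lets the full model verify them and keep the longest matching prefix. With MTP, as I understand it, the drafts come from the model's own extra prediction heads instead of a separate draft model. Below is a toy sketch of the greedy draft-and-verify loop only. It is not LM Studio's or llama.cpp's actual code; `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration, and real engines verify all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(
    context: List[Token],
    draft_next: NextFn,
    target_next: NextFn,
    num_draft: int = 4,
) -> List[Token]:
    """One greedy draft-and-verify step; returns the tokens kept this step."""
    # 1. Draft: cheaply propose num_draft tokens in a row.
    drafted: List[Token] = []
    for _ in range(num_draft):
        drafted.append(draft_next(context + drafted))

    # 2. Verify: compare each drafted token with what the target would emit.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(
    context: List[Token],
    draft_next: NextFn,
    target_next: NextFn,
    max_new: int,
    num_draft: int = 4,
) -> List[Token]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(context)
    while len(out) - len(context) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, num_draft))
    return out[: len(context) + max_new]


# Toy stand-ins (assumptions, not real models): the "target" counts upward
# mod 10, and the "draft" agrees with it except right after a 3.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, max_new=6))
```

The point of the sketch is that the output is identical to what the target alone would produce with greedy decoding; the speedup comes only from how many drafted tokens get accepted per step. Sampling-based variants use rejection sampling instead of the exact-match check shown here.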

Similar Articles

@lmstudio: MTP is available in LM Studio 0.4.14. Sound on.

X AI KOLs Timeline

LM Studio 0.4.14 introduces MTP (Multi-Token Prediction) support, enabling speculative decoding for faster local model inference.

llama + spec: MTP Support by am17an · Pull Request #22673 · ggml-org/llama.cpp

Reddit r/LocalLLaMA

Pull request adding Multi-Token Prediction (MTP) support to llama.cpp, enabling speculative decoding for faster inference.

@_avichawla: Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy) Speculative decoding is quite an effe…

X AI KOLs Timeline

Researchers introduced DFlash, a technique using block diffusion models for speculative decoding that accelerates LLM inference by up to 8.5x without accuracy loss. It is already integrated with major frameworks like vLLM and SGLang.

Made an interactive explainer about speculative decoding/MTP

Reddit r/LocalLLaMA

An interactive guide explaining speculative decoding and multi-token prediction in LLMs, covering techniques from rejection sampling to MTP used in Qwen 3.6 and Gemma 4, with live diagrams and sliders.

Latest LM Studio update killed MTP performance

Reddit r/LocalLLaMA

A user reports that the latest LM Studio update (0.4.17) eliminated the multi-token prediction speed boost, reverting to previous performance on an RTX 5090 setup.
