LM Studio finally added support for MTP Speculative Decoding

Reddit r/LocalLLaMA Tools

Summary

LM Studio has added support for MTP (multi-token prediction) speculative decoding in its latest beta update, which can improve inference speed for local LLMs.

https://preview.redd.it/1uuzjm0ll72h1.png?width=923&format=png&auto=webp&s=1af7d7594be1e08ff7ad6797e2bc53e9410769a3

Update to 0.4.14 Build 2 (Beta) and make sure your llama.cpp engine is 2.15.0.

https://preview.redd.it/x0vdwjb3n72h1.png?width=742&format=png&auto=webp&s=6367de44208004d2f50194d78a542c46b040dceb

You must also select "Manually choose model load parameters" and enable MTP there before loading the model. It is NOT on by default.
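
For anyone wondering why flipping this switch can speed things up, here is a rough toy sketch of the general draft-and-verify idea behind speculative decoding. It is not LM Studio's or llama.cpp's actual code. The names `speculative_step`, `generate`, `plain_greedy`, `toy_target` and `toy_draft` are made up for illustration, and the "models" are plain functions over integers. With MTP the drafted tokens come from the model's own extra prediction heads rather than a separate draft model, but the verify step works on the same principle.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int) -> List[Token]:
    """One greedy draft-and-verify step; returns the tokens accepted."""
    # 1. Cheaply draft k tokens ahead.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    # 2. Verify against the target. A real engine scores all k positions in
    #    a single batched forward pass; this toy calls the target one
    #    position at a time for clarity.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token, discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the target also yields one bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[Token], draft_next: NextFn, target_next: NextFn,
             k: int, max_new: int) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return how many verify steps ran."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prompt) + max_new], steps


def plain_greedy(prompt: List[Token], target_next: NextFn,
                 max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out


# Hypothetical toy models: the target counts upward mod 10; the draft agrees
# except that it guesses wrong right after a 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, verify_steps = generate([0], toy_draft, toy_target, k=3, max_new=8)
baseline = plain_greedy([0], toy_target, max_new=8)
```

In this toy setup the output is identical to plain greedy decoding, but it takes fewer verify steps than generated tokens. That is the whole trick: when the drafts are usually right, each expensive pass of the main model confirms several tokens instead of one. When the drafts are often wrong, the gain shrinks.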

Similar Articles

Latest LM Studio update killed MTP performance

Reddit r/LocalLLaMA

A user reports that the latest LM Studio update (0.4.17) eliminated the multi-token prediction speed boost, reverting to previous performance on an RTX 5090 setup.