b9180: llama.cpp MTP landed
Summary
llama.cpp release b9180 is out and adds Multi-Token Prediction (MTP) support. The release builds completed successfully, to the evident relief of developers.
Similar Articles
b9200 released: potential MTP prompt-processing (pp) speedup
llama.cpp release b9200 improves prompt processing speed with Multi-Token Prediction by avoiding unnecessary copies of logits, which reduces memory traffic.
MTP support merged into llama.cpp
The pull request adding MTP (Multi-Token Prediction) support to llama.cpp has been merged into the master branch.
That's good news...
Multi-token prediction (MTP) has been approved for integration into llama.cpp, indicating an upcoming update to the local LLM inference tool.
llama + spec: MTP Support by am17an · Pull Request #22673 · ggml-org/llama.cpp
Pull request adding Multi-Token Prediction (MTP) support to llama.cpp, enabling speculative decoding for faster inference.
@ivanfioravanti: llamacpp is gonna get MTP support soon!
llama.cpp will soon support Multi-Token Prediction (MTP), which is expected to improve inference efficiency.