MiniMax-M3-EAGLE3-GGUF - llama.cpp-compatible MiniMax M3 EAGLE draft model
Summary
A GGUF conversion of MiniMax M3's EAGLE draft model for llama.cpp is now available, enabling speculative decoding speedups on compatible hardware.
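For readers unfamiliar with the idea, the draft-and-verify loop behind speculative decoding can be sketched as follows. This is a minimal, generic greedy sketch and not llama.cpp's or EAGLE's actual implementation: the `speculative_decode` function, the `toy_target` and `toy_draft` stand-in models, and the parameter `k` are all hypothetical names chosen for illustration, and a real engine would verify the drafted tokens in one batched forward pass of the target model rather than one call per token.

```python
from typing import Callable, List

Token = int
# A "model" here is just a function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify decoding (illustrative sketch only)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target model checks each proposed token in order.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)        # draft guessed correctly
            else:
                accepted.append(expected)   # keep the target's token, discard the rest
                break
        else:
            # Every drafted token matched, so the target contributes one extra token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:limit]


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees with it except right after the token 3.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


result = speculative_decode(toy_target, toy_draft, prompt=[0], max_new=6)
```

In this greedy form the output is identical to what the target model would produce on its own; the draft model only affects how many target steps are needed, which is where any speedup would come from.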
Similar Articles
EAGLE3 has landed in llama.cpp
EAGLE3, a speculative decoding method, has been integrated into llama.cpp, enabling faster inference.
unsloth/MiniMax-M3-GGUF
Unsloth releases a GGUF quantized version of the MiniMax-M3 multimodal model, enabling image-text-to-text tasks with support for Transformers, llama.cpp, vLLM, and other inference engines.
unsloth/North-Mini-Code-1.0-GGUF · Hugging Face
This page hosts GGUF quantized versions of Cohere's North-Mini-Code-1.0 model, a 30B-A3B MoE model optimized for code generation and agentic tasks. Instructions are provided for building llama.cpp from a specific PR to support the cohere2moe architecture.
Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%
A new implementation of Multi-Token Prediction (MTP) in llama.cpp achieves a 40% speedup for Gemma 4 models, tested on a MacBook Pro M5 Max. The post provides links to quantized GGUF models and the patched source code.
Unsloth MiniMax M3 GGUF
Unsloth is uploading a GGUF quantized version of the MiniMax M3 model to Hugging Face.