Tag
A GGUF conversion of MiniMax M3's EAGLE draft model for llama.cpp is now available, enabling speculative decoding speedups on compatible hardware.
EAGLE 3.1 makes speculative decoding more robust by adopting a post-norm architecture, achieving up to 2x longer acceptance lengths on long-context workloads. Training is supported through TorchSpec, and the drafter is integrated into vLLM.
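"Acceptance length" is the average number of tokens emitted per draft-and-verify round. The sketch below computes it in two ways: empirically, from per-round counts of accepted draft tokens, and as an expected value under the simplifying assumption that each drafted token is accepted independently with probability `alpha`. It assumes the common convention of counting the one extra token the target model contributes each round. Conventions differ between papers and tools, and this is not EAGLE 3.1's, TorchSpec's, or vLLM's code; the function names are hypothetical.

```python
def acceptance_length(accepted_per_round: list[int]) -> float:
    """Mean tokens emitted per draft-verify round.

    Each entry is the number of draft tokens the target accepted in one round.
    One extra token per round (the target's correction or bonus token) is added,
    following the convention assumed in the lead-in.
    """
    if not accepted_per_round:
        raise ValueError("need at least one round")
    return sum(n + 1 for n in accepted_per_round) / len(accepted_per_round)


def expected_acceptance_length(alpha: float, gamma: int) -> float:
    """Expected tokens per round when gamma tokens are drafted and each is
    accepted independently with probability alpha (an idealized i.i.d. model)."""
    if alpha == 1.0:
        return gamma + 1.0  # every draft token accepted, plus the bonus token
    # Geometric series 1 + alpha + ... + alpha**gamma
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


if __name__ == "__main__":
    print(acceptance_length([3, 1, 4, 0]))
    print(expected_acceptance_length(0.8, 4))
```

Under this model, doubling the acceptance length means roughly half as many target-model verification passes for the same output length, which is where the speedup comes from.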
A discussion of the different flavors of speculative decoding, and an attempt to produce a Qwen-3.6-27b EAGLE-3 drafter for the community.
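As a point of reference for comparing flavors, here is a minimal sketch of the simplest one: a separate draft model proposes `k` tokens, and the target keeps the longest prefix that matches its own greedy choices, then adds one token of its own. The "models" are hypothetical plain functions that return a next token. Real EAGLE drafters instead predict from the target's hidden features, and sampling-based variants replace exact matching with rejection sampling. Neither is shown here, and this is not the code from the post.

```python
from typing import Callable

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[list[int]], int]


def greedy_decode(model: NextToken, prompt: list[int], n_new: int) -> list[int]:
    """Plain autoregressive greedy decoding, used as the reference output."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: list[int],
                       n_new: int, k: int = 4) -> tuple[list[int], list[int]]:
    """Greedy speculative decoding; returns the output and accepted counts per round."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    accepted_counts = []
    while len(seq) < goal:
        # 1. Draft k tokens autoregressively with the cheap model.
        proposal: list[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. Verify against the target's greedy choice at each drafted position
        #    (a real system scores all positions in one batched forward pass).
        n_ok = 0
        for i in range(k):
            if target(seq + proposal[:i]) != proposal[i]:
                break
            n_ok += 1
        # 3. Keep the matching prefix plus one target token: the correction at
        #    the first mismatch, or a bonus token if every draft token matched.
        seq += proposal[:n_ok]
        seq.append(target(seq))
        accepted_counts.append(n_ok)
    return seq[:goal], accepted_counts  # trim any overshoot from the last round


if __name__ == "__main__":
    # Toy models over a 17-token vocabulary; the draft is wrong whenever
    # the last token is a multiple of 5.
    target = lambda s: (s[-1] * 3 + 1) % 17
    draft = lambda s: target(s) if s[-1] % 5 else 0
    out, counts = speculative_decode(target, draft, [2], n_new=20, k=4)
    assert out == greedy_decode(target, [2], 20)  # lossless under greedy verification
    print(counts)
```

Every emitted token is the target's own greedy choice, so the output matches plain greedy decoding exactly. The draft model only changes how many target passes are needed.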