Work-in-progress implementation of EAGLE3 speculative decoding for Qwen models in llama.cpp.
EAGLE3, a speculative decoding method, has been integrated into llama.cpp, enabling faster token generation.
The next MLX-VLM release will include several improvements, among them a preview of EAGLE3 speculative decoding for Gemma 4 models.