draft-model

#draft-model

DeepSeek open-sources inference optimizations with 60–85% faster generation [pdf]

Hacker News Top · 3d ago

DeepSeek open-sourced DeepSpec, a full-stack codebase for training and evaluating draft models for speculative decoding, reporting 60–85% faster generation. It includes data-preparation, training, and evaluation scripts and supports several draft-model algorithms (DSpark, DFlash, Eagle3).

MiniMax-M3-EAGLE3-GGUF - Llama.cpp compatible MiniMax M3 EAGLE draft model!

Reddit r/LocalLLaMA · 2026-06-23

A GGUF conversion of MiniMax M3's EAGLE draft model for llama.cpp is now available, enabling speculative decoding speedups on compatible hardware.

What is Speculative Decoding? (trending on paperswithco.de) [R]

Reddit r/MachineLearning · 2026-06-17

Speculative decoding is an inference optimization in which a fast draft model proposes several future tokens that a larger target model then verifies in parallel, speeding up LLM generation. The post notes that the technique is trending on Papers with Code and points to a recent SGLang blog post reporting state-of-the-art latencies with DFlash draft models.
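The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal greedy variant under stated assumptions: each "model" is a hypothetical toy function that maps a token prefix to its greedy next token, and the target checks drafted positions one at a time instead of in the single batched forward pass a real system would use. It is not the DFlash, SGLang, or DeepSpec implementation; sampling-based speculative decoding additionally needs a rejection-sampling acceptance rule.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode_greedy(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding; output equals greedy_decode(target, ...)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # Draft phase: the cheap model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # Verify phase: a real system scores all k + 1 positions in one batched
        # target forward pass; this loop evaluates them one by one for clarity.
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)
            if i == k or proposal[i] != expected:
                break  # first mismatch (target's correction) or bonus token
        tokens.extend(accepted)
    return tokens[:goal]


def toy_target(prefix: List[int]) -> int:
    # Hypothetical deterministic "large model".
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical "small model" that disagrees with the target every third position.
    guess = toy_target(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode_greedy(toy_target, toy_draft, prompt, 20, k=4)
    slow = greedy_decode(toy_target, prompt, 20)
    print(fast == slow)  # → True
```

Because every accepted token is exactly what the target would have produced greedily at that position, the output matches plain greedy decoding with the target. The speedup in practice comes from the target confirming up to k drafted tokens (plus one bonus token) per batched forward pass, so a draft model that agrees with the target more often yields more tokens per target call.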

@Ex0byt: the different flavors of specdec, and why I'm trying to produce a Qwen-3.6-27b EAGLE-3 drafter for y'all

X AI KOLs Timeline · 2026-05-17

Discussion of different flavors of speculative decoding and an attempt to produce a Qwen-3.6-27b EAGLE-3 drafter for the community.

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

arXiv cs.CL · 2026-05-15

Proposes PPOW, a reinforcement learning framework for optimizing draft models in speculative decoding using window-level objectives and adaptive windowing, achieving significant speedups across multiple benchmarks.

SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding

Hugging Face Daily Papers · 2026-05-11

SlimSpec introduces a low-rank parameterization for drafter LM-heads to accelerate speculative decoding in LLMs, achieving 4-5x speedup while maintaining full vocabulary support.
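A low-rank LM head replaces the full V × d output matrix W with two thin factors, so the drafter computes logits as A (B h) instead of W h while still producing a logit for every vocabulary entry. The sketch below assumes a truncated-SVD factorization of an existing dense head purely for illustration; how SlimSpec actually parameterizes and trains its head is not described in this summary, and `low_rank_factorize` and `low_rank_logits` are hypothetical helper names.

```python
import numpy as np


def low_rank_factorize(W: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: best rank-`rank` approximation W ≈ A @ B via truncated SVD.

    W has shape (vocab, hidden); A is (vocab, rank) and B is (rank, hidden).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold singular values into the vocab-side factor
    B = Vt[:rank, :]
    return A, B


def low_rank_logits(h: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Logits over the full vocabulary from a factored head.

    Projecting to the rank-r space first costs O(r*d + V*r) multiply-adds
    instead of O(V*d) for the dense head.
    """
    return A @ (B @ h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, hidden, rank = 1000, 64, 16
    # Toy head that is exactly rank 16, so the factorization is lossless.
    W = rng.standard_normal((vocab, rank)) @ rng.standard_normal((rank, hidden))
    A, B = low_rank_factorize(W, rank)
    h = rng.standard_normal(hidden)
    print("max abs error:", np.abs(low_rank_logits(h, A, B) - W @ h).max())
    print("params dense vs factored:", W.size, A.size + B.size)
```

For illustrative sizes V = 32,000, d = 4,096, and r = 256, the factored head has about 9.2M parameters versus about 131M for the dense head. A real drafter head is not exactly low rank, so in practice the factors would be trained rather than read off an SVD.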
