@danielhanchen: DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%!…
Summary
DeepSeek released DSpark, a speculative decoding method that boosts throughput by 51% to 400% for V4 Flash & Pro, along with the open-source DeepSpec codebase for training and evaluating draft models.
Cached at: 06/27/26, 07:51 AM
DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%! DS also showed DSpark works well for other models like Gemma & Qwen.
GitHub: https://github.com/deepseek-ai/DeepSpec…
Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf…
HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark…
deepseek-ai/DeepSpec
Source: https://github.com/deepseek-ai/DeepSpec
DeepSpec
DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.
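To make the draft/target split concrete, here is a minimal sketch of greedy speculative decoding, the general technique that draft models plug into. It is not DSpark, DFlash, or Eagle3. The two "models" are plain Python callables that return a next-token id (a hypothetical stand-in for real networks), and verification runs token by token rather than in one batched target forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(target: NextToken, draft: NextToken,
                         prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies.

    The result is identical to greedy decoding with `target` alone.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft phase: the cheap model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2) Verify phase: compare each proposal with the target's own choice.
        #    A real engine scores all k positions in a single forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
        else:
            # Every proposal matched, so the target's pass yields one bonus token.
            accepted.append(target(seq + accepted))

        seq.extend(accepted)
    # The last step may overshoot; trim to exactly max_new new tokens.
    return seq[: len(prompt) + max_new]
```

Every emitted token is either one the target would have picked anyway or the target's correction, so quality matches plain greedy decoding with the target. In a real system the gain comes from the target checking all k drafted positions at once, which is why training a draft that the target agrees with often matters so much.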
Environment
Install the Python dependencies:
python -m pip install -r requirements.txt
Data preparation additionally requires an inference engine to serve the target model when regenerating answers; see scripts/data/README.md for details.
Workflow
Run the stages in order — each stage’s output feeds the next:
- Data Preparation — download prompts, regenerate target answers, and build the target cache.
- Training — train a draft model against the cached target outputs.
- Evaluation — measure speculative-decoding acceptance on benchmark tasks.
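Acceptance, which the evaluation stage measures, is what drives the speedup. A simplified analysis common in the speculative decoding literature assumes each of γ drafted tokens is accepted independently with probability α. Under that assumption the expected number of tokens emitted per target verification step is the geometric sum (1 − α^(γ+1)) / (1 − α). The sketch below only evaluates that formula. Real acceptance varies by position, so treat it as a rough model, not as how DeepSpec scores drafts.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification step.

    Assumes each of the `gamma` drafted tokens is accepted independently with
    probability `alpha`. Verification stops at the first rejection, and the
    target always contributes one token (a correction or a bonus token), so
    the expectation is sum(alpha**i for i in range(gamma + 1)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
```

For example, α = 0.8 with γ = 4 gives about 3.36 tokens per verification step. Wall-clock speedup is lower than this count, because both drafting and verification take time.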
Data Preparation
See scripts/data/README.md for the step-by-step data pipeline:
- download and split training data,
- regenerate answers,
- prepare the target cache (storage warning: this can be very large, roughly 38 TB for the default Qwen/Qwen3-4B setting).
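The README does not say here what the target cache stores. The pipeline in scripts/data/README.md defines that. As a back-of-envelope sketch only, suppose the cache held one hidden-state vector per token for a few target layers in bf16 (2 bytes per value). The functions and every number below are hypothetical. They show why such a cache grows quickly: its size is linear in the token count and in each per-token dimension stored.

```python
def cache_size_bytes(num_tokens: int, hidden_size: int, layers_cached: int,
                     bytes_per_value: int = 2) -> int:
    """Bytes for one hidden-state vector per token per cached layer.

    Hypothetical layout for estimation only; the real cache may store
    more (or less) per token.
    """
    return num_tokens * hidden_size * layers_cached * bytes_per_value


def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12
```

With made-up inputs of 10**9 tokens, a hidden size of 2560, and 3 cached layers in bf16, the estimate is about 15.36 TB. This is not an attempt to reproduce the README's 38 TB figure.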
Training
bash scripts/train/train.sh
train.sh launches train.py, which spawns one worker per visible GPU. Select the algorithm and target model by pointing config_path at one of the configs under config/ (e.g. config/dspark/dspark_qwen3_4b.py); see the script header for the full list of configs, how to override config_path / target_cache_dir, and how to use --opts to override individual config fields. Checkpoints are written to ~/checkpoints/<project_name>/<exp_name>/step_*.
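Checkpoints land in ~/checkpoints/<project_name>/<exp_name>/step_*. If the `*` is a numeric step count (an assumption), picking the newest one needs a numeric comparison, because lexicographic order would rank step_50 above step_2000. The helper below is hypothetical and not part of DeepSpec. It ignores non-numeric names such as the step_latest directory mentioned under Evaluation, which you can pass directly when it exists.

```python
import re
from pathlib import Path
from typing import Optional

# Matches hypothetical numbered checkpoint directories such as "step_2000".
STEP_DIR = re.compile(r"^step_(\d+)$")


def latest_checkpoint(project_name: str, exp_name: str,
                      root: Path = Path.home() / "checkpoints") -> Optional[Path]:
    """Return the step_<N> directory with the largest N, or None if none exist."""
    exp_dir = root / project_name / exp_name
    if not exp_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in exp_dir.iterdir():
        match = STEP_DIR.match(child.name)
        # Compare the parsed integer, not the string, so step_2000 beats step_50.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```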
Hardware: the default configs and scripts assume a single node with 8 GPUs. With fewer GPUs, shorten CUDA_VISIBLE_DEVICES so it lists only the devices you have.
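Because train.py spawns one worker per visible GPU, the worker count follows from CUDA_VISIBLE_DEVICES. The helpers below are hypothetical and not part of DeepSpec. They show how that variable is usually parsed. Check the script header to see whether train.sh reads an exported value or sets its own.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the device ids listed in CUDA_VISIBLE_DEVICES.

    None means the variable is unset. In that case CUDA exposes every GPU and
    the count has to come from the driver instead (e.g. torch.cuda.device_count()).
    An empty string hides all GPUs and yields [].
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [dev.strip() for dev in raw.split(",") if dev.strip()]


def num_workers(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """One worker per visible GPU, matching the README's description of train.py."""
    ids = visible_gpu_ids(env)
    return None if ids is None else len(ids)
```

For example, CUDA_VISIBLE_DEVICES=0,1,2,3 exposes four GPUs, so this sketch would launch four workers.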
Evaluation
bash scripts/eval/eval.sh
eval.sh runs eval.py against a trained draft checkpoint over the speculative-decoding benchmarks in eval_datasets/ (gsm8k, math500, aime25, humaneval, mbpp, livecodebench, mt-bench, alpaca, arena-hard-v2). Set:
- target_name_or_path: the target model the draft was trained against (e.g. Qwen/Qwen3-4B),
- draft_name_or_path: the draft checkpoint, e.g. ~/checkpoints/deepspec/dspark_block8_qwen3_4b/step_latest.
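A common way to summarize speculative-decoding acceptance is the mean acceptance length, meaning tokens emitted per target verification step. The sketch assumes you have logged how many draft tokens were accepted at each step. It also counts the one extra token the target contributes per step. The README does not say whether eval.py reports exactly this quantity or counts the bonus token the same way, so check its output definitions before comparing numbers. The function name is hypothetical.

```python
from typing import Sequence


def mean_acceptance_length(accepted_per_step: Sequence[int]) -> float:
    """Average tokens emitted per target verification step.

    Each entry is the number of draft tokens accepted in one step. The target
    adds one token per step (a correction or a bonus token), so a step that
    accepts n draft tokens emits n + 1 tokens.
    """
    if not accepted_per_step:
        raise ValueError("need at least one verification step")
    emitted = sum(n + 1 for n in accepted_per_step)
    return emitted / len(accepted_per_step)
```

For a log of [3, 0, 7, 2] accepted drafts, the steps emit 4 + 1 + 8 + 3 = 16 tokens, which gives a mean acceptance length of 4.0.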
Supported Algorithms
Currently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3.
License
DeepSpec is released under the MIT License. It includes code adapted from third-party projects under their own licenses; see NOTICE for the full attribution.
Acknowledgements
DeepSpec builds on the ideas and code of several excellent open-source projects:
- SpecForge (Apache-2.0) — the overall training framework and Eagle3 implementation; portions of the Eagle3 modeling, loss, optimizer, attention, and evaluation code are adapted from it. Adapted files carry an in-file attribution comment, and the full notice is recorded in NOTICE.
- DFlash (MIT) — the DFlash draft-model design and training recipe.
- Qwen3 and Gemma — the target model families supported in this repo.
We thank the authors and maintainers of these projects. Contributions of new algorithms are welcome.
Similar Articles
DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%
DeepSeek open-sourced DSpark, an MIT-licensed framework using speculative decoding to accelerate LLM inference by up to 85%, with support for multiple model families including its own DeepSeek-V4, Alibaba's Qwen, and Google's Gemma.
@dzhulgakov: DSpark from @deepseek_ai ingeniously integrates many speculative decoding ideas to achieve 1.5x to 5x higher throughput…
DSpark from DeepSeek AI integrates speculative decoding ideas to achieve 1.5x to 5x higher throughput in production systems. This thread explains 10 key ideas from the basics.
DeepSeek open-sources inference optimizations with 60–85% faster generation [pdf]
DeepSeek open-sourced DeepSpec, a full-stack codebase for training and evaluating draft models for speculative decoding, enabling 60-85% faster generation. It includes data preparation, training, and evaluation scripts with support for multiple draft model algorithms (DSpark, DFlash, Eagle3).
deepseek-ai/DeepSeek-V4-Flash-DSpark
DeepSeek releases V4 series of Mixture-of-Experts language models (Pro: 1.6T total / 49B activated parameters; Flash: 284B total / 13B activated) supporting one-million-token context with hybrid attention and speculative decoding, claiming best open-source model performance.
@DeRonin_: DeepSeek just dropped a 5-page paper + free GitHub repo that makes any LLM respond 80% faster it's called speculative d…
DeepSeek released a paper and MIT-licensed open-source implementation of speculative decoding (DSpark) that speeds up LLM responses by up to 80% by using a small 'guess' model and a large 'check' model, achieving both speed and accuracy without tradeoffs.