@danielhanchen: DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%!…

X AI KOLs Timeline 06/27/26, 06:10 AM Papers

speculative-decoding deepseek dspark throughput-optimization open-source large-language-models

Summary

DeepSeek released DSpark, a speculative decoding method that boosts throughput by 51% to 400% for V4 Flash & Pro, along with the open-source DeepSpec codebase for training and evaluating draft models.

DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%! DS also showed DSpark works well for other models like Gemma & Qwen.

GitHub: https://github.com/deepseek-ai/DeepSpec…
Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf…
HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark…

Cached at: 06/27/26, 07:51 AM

deepseek-ai/DeepSpec

Source: https://github.com/deepseek-ai/DeepSpec

DeepSpec

DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.

Environment

Install the Python dependencies:

python -m pip install -r requirements.txt

Data preparation additionally requires an inference engine to serve the target model when regenerating answers; see scripts/data/README.md for details.

Workflow

Run the stages in order; each stage's output feeds the next:

1. Data Preparation: download prompts, regenerate target answers, and build the target cache.
2. Training: train a draft model against the cached target outputs.
3. Evaluation: measure speculative-decoding acceptance on benchmark tasks.

Data Preparation

See scripts/data/README.md for the step-by-step data pipeline:

download and split training data,
regenerate answers,
prepare the target cache (storage warning: this can be very large — roughly 38 TB for the default Qwen/Qwen3-4B setting).
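Because the target cache can be this large, it may be worth checking free disk space before starting the last step. The following is a minimal sketch, not part of DeepSpec: the helper name `has_room_for_cache`, the decimal-terabyte unit, and the 10% headroom factor are all assumptions made for illustration.

```python
import shutil

TB = 10**12  # assumption: the README's "TB" means decimal terabytes


def has_room_for_cache(path: str, required_tb: float = 38.0, headroom: float = 1.1) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    `required_tb` defaults to the rough figure quoted for the default
    Qwen/Qwen3-4B setting; `headroom` is a hypothetical safety margin.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_tb * TB * headroom


if __name__ == "__main__":
    cache_dir = "."  # replace with the directory intended for the target cache
    if not has_room_for_cache(cache_dir):
        print("Not enough free space for the default target cache at", cache_dir)
```

Other target models or settings will need a different `required_tb`; the source only gives the figure for the default configuration.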

Training

bash scripts/train/train.sh

train.sh launches train.py, which spawns one worker per visible GPU. Select the algorithm and target model by pointing config_path at one of the configs under config/ (e.g. config/dspark/dspark_qwen3_4b.py); see the script header for the full list of configs, how to override config_path / target_cache_dir, and how to use --opts to override individual config fields. Checkpoints are written to ~/checkpoints/<project_name>/<exp_name>/step_*.
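To locate the most recent checkpoint under that directory layout, something like the sketch below could be used. It assumes that step directories are named `step_<number>`; the source only shows the pattern `step_*` (and `step_latest` in the evaluation example), so the numeric suffix is an assumption, and `latest_step_dir` is a hypothetical helper rather than a DeepSpec function.

```python
import re
from pathlib import Path
from typing import Optional


def latest_step_dir(exp_dir: Path) -> Optional[Path]:
    """Return the step_<N> entry with the largest N, or None if none exists.

    Entries whose suffix is not purely numeric (e.g. step_latest) are skipped.
    """
    best: Optional[Path] = None
    best_step = -1
    for entry in exp_dir.glob("step_*"):
        match = re.fullmatch(r"step_(\d+)", entry.name)
        if match and int(match.group(1)) > best_step:
            best, best_step = entry, int(match.group(1))
    return best


if __name__ == "__main__":
    # <project_name> and <exp_name> are placeholders from the README; fill in your own.
    exp_dir = Path.home() / "checkpoints" / "<project_name>" / "<exp_name>"
    print(latest_step_dir(exp_dir))
```

Sorting by the parsed integer rather than by name avoids ordering `step_100` before `step_20`.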

Hardware: the default configs and scripts assume a single node with 8 GPUs. To train on fewer GPUs, restrict CUDA_VISIBLE_DEVICES to the devices you want to use; train.py spawns one worker per visible GPU.
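One way to limit the visible GPUs is to set `CUDA_VISIBLE_DEVICES` in the environment of the launching process. The sketch below does this from Python; `env_with_gpus` and `launch_training` are hypothetical helpers, and it assumes `train.sh` does not overwrite the variable itself (if it does, edit the script instead).

```python
import os
import subprocess
from typing import Dict, Sequence


def env_with_gpus(gpu_ids: Sequence[int]) -> Dict[str, str]:
    """Copy the current environment and restrict the visible CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_training(gpu_ids: Sequence[int]) -> int:
    """Run scripts/train/train.sh with only `gpu_ids` visible; return its exit code.

    Must be called from the repository root so the relative script path resolves.
    """
    result = subprocess.run(
        ["bash", "scripts/train/train.sh"],
        env=env_with_gpus(gpu_ids),
    )
    return result.returncode
```

For example, `launch_training([0, 1, 2, 3])` would start training with four visible GPUs. Whether other settings in the default configs also need adjusting for a smaller GPU count is not stated in the source.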

Evaluation

bash scripts/eval/eval.sh

eval.sh runs eval.py against a trained draft checkpoint over the speculative-decoding benchmarks in eval_datasets/ (gsm8k, math500, aime25, humaneval, mbpp, livecodebench, mt-bench, alpaca, arena-hard-v2). Set:

target_name_or_path — the target model the draft was trained against (e.g. Qwen/Qwen3-4B),
draft_name_or_path — the draft checkpoint, e.g. ~/checkpoints/deepspec/dspark_block8_qwen3_4b/step_latest.
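For readers new to the idea of speculative-decoding acceptance, the toy sketch below shows how accepted tokens are commonly counted under greedy verification. It is a generic illustration, not DSpark's algorithm and not the metric computed by eval.py (the source does not describe either); all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that equal the target's greedy choices."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def mean_tokens_per_step(rounds: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average tokens emitted per target verification pass.

    Each round pairs k draft tokens with the k+1 tokens the target would pick
    greedily at the same positions. The target keeps the matching prefix and
    then contributes one token of its own (a correction on mismatch, or a
    bonus token if the whole draft matched), hence the "+ 1".
    """
    total = sum(accepted_prefix_len(d, t) + 1 for d, t in rounds)
    return total / len(rounds)


if __name__ == "__main__":
    rounds = [
        ([5, 7, 9], [5, 7, 2, 4]),  # 2 accepted, then a correction token
        ([1, 2, 3], [1, 2, 3, 8]),  # all 3 accepted, plus a bonus token
        ([4, 4, 4], [6, 0, 0, 0]),  # 0 accepted, correction token only
    ]
    print(mean_tokens_per_step(rounds))
```

A higher average means fewer target passes per generated token, which is the source of the throughput gain in speculative decoding.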

Supported Algorithms

Currently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3.

License

DeepSpec is released under the MIT License. It includes code adapted from third-party projects under their own licenses; see NOTICE for the full attribution.

Acknowledgements

DeepSpec builds on the ideas and code of several excellent open-source projects:

SpecForge (Apache-2.0) — the overall training framework and Eagle3 implementation; portions of the Eagle3 modeling, loss, optimizer, attention, and evaluation code are adapted from it. Adapted files carry an in-file attribution comment, and the full notice is recorded in NOTICE.
DFlash (MIT) — the DFlash draft-model design and training recipe.
Qwen3 and Gemma — the target model families supported in this repo.

We thank the authors and maintainers of these projects. Contributions of new algorithms are welcome.

Similar Articles

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85% (18 minute read)

@dzhulgakov: DSpark from @deepseek_ai ingeniously integrates many speculative decoding ideas to achieve 1.5x to 5x higher throughput…

DeepSeek open-sources inference optimizations with 60–85% faster generation [pdf]

deepseek-ai/DeepSeek-V4-Flash-DSpark

@DeRonin_: DeepSeek just dropped a 5-page paper + free GitHub repo that makes any LLM respond 80% faster it's called speculative d…
