parallel-tree-drafting

#parallel-tree-drafting

[Research] JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup with more than 1000TPS

Reddit r/LocalLLaMA · 3d ago

JetSpec introduces parallel tree drafting for speculative decoding, reporting up to 9.64x lossless end-to-end speedup on LLM inference, with throughput of around 1000 tokens per second (TPS) on a single B200 GPU.

#parallel-tree-drafting

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

arXiv cs.CL · 2026-06-18

JetFlow is a speculative decoding framework that breaks the scaling ceiling by combining one-forward drafting efficiency with branch-wise causal conditioning, achieving up to 9.64x speedup on math benchmarks and outperforming prior methods on dense and MoE Qwen3 models.
