tree-drafting

#tree-drafting

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Hugging Face Daily Papers · 4d ago

JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates, achieving up to 9.64x speedup on MATH-500 and 4.58x on conversational workloads.
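Tree drafting proposes several candidate continuations at once, and the target model then keeps the longest branch it agrees with. The sketch below shows generic greedy tree verification. It is not JetSpec's algorithm. `DraftNode`, `verify_tree` and `target_next` are hypothetical names, and the target model is stood in for by a plain callable that returns its greedy next token.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class DraftNode:
    """One drafted token; children are alternative continuations after it."""
    token: int
    children: List["DraftNode"] = field(default_factory=list)


def verify_tree(
    prefix: Sequence[int],
    roots: List[DraftNode],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Greedy verification of a draft tree against a target model.

    Walks down the tree while some child matches the target's greedy next
    token, and always ends with one token chosen by the target itself, so
    every step emits at least one token and the output equals plain greedy
    decoding with the target model.
    """
    accepted: List[int] = []
    context = list(prefix)
    candidates = roots
    while True:
        # Hypothetical target call; real systems score all tree nodes in one
        # forward pass using a tree-shaped attention mask.
        expected = target_next(context)
        accepted.append(expected)
        context.append(expected)
        match = next((node for node in candidates if node.token == expected), None)
        if match is None:
            # No draft branch agrees (or the path is exhausted): stop here.
            return accepted
        candidates = match.children


# Toy target that always predicts "previous token + 1".
def toy_target(ctx: Sequence[int]) -> int:
    return ctx[-1] + 1


tree = [DraftNode(1, [DraftNode(2, [DraftNode(9)]), DraftNode(5)]), DraftNode(7)]
print(verify_tree([0], tree, toy_target))  # [1, 2, 3]
```

Here the branch 1 → 2 is accepted, the drafted 9 is rejected, and the target contributes 3. One verification step therefore yields three tokens instead of one. Sampling-based speculative decoding replaces the exact-match test with a probabilistic acceptance rule.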


#tree-drafting

SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

arXiv cs.CL · 2026-05-11

This paper introduces SpecBlock, a block-iterative speculative decoding method that combines path dependence with efficient drafting to accelerate LLM inference. It reports larger speedups than existing methods such as EAGLE-3, at a lower drafting cost.
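The trade-off between acceptance and drafting cost can be made concrete with the standard chain-drafting analysis of Leviathan et al. (2023). That analysis assumes a chain of `gamma` drafted tokens, each accepted independently with probability `alpha`, and a draft step costing `c` target steps. It is a simplified model and not SpecBlock's own cost model. The function names below are illustrative only.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass for a chain of gamma
    drafted tokens, each accepted independently with probability alpha
    (Leviathan et al., 2023)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0  # every draft accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime improvement over plain decoding, where c is the cost of one
    draft step relative to one target step."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


# Same acceptance rate, cheaper vs. more expensive drafter.
for c in (0.1, 0.02):
    print(c, round(expected_speedup(0.8, 4, c), 2))
```

With `alpha = 0.8` and `gamma = 4`, about 3.36 tokens are emitted per target pass. Shrinking the relative drafting cost `c` moves the speedup closer to that ceiling, which is why lower drafting cost matters alongside acceptance rate.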

