Tag: #causal-tree

[Research] JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup with more than 1000 TPS

Reddit r/LocalLLaMA · 3d ago

JetSpec introduces parallel tree drafting for speculative decoding. It achieves up to a 9.64x end-to-end LLM inference speedup while remaining lossless, meaning the target model's output distribution is unchanged, with throughput reaching roughly 1,000 tokens per second (TPS) on a single NVIDIA B200 GPU.
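The card does not describe JetSpec's algorithm. As general background on why tree-based speculative decoding can be lossless, here is a minimal sketch of greedy (argmax) verification over a draft tree. `DraftNode`, `verify_tree_greedy`, and the toy `target` function are hypothetical illustrations and not JetSpec's code. It also assumes greedy decoding, whereas sampling-based verification needs a rejection-sampling acceptance rule instead.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for the target model: returns its greedy (argmax) next token.
NextToken = Callable[[Sequence[int]], int]


class DraftNode:
    """One drafted token; children are alternative continuations (a draft tree)."""

    def __init__(self, token: int, children: Optional[List["DraftNode"]] = None):
        self.token = token
        self.children = children if children is not None else []


def verify_tree_greedy(prefix: List[int], roots: List[DraftNode],
                       target_next: NextToken) -> List[int]:
    """Accept the longest draft path that agrees with the target's greedy choices,
    plus one extra token produced by the target itself.

    Every returned token equals the target's argmax given all tokens before it,
    so the result matches plain greedy decoding with the target model (lossless).
    Real systems score all tree nodes in one batched forward pass using a
    tree-shaped causal attention mask; here the target is queried step by step.
    """
    accepted: List[int] = []
    candidates = roots
    while True:
        expected = target_next(prefix + accepted)
        # Always keep the target's token: it is either a matched draft token
        # or the correction/bonus token that ends this verification step.
        accepted.append(expected)
        match = next((node for node in candidates if node.token == expected), None)
        if match is None:
            return accepted
        candidates = match.children  # descend into the matched branch


def target(seq: Sequence[int]) -> int:
    """Toy deterministic 'model' used only for illustration."""
    return (seq[-1] * 3 + 1) % 7


# Draft tree with two branches from the root; one branch partially matches.
tree = [DraftNode(3), DraftNode(0, [DraftNode(1, [DraftNode(5)]), DraftNode(2)])]
print(verify_tree_greedy([2], tree, target))  # prints [0, 1, 4]
```

In this toy run, two drafted tokens (0, 1) are accepted and the target supplies a third (4) in a single verification step, which is the source of the speedup; drafting several branches in parallel raises the chance that some path matches.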
