sequential-monte-carlo

Faster LLM Inference via Sequential Monte Carlo

arXiv cs.CL · 2026-04-20

This paper proposes Sequential Monte Carlo Speculative Decoding (SMC-SD), a method that accelerates LLM inference by replacing the token-level rejection step of speculative decoding with importance-weighted resampling over a set of draft particles. The reported speedups are 2.36× over standard speculative decoding and 5.2× over autoregressive decoding, with a reported accuracy loss of 3%.
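
The resampling idea in the summary can be illustrated with a generic SMC step. This is a minimal sketch, not the paper's implementation: the function names (`normalize_log_weights`, `effective_sample_size`, `resample`, `smc_step`) are hypothetical, and the weight used here (target log-probability minus draft log-probability) is the textbook importance weight, assumed because the summary does not state the exact formula SMC-SD uses.

```python
import math
import random


def normalize_log_weights(log_weights):
    """Convert unnormalized log-weights to probabilities (stable log-sum-exp)."""
    m = max(log_weights)
    exps = [math.exp(lw - m) for lw in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; equals N when weights are uniform."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with probability proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def smc_step(particles, target_logprob, draft_logprob, rng):
    """One importance-weighted resampling step over draft particles.

    Assumption: each particle is a candidate continuation proposed by the draft
    model, and both callables return the log-probability of the whole particle.
    """
    log_w = [target_logprob(p) - draft_logprob(p) for p in particles]
    weights = normalize_log_weights(log_w)
    return resample(particles, weights, rng), weights


# Toy usage with made-up scores (no real models involved).
toy_particles = [("the", "cat"), ("the", "dog"), ("a", "cat")]
toy_target = {("the", "cat"): -1.0, ("the", "dog"): -2.5, ("a", "cat"): -4.0}
toy_draft = {p: math.log(1.0 / 3.0) for p in toy_particles}  # uniform proposal

survivors, w = smc_step(
    toy_particles, toy_target.__getitem__, toy_draft.__getitem__, random.Random(0)
)
print("weights:", [round(x, 3) for x in w])
print("ESS:", round(effective_sample_size(w), 3))
print("survivors:", survivors)
```

Unlike token-level rejection, no draft is discarded outright in this sketch: particles that the target scores poorly relative to the draft simply become less likely to survive resampling.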
