llm-acceleration

Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction (10 minute read)

TLDR AI · 2d ago

Google Research introduces a new architecture using frozen Multi-Token Prediction to accelerate Gemini Nano models on Pixel devices, significantly improving speed and energy efficiency for on-device AI features.

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

arXiv cs.CL · 2026-06-18

JetFlow is a speculative decoding framework that breaks the scaling ceiling by combining one-forward drafting efficiency with branch-wise causal conditioning, achieving up to 9.64x speedup on math benchmarks and outperforming prior methods on dense and MoE Qwen3 models.
