llm-acceleration

Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction (10 minute read)

TLDR AI · 2d ago

Google Research introduces a new architecture using frozen Multi-Token Prediction to accelerate Gemini Nano models on Pixel devices, significantly improving speed and energy efficiency for on-device AI features.

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

arXiv cs.CL · 2026-06-18

JetFlow is a speculative decoding framework that breaks the scaling ceiling by combining one-forward drafting efficiency with branch-wise causal conditioning, achieving up to 9.64x speedup on math benchmarks and outperforming prior methods on dense and MoE Qwen3 models.
