Tag
Google Research introduces a new architecture using frozen Multi-Token Prediction to accelerate Gemini Nano models on Pixel devices, significantly improving speed and energy efficiency for on-device AI features.
JetFlow is a speculative decoding framework that breaks the scaling ceiling by combining one-forward drafting efficiency with branch-wise causal conditioning, achieving up to 9.64x speedup on math benchmarks and outperforming prior methods on dense and MoE Qwen3 models.