@googlegemma: “Agentic kernel optimization is the future of on-device inference” @xenovacom used Fable 5 to write kernels that pushed…


Summary

Xenova used Fable 5 to write optimized WebGPU kernels that run Gemma 4 at 255 tokens per second on an M4, demonstrating agentic kernel optimization for on-device inference.

“Agentic kernel optimization is the future of on-device inference” @xenovacom used Fable 5 to write kernels that pushed Gemma 4 to a massive 255 tok/s on WebGPU with M4. He shared the demo, so you can try in your browser!! https://t.co/xPuh5OLGEt

Cached at: 07/02/26, 02:24 PM

Similar Articles

@hank_aibtc: Amazing! Running Gemma 4 in the browser, on par with ChatGPT?! Completely zero server, zero data upload, offline, pure WebGPU local inference! Xenova has open-sourced all 27 custom WebGPU kernels written by Fable 5: - Gemma 4 E2B (2.3B parameters...)

The article introduces Xenova's open-sourcing of 27 custom WebGPU kernels, enabling Gemma 4 to run fully offline and locally in the browser at 255 tok/s, and discusses advantages like privacy and offline use. It also mentions FLUX.2's 3D generation capability.