Kuma: compiling PyTorch models into self-contained WebGPU executables [P]
Summary
Kuma is a compiler and runtime that turns exported PyTorch models into self-contained WebGPU executables, so inference runs directly in the browser with no Python or server dependencies.
Similar Articles
A hackable compiler to generate efficient fused GPU kernels for AI models [P]
The author presents a custom, hackable ML compiler written in Python that lowers LLMs to optimized CUDA kernels through a multi-stage IR pipeline, achieving performance competitive with or superior to PyTorch on specific operations. The article details the compiler's optimization passes, lowering rules, and CLI usage for generating efficient fused GPU kernels.
@PyTorch: One runtime, multiple GPU architectures, and zero vendor-specific model code. In this blog post, the TokenSpeed team @l…
TokenSpeed-Kernel is a portable, high-performance kernel system for LLM inference that requires no vendor-specific model code and supports multiple GPU architectures, achieving up to 3.6x higher throughput on AMD MI355X.
LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels
The LFM2.5 230M model reaches 1,400 tokens per second in the browser using custom WebGPU kernels, demonstrating efficient local inference.
I built a compiler that rewrites Python into a model-facing representation
Vulpine is a compiler that transforms human-readable Python code into a compressed macro representation optimized for LLMs, reducing token count by 13.8% on average while enabling exact structural reconstruction.
@hank_aibtc: Amazing! Running Gemma 4 in the browser, on par with ChatGPT?! Completely zero server, zero data upload, offline, pure WebGPU local inference! Xenova has open-sourced all 27 custom WebGPU kernels written by Fable 5: - Gemma 4 E2B (2.3B parameters...)
The article covers Xenova's open-source release of 27 custom WebGPU kernels, which let Gemma 4 run fully offline and locally in the browser at 255 tok/s, and discusses advantages such as privacy and offline use. It also mentions FLUX.2's 3D generation capability.