custom-kernels


LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

Reddit r/LocalLLaMA ↗ · 14h ago

The LFM2.5 230M model reaches 1,400 tokens per second in the browser using custom WebGPU kernels, demonstrating efficient local inference.



Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s!

Reddit r/LocalLLaMA ↗ · 2026-05-20

A developer successfully runs DeepSeek-V4-Flash (284B total, 13B active) locally on four RTX 2080 Ti GPUs with a $2,500 budget, achieving 255 prefill tokens/s using custom Turing CUDA kernels, W8A8 quantization, and heterogeneous inference. The implementation is open-sourced.

