LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

Reddit r/LocalLLaMA Models

Summary

The LFM2.5 230M model reaches 1,400 tokens per second running in-browser with custom WebGPU kernels, demonstrating efficient local inference.


Similar Articles

When you don't have a data center GPU

Reddit r/LocalLLaMA

LiquidAI releases LFM2.5-230M, a 230M-parameter language model designed to run on limited hardware, with support for transformers, vLLM, and SGLang.