LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

Reddit r/LocalLLaMA 06/25/26, 06:35 PM Models

webgpu in-browser ai-model optimization large-language-model custom-kernels high-throughput

Summary

The LFM2.5-230M model reaches 1,400 tokens per second in the browser using custom WebGPU kernels, demonstrating efficient local inference.

Similar Articles

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Reddit r/LocalLLaMA

Gemma 4 E2B runs in the browser via WebGPU at 255 tokens per second, using kernels written by Fable 5, demonstrating efficient on-device inference.

@LottoLabs: A very cool model for the GPU poor bros Trained on an ungodly amount of tokens for a 8b a1b model Gonna be super fast e…

X AI KOLs Timeline

LottoLabs highlights Liquid AI's LFM2.5-8B-A1B-GGUF, an 8B-parameter model with roughly 1B active parameters. It was trained on a very large token count, is optimized for fast inference on limited GPU hardware, and is supported by llama.cpp, Ollama, vLLM, and other runtimes.

@liquidai: Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic ta…

X AI KOLs Timeline

Liquid AI releases LFM2.5-230M, a small 230M-parameter model optimized for fast inference on CPUs, NPUs, and GPUs and aimed at agentic tasks on devices such as phones and robots.

When you don't have a data center GPU

Reddit r/LocalLLaMA

Liquid AI releases LFM2.5-230M, a 230M-parameter language model designed to run on limited hardware, with support for Transformers, vLLM, and SGLang.

@0xSero: GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45…

X AI KOLs Timeline

An NVFP4-quantized, 478B-parameter GLM-5.1 model runs on four RTX Pro 6000 GPUs via SGLang with a 370k-token context. It reaches up to 45 tok/s decode and 1,340 tok/s prefill and is demoed driving Figma.
