LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels
Summary
LFM2.5 230M model achieves 1,400 tokens per second in-browser using custom WebGPU kernels, demonstrating efficient local inference.
Similar Articles
Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5
Gemma 4 is demonstrated running in-browser via WebGPU at 255 tokens per second, using kernels generated by Fable 5, showcasing efficient on-device inference.
@LottoLabs: A very cool model for the GPU poor bros Trained on an ungodly amount of tokens for a 8b a1b model Gonna be super fast e…
LottoLabs announces LiquidAI's LFM2.5-8B-A1B-GGUF model, an 8B parameter model trained on a massive token count and optimized for fast inference on limited GPU hardware, with support for llama.cpp, Ollama, vLLM, and more.
@liquidai: Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic ta…
Liquid AI releases LFM2.5-230M, a small 230M parameter model optimized for fast inference on CPUs, NPUs, and GPUs, targeting agentic tasks on devices like phones and robots.
When you don't have a data center GPU
LiquidAI releases LFM2.5-230M, a 230M parameter language model designed to run on limited hardware, with support for transformers, vLLM, and SGLang.
@0xSero: GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45…
A quantized 478B-parameter GLM-5.1 model runs on 4×RTX Pro 6000 GPUs via SGLang, delivering 370k-token context at up to 45 tok/s decode and 1340 tok/s prefill, and is demoed driving Figma.