I made an offline, single-file GPU build picker that estimates what local models a rig will run — and at what tok/s
Summary
A developer built an offline, single-file GPU build picker that estimates which local AI models a given system can run and how fast each would generate, in tokens per second (tok/s).
Similar Articles
@andrewchen: finding the main downside with experimenting with local AI models is that you end up buying one GPU, then another, then…
Andrew Chen shares his experience of buying multiple GPUs for local AI experimentation, running Qwen3.6 27B dense at 100 tok/s on a 5090 eGPU, and compares it to Sonnet 4.6.
Built a tool that tells you exactly which LLMs fit on your GPU. Feedback wanted.
A tool that estimates which LLMs fit on a user's GPU memory, ranking models by performance while considering memory constraints and quantization levels.
I Built a tool to stop manually swapping models on my 8GB GPU, chains a small Prompter and a large Coder into one pipeline with automatic VRAM swap
The author built Prompt-Chain, a Streamlit app that chains a small prompter model and a large coder model with automatic VRAM swapping, enabling efficient code generation on an 8GB GPU.
@oliviscusAI: Someone just built a tool that tells you exactly which LLMs will run on your hardware. it scans your ram, cpu, and gpu,…
A new tool has been released that scans a user's hardware specifications (RAM, CPU, GPU) to determine which Large Language Models can run locally, ranking them by performance metrics.
LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels
The LFM2.5 230M model reaches 1,400 tokens per second in-browser using custom WebGPU kernels, demonstrating efficient local inference.