I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)

Reddit r/LocalLLaMA 07/01/26, 02:22 PM Tools

local-llm hardware-compatibility open-dataset memory-requirements llm-quantization github ollama

Summary

An open dataset on GitHub maps which local LLMs fit various RAM tiers (8GB to 128GB), providing memory sizing rules, per-tier model lists, and Ollama commands, with a JSON API for programmatic access.

I kept answering the same question for friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix.

Rule of thumb I landed on: at Q4_K_M a model needs roughly 0.6GB of memory per billion params, and you want to size to about 70% of your RAM/VRAM so the OS, context and KV cache still have room.

From that, the comfortable ceiling per tier (62 local models in the set right now):

| RAM | Usable budget | Max params that fit | Models that fit |
|---|---|---|---|
| 8GB | ~5.6GB | ~8B | 23 |
| 16GB | ~11GB | ~14B | 36 |
| 24GB | ~17GB | ~27B | 41 |
| 32GB | ~22GB | ~35B | 50 |
| 48GB | ~34GB | ~47B | 53 |
| 64GB | ~45GB | ~70B | 56 |
| 128GB | ~90GB | ~122B | 58 |

The full thing (specific models per tier, quant, load size, the ollama command for each, plus GPU / Mac / iPhone breakdowns) is here: https://github.com/Wecko-ai/modelfit-hardware-dataset

There's a JSON API too if you'd rather pull it programmatically.

Honest caveats:

- The tok/s figures are bandwidth-derived estimates, not benchmarks I ran on every chip. Ballpark only.
- Coverage is strongest on Apple Silicon and consumer NVIDIA. AMD is newer and thinner.
- "Fits" means it loads and runs at a usable speed, not "fits at full context" (long context eats a lot more).

If something looks off (a model that should fit and doesn't, a quant I got wrong, a card I'm missing), tell me or open a PR. That's the whole point of it being open.

(Full disclosure: I also built a site and CLI on top of this, modelfit.io, but the dataset itself is the useful part and it's free to use.)
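For anyone who wants to sanity-check their own machine, here is a minimal sketch of the rule of thumb from the post (0.6GB per billion params at Q4_K_M, sized to about 70% of RAM/VRAM). The function and constant names are made up for illustration and are not part of the dataset or its API. It only does the raw arithmetic, so it ignores context length and the per-model load sizes the dataset records.

```python
# Rule-of-thumb sizing from the post. All names here are hypothetical,
# chosen for this sketch; they do not come from the dataset or its API.

GB_PER_BILLION_PARAMS_Q4_K_M = 0.6  # approx. memory per billion params at Q4_K_M
USABLE_FRACTION = 0.70              # leave ~30% for the OS, context and KV cache


def usable_budget_gb(total_gb: float, usable_fraction: float = USABLE_FRACTION) -> float:
    """Memory you can comfortably hand to the model, in GB."""
    return total_gb * usable_fraction


def estimated_load_gb(params_billion: float,
                      gb_per_billion: float = GB_PER_BILLION_PARAMS_Q4_K_M) -> float:
    """Approximate memory needed to load a model of the given size, in GB."""
    return params_billion * gb_per_billion


def fits(params_billion: float, total_gb: float) -> bool:
    """True if the estimated load size is within the usable budget."""
    return estimated_load_gb(params_billion) <= usable_budget_gb(total_gb)


if __name__ == "__main__":
    for ram_gb, params_b in [(8, 8), (16, 14), (32, 70), (64, 70)]:
        verdict = "fits" if fits(params_b, ram_gb) else "does not fit"
        print(f"{params_b}B on {ram_gb}GB: {verdict}")
```

This is a comfort check only: a model that passes may still run out of room at long context, as the caveats above point out.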


