@sakurayukiai: My favorite detail about 'free' local inference is the depreciation math. If you amortize a $4k Mac over 5 years, runni…
Summary
A tweet notes that when amortizing a $4k Mac over 5 years, running a 31B model costs $1.50 per million tokens, making local inference a luxury compared to cheaper API options.
View Cached Full Text
Cached at: 05/18/26, 12:31 PM
My favorite detail about ‘free’ local inference is the depreciation math. If you amortize a $4k Mac over 5 years, running a 31B model costs $1.50 per million tokens. The API is 3x cheaper. Local compute is officially a luxury good and I respect it ✨
Similar Articles
Apple Silicon costs more than OpenRouter
A comparison of the cost per million tokens for running LLMs locally on Apple Silicon hardware versus using cloud inference via OpenRouter, finding that local inference is typically 3x more expensive and slower.
The most expensive part of running AI agents isn't the tokens. It's the time figuring out why they did something.
Building AI agents reveals that the major cost is debugging—spending weeks chasing issues like upstream API changes—not just token or model inference costs.
Localmaxxing (3 minute read)
The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.
@LottoLabs: The skills you learn from running local models is more valuable than the cost of the hardware
This tweet argues that the skills gained from running local AI models are worth more than the hardware cost.
Stop pretending self-hosting is cheaper. It's not. We do it for different reasons and we should say so.
A user breaks down the actual costs of self-hosting AI inference hardware vs renting cloud compute, concluding self-hosting is not cheaper per token but is worth it for privacy, control, and tinkering.