Localmaxxing (3 minute read)

TLDR AI News

Summary

The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It reports that the local model was roughly 2x faster on routine agentic tasks in the author's benchmark, making it a practical choice for about half of daily workloads even though Opus 4.5 scores about 20% higher on reasoning benchmarks.

Local models can handle many of the tasks that leading cloud models do, at much lower cost.

Cached at: 05/13/26, 12:22 AM

# Localmaxxing

Source: [https://tomtunguz.com/localmaxxing](https://tomtunguz.com/localmaxxing)

As demand for AI inference explodes, I’ll be asking a lot more of my little computer. How much more?

Over the past five weeks, I’ve been using local models to see how much of my daily work I can accomplish without the trillion parameter models in the cloud. The answer is half.

| Category | Count | % of Total | Example |
| --- | --- | --- | --- |
| Other | 521 | 35.3% | Catch-all for unstructured requests |
| Scheduling | 254 | 17.2% | Check availability, propose meeting times |
| Market Research | 192 | 13.0% | Competitor analysis, fundraising data |
| Summarization | 184 | 12.4% | Transcript review, video summaries |
| Email & Inbound | 170 | 11.5% | Draft replies, follow-ups, forwards |
| Engineering | 147 | 9.9% | Debug scripts, API fixes, CLI tasks |
| Admin | 10 | 0.7% | Travel, expenses, reimbursements |

If you classify these 1.4k tasks by category, half can succeed on a local 35B model. Email & Inbound, Scheduling, Summarization, & Admin total 618 tasks (41.8%). Market Research & Engineering split roughly 50/50 between simple tasks (data lookups, script fixes) and complex ones (multi-source synthesis, architectural decisions). That gets us to 50%.

There are many reasons to use local models : privacy, cost, asset depreciation.[1](https://tomtunguz.com/localmaxxing#fn:1) But in reality, the only one that really matters is latency.

I ran a head-to-head benchmark this morning. Eight agentic tasks, same prompts, both models warmed. Qwen 3.6 35B-A3B-4bit on my MacBook Pro M5 vs Claude Opus 4.5 via API.

[![Qwen 35B local vs Opus 4.5 cloud : mean 2.8s vs 5.8s, 2.1x speedup](https://res.cloudinary.com/dzawgnnlr/image/upload/w_1512,h_1134,c_fill,g_auto,q_auto,f_auto/hilh2r0rbei20utdqyhj)](https://res.cloudinary.com/dzawgnnlr/image/upload/q_auto,f_auto/hilh2r0rbei20utdqyhj)

The local model isn’t smarter. Opus 4.5 scores ~20% higher on reasoning benchmarks. [Local models lag frontier by 3-4 months](https://tomtunguz.com/qwen-local-models/), and for large-scale complex tasks, that gap matters. But for routine agent tasks, it rarely does.

Opus wins on structure & polish : bullet points, headers, cleaner code. Qwen wins on brevity, often half the tokens. I read every output side by side, and both completed the tasks correctly. For agent tasks where output feeds into another system, terseness is a feature.

Localmaxxing, pushing more inference to local models, is an inevitable response to [tokenmaxxing](https://tomtunguz.com/tokenmaxxing/). As local models improve & close the gap with frontier, more users will shift workloads to their own hardware.

If half the work runs 2x faster on my laptop, I’ll take that trade every time. My little computer is about to earn its keep.
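
The arithmetic behind the "half" estimate and the 2.1x speedup can be checked with a short sketch. The task counts and mean latencies are taken from the table and chart above. The per-category weights are an assumption made here for illustration: 1.0 for the four categories counted as fully local, 0.5 for the two described as splitting roughly 50/50, and 0.0 for "Other", which the article does not classify. The names `LOCAL_SHARE`, `local_fraction`, and `speedup` are hypothetical helpers, not anything from the original post.

```python
# Task counts copied from the category table in the article.
TASK_COUNTS = {
    "Other": 521,
    "Scheduling": 254,
    "Market Research": 192,
    "Summarization": 184,
    "Email & Inbound": 170,
    "Engineering": 147,
    "Admin": 10,
}

# ASSUMPTION (illustrative weights, not stated as numbers in the article):
# fraction of each category that a local 35B model can handle.
LOCAL_SHARE = {
    "Scheduling": 1.0,
    "Summarization": 1.0,
    "Email & Inbound": 1.0,
    "Admin": 1.0,
    "Market Research": 0.5,  # "split roughly 50/50"
    "Engineering": 0.5,      # "split roughly 50/50"
    "Other": 0.0,            # unclassified in the article; treated as cloud-only here
}


def local_fraction(counts: dict[str, int], share: dict[str, float]) -> float:
    """Return the weighted fraction of all tasks that could run locally."""
    total = sum(counts.values())
    local = sum(n * share.get(category, 0.0) for category, n in counts.items())
    return local / total


def speedup(cloud_seconds: float, local_seconds: float) -> float:
    """Return how many times faster the local run is than the cloud run."""
    return cloud_seconds / local_seconds


if __name__ == "__main__":
    print(f"total tasks: {sum(TASK_COUNTS.values())}")
    print(f"local-capable share: {local_fraction(TASK_COUNTS, LOCAL_SHARE):.1%}")
    # Mean latencies as reported in the benchmark chart: 5.8s cloud, 2.8s local.
    print(f"speedup: {speedup(5.8, 2.8):.1f}x")
```

Under these weights the local-capable share comes out slightly above one half, which is consistent with the article rounding to 50%. Assigning any nonzero weight to "Other" would push the estimate higher.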
