@rohanpaul_ai: atomic[.]chat (a desktop app that runs LLMs locally) ran a very revealing comparison for local AI agents, on a MacBook …
Summary
Liquid's LFM2.5-8B-A1B outperformed OpenAI's gpt-oss-20b on a tool-calling benchmark when run locally on a MacBook Pro, completing all required tool calls in half the time while using less memory.
View Cached Full Text
Cached at: 05/31/26, 06:40 AM
atomic[.]chat (a desktop app that runs LLMs locally) ran a very revealing comparison for local AI agents, on a MacBook Pro M5 Max, 64GB.
Liquid’s much smaller LFM2.5-8B-A1B beat gpt-oss-20b by finishing every required tool call, cutting runtime by more than half, and using 4.8GB https://t.co/89GRmfJeJk
atomic.chat (@atomic_chat_hq): Liquid’s LFM2.5-8B-A1B smashed OpenAI’s gpt-oss-20b on tool calling
We ran both locally on a MacBook Pro M5 Max, 64GB, and gave each the same trip-planning request that only completes if the model fires all 7 tool calls - weather for 3 cities, two currency conversions, an email
Similar Articles
@rohanpaul_ai: atomic[.]chat just made Gemma 4 26B faster inside LLaMA.cpp. making token generation about 40% faster in its MacBook Pr…
atomic.chat has optimized Gemma 4 26B inference in LLaMA.cpp, achieving ~40% faster token generation on MacBook Pro M5 Max using Multi-Token Prediction (MTP) speculative decoding. This is a notable win for local AI users running desktop apps, coding agents, and private on-device assistants.
@rohanpaul_ai: Another good news for local-LLM from atomic[.]chat, that runs 100% offline on your computer. They just showed MTP (Mult…
atomic.chat's MTP technique speeds up local LLM inference by drafting multiple tokens and verifying them together, achieving up to 137% speedup on Qwen 27B dense model with zero accuracy loss.
I've created the fastest local AI engine for Apple Silicon. Optimised for agentic use.
The author announces the release of 'lightning-mlx', a local AI engine optimized for Apple Silicon that achieves high token speeds for coding agents and tool-calling workflows.
Localmaxxing (3 minute read)
The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.
@DataChaz: MIND BLOWN that an open-source model running on a MacBook can go toe-to-toe with the cloud Spent yesterday in @atomic_c…
A tweet reports that an open-source model (Gemma 4 31B) running locally on a MacBook via AtomicChat can match cloud Gemini 3.5 Flash for generating a playable Mario game in HTML/Canvas, signaling a shrinking cloud moat.