What impedes apps using AI to make the user’s device the server running a local LLM?
Summary
A user asks why more apps don’t run LLMs locally on phones, noting that Gemma models in the 2B to 4B parameter range already work offline and, in the user’s view, could eliminate server costs while delivering near-GPT-4o quality.
Similar Articles
We built an app that runs AI completely offline on your phone (Local LLMs). Perfect for flights, camping, or dead zones.
Introduces Cortex AI, an app that runs AI completely offline on phones using optimized local models, addressing privacy and connectivity issues.
@rohanpaul_ai: Gemma 4 (specifically its edge-optimized E2B and E4B variants) running fully offline on an iPhone via apps like Locally…
Google’s Gemma 4 E2B/E4B quantized variants now run fully offline on iPhone via apps like Locally AI, leveraging the Apple Neural Engine for on-device inference.
Local AI needs to be the norm
The article argues against relying on cloud-hosted AI APIs due to privacy and reliability concerns, advocating for on-device AI processing as demonstrated by a native iOS app using Apple's local model APIs.
Why can't people just run gemini and claude code using their own gpus?
A commentary asking why users cannot run Gemini and Claude Code locally on their own GPUs, suggesting that compute costs are what limit access to these models.
@rohanpaul_ai: So much possibilities for on-device small models. Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro. ~4…
Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.