What impedes apps using AI to make the user’s device the server running a local LLM?

Reddit r/singularity 04/22/26, 08:25 PM News

Summary

A user asks why more apps don’t run local LLMs directly on phones, noting that Gemma’s 2B and 4B models already work offline and could eliminate server costs while, in the poster’s view, approaching GPT-4o quality.

I was using Gemma’s models offline on a plane to help me study while reading a book, and it got me thinking. Most phones can run Gemma’s 2B model (many can run the 4B too), open-source models will keep getting cheaper and better, and CPUs will become more optimized for AI. Gemma 4B is almost on par with GPT-4o, which was the #1 model in its time.

I think in the future it will work this way:

Client: Request -> Local LLM using phone as server -> Server: Response -> Phone view (client)

No server compute costs. I don’t know why nobody seems to be building on models like that. With agentic capabilities and other connections such as RAG, it could be really powerful, and plenty could be done. It doesn’t even have to be LLMs; classic ML would work for some tasks.

Sorry if it’s a dumb question. I’m not technical, just interested in the subject.
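For readers who want to picture the request flow above, here is a minimal sketch in Python, not anything from the post itself. It assumes the "server" is just a loopback HTTP endpoint running on the same device as the app. `local_generate`, the `/generate` path, and the JSON field names are all hypothetical, and the model call is a stub standing in for whatever on-device inference runtime an app would actually use.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def local_generate(prompt: str) -> str:
    """Hypothetical stand-in for an on-device model call (no real model here)."""
    return f"[local model reply to: {prompt}]"


class LocalLLMHandler(BaseHTTPRequestHandler):
    """The 'server' half: receives a prompt and answers from the local model."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = local_generate(payload.get("prompt", ""))
        body = json.dumps({"response": reply}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_local_server() -> HTTPServer:
    # Bind to loopback only, so nothing leaves the device; port 0 lets the OS pick.
    server = HTTPServer(("127.0.0.1", 0), LocalLLMHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(server: HTTPServer, prompt: str) -> str:
    """The 'client' half: the app view sends a request and reads the response."""
    port = server.server_address[1]
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/generate",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any system proxy settings; this call should never leave the device.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    server = start_local_server()
    try:
        print(ask(server, "Summarize this chapter"))
    finally:
        server.shutdown()
```

In this sketch the only thing that would change for a real app is the body of `local_generate`. An app could also skip the HTTP hop and call the model in-process; the loopback server is kept here only because it mirrors the client/server picture in the post.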

Similar Articles

We built an app that runs AI completely offline on your phone (Local LLMs). Perfect for flights, camping, or dead zones.

Reddit r/artificial

Introduces Cortex AI, an app that runs AI completely offline on phones using optimized local models, addressing privacy and connectivity issues.

@rohanpaul_ai: Gemma 4 (specifically its edge-optimized E2B and E4B variants) running fully offline on an iPhone via apps like Locally…

X AI KOLs Following

Google’s Gemma 4 E2B/E4B quantized variants now run fully offline on iPhone via apps like Locally AI, leveraging the Apple Neural Engine for on-device inference.

Local AI needs to be the norm

Hacker News Top

The article argues against relying on cloud-hosted AI APIs due to privacy and reliability concerns, advocating for on-device AI processing as demonstrated by a native iOS app using Apple's local model APIs.

Why can't people just run gemini and claude code using their own gpus?

Reddit r/artificial

A commentary questioning why users cannot run Gemini and Claude Code locally on their own GPUs, implying compute cost constraints are limiting access to these AI models.

@rohanpaul_ai: So much possibilities for on-device small models. Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro. ~4…