What impedes apps using AI to make the user’s device the server running a local LLM?

Reddit r/singularity News

Summary

A user asks why more apps don’t run LLMs locally on the phone itself, noting that Gemma’s 2B and 4B models already work offline and, in the user’s view, could eliminate server costs while staying close to GPT-4o quality.

I was using Gemma’s models offline on a plane while reading a book, to help with my studying, and it got me thinking.

Most phones can run Gemma’s 2B model (many can run the 4B too), open-source models will keep getting cheaper and better, and CPUs will get more optimized for AI. Gemma 4B is almost on par with GPT-4o, which was the #1 model at the time.

I think in the future it will work this way:

Client: Request -> Local LLM using the phone as server -> Server: Response -> Phone view (client)

No server compute costs. I don’t know why nobody is doing anything with models like that. With agency and other connections/RAG, it could be really powerful, and plenty of stuff could be done. It doesn’t even have to be LLMs; it could be classic ML for specific tasks.

Sorry if it’s a dumb question. I’m not technical, I’m just interested in the subject matter.
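
For readers who want to see the request/response flow above in concrete terms, here is a minimal sketch in Python. It is not from the original post and makes several assumptions: `run_local_model` is a hypothetical stub standing in for a real on-device model such as a small Gemma, the `/generate` path and the JSON field names are made up for illustration, and the "server" is just a loopback HTTP server on the same device.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_local_model(prompt: str) -> str:
    """Hypothetical stand-in for an on-device model call (assumption, not a real API)."""
    return f"[local model reply to: {prompt}]"


class LocalLLMHandler(BaseHTTPRequestHandler):
    """Plays the 'phone as server' role: receives a prompt, returns the model output."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = run_local_model(payload.get("prompt", ""))
        body = json.dumps({"response": reply}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this example


def start_local_server() -> HTTPServer:
    # Bind to loopback only, on a free port chosen by the OS: nothing leaves the device.
    server = HTTPServer(("127.0.0.1", 0), LocalLLMHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(server: HTTPServer, prompt: str) -> str:
    """Plays the 'client' role: sends the request and returns the text for the view."""
    port = server.server_address[1]
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/generate",  # illustrative path, ignored by the handler
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy settings so the call stays on the loopback interface.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    server = start_local_server()
    print(ask(server, "Summarize chapter 3"))
    server.shutdown()
    server.server_close()
```

The loopback server is only there to mirror the client/server picture in the post; an app could just as well call the model function directly in-process and skip HTTP entirely.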