A user shares a hands-on comparison of running Gemma 4 with LiteRT-LM on mobile devices versus their previous llama.cpp setup, reporting lower memory usage (1.5-2 GB vs. 4-5 GB) and faster inference (2-4 seconds vs. 7-10 seconds) on smartphones such as the Samsung S25 Ultra and iPhone 13 Pro Max.
Cactus-Compute released Needle, a 26M-parameter open-source model distilled from Gemini for efficient on-device function calling. It uses a novel Simple Attention Network architecture that has no MLPs.