Running the Gemma 12B model on a Google Pixel 10 Pro with llama.cpp achieves 6.5 tokens per second for prompt processing and 1.3 tokens per second for generation while drawing under 10 watts, demonstrating efficient on-device AI inference.
This paper presents the first end-to-end RAG pipeline running entirely on a mobile NPU (Qualcomm Hexagon on Snapdragon X Elite), achieving up to 18x faster LLM prefilling and 4x lower energy use than on the CPU, with no quality regression.
A benchmark shows that Stable Diffusion 1.5 running locally on an iPhone can generate 512x512 images in as little as 3.1 seconds using optimized models such as Realistic Vision V5.1 Hyper, making on-device AI image generation practical.
This article discusses the imminent arrival of AI-powered smartphones and the implications for consumers and the tech industry.
This article argues that the real issue with integrating Gemini deeper into Android isn't just privacy but the action boundary (what the AI can read, suggest, draft, change, send, buy, or delete), and it proposes a tiered consent model for different levels of AI agency.
Google and Apple are bringing AI-powered 'vibe coding' to mobile, allowing users to create custom Android apps, widgets, and shortcuts via natural language prompts, as demonstrated at Google I/O 2026 and reported for iOS.
Google AI Edge Gallery v1.0.13 & v1.0.14 updates add support for Gemma 4 with multi-token prediction, Pixel TPU optimization, experimental MCP, new skills, and chat history saving, enhancing on-device generative AI capabilities.
MiniCPM-V 4.6 is an ultra-efficient 1.3B vision-language model optimized for mobile devices.
OpenBMB has released MiniCPM-V 4.6, a 1B-parameter multimodal large language model optimized for mobile devices, under the Apache 2.0 license. It features mixed visual token compression and claims approximately 1.5x faster throughput than Qwen3.5 0.8B while running natively on iOS, Android, and HarmonyOS.
AI scanning tools are turning ordinary smartphones into full-featured 3D production studios: browser-based interactive 3D virtual tours that once required six-figure budgets can now be produced quickly with just a phone.
OpenGUI is an open-source AI phone control system that lets AI autonomously operate real Android devices to carry out long-running mobile tasks such as social media management and research. It supports remote task dispatching via Lark, Telegram, Discord, or a REST API. Its architecture is split into two layers, a Plan Supervisor and an Executor Graph, and it supports multiple models including Claude, Qwen, and Doubao.
ClawGUI is an open-source framework for training, evaluating, and deploying GUI agents using reinforcement learning, featuring standardized benchmarks and cross-platform deployment to Android, iOS, and HarmonyOS.
Google announces a preview of Gemma 3n, a mobile-first open AI model optimized for on-device inference on phones, tablets, and laptops. Built on a new architecture developed with hardware partners such as Qualcomm and MediaTek, Gemma 3n uses innovations like Per-Layer Embeddings to achieve fast performance with a minimal memory footprint (2-3 GB) while supporting multimodal capabilities.
Google has enhanced its Circle to Search feature with Gemini 3 to enable holistic scene recognition of screen content, with a particular focus on breaking fashion ensembles down into individual items and supporting virtual try-ons. Users can find alternative products and preview how they look without taking screenshots.