Google launched a free AI voice dictation app powered by Gemma 4. It runs fully locally on iOS and Mac and requires no subscription.
Microsoft announced two new on-device AI models at Build 2026: Aion 1.0 Instruct, an open-weights small language model, and Aion 1.0 Plan, a 14B parameter reasoning and tool-calling model for local agentic workflows.
An open-source project uses a phone microphone for live breath detection and biofeedback, processing audio on-device to enhance self-awareness without wearables or cloud uploads.
A solo developer is building Scout, an AI companion that runs entirely on-device without cloud or account, and is seeking feedback before beta release.
NVIDIA announced RTX Spark PCs and a wave of updates to enable local AI agents across RTX and DGX ecosystems, including the OpenShell runtime coming to Windows, NemoClaw expansion, performance improvements, and integrations with Adobe and H Company.
MiniCPM5-1B is a 1B-parameter model from OpenBMB that scores well on AIME 2025 and τ2-Bench Telecom, reportedly outperforming larger models. It offers both fast and reasoning modes from a single checkpoint, the result of a three-stage post-training process: supervised fine-tuning, reinforcement learning, and on-policy distillation.
Google DeepMind released Magenta RealTime 2, an open music generation model for on-device streaming with low-latency control via text, audio examples, and MIDI.
UI-KOBE proposes a framework that enhances lightweight mobile GUI agents by constructing and leveraging app-specific knowledge graphs to improve task planning and execution efficiency.
LoRDBA replaces LoRA's floating-point low-rank factors with binary sign carriers and channel-wise scales, enabling efficient on-device fine-tuning that matches fp16 quality with a significantly smaller footprint and minimal latency overhead.
MobileMoE introduces efficient on-device mixture-of-experts language models with sub-billion parameters, achieving better performance and efficiency than dense baselines and existing MoE models. The models are trained on open-source datasets and demonstrate significant speedups on commodity smartphones.
OpenBMB releases MiniCPM5-1B, a dense 1B Transformer model achieving SOTA among open-source 1B-class models, designed for on-device deployment with hybrid reasoning and long-context support.
BitCPM is a new open-source model from ModelBest, Tsinghua, and OpenBMB that uses ternary weights (-1,0,1) to run full-sized AI models on phones.
A 4-bit quantized build of Gemma 4 optimized for Apple Silicon enables fast local inference on Mac devices, reducing reliance on cloud computing.
Supertonic 3 is a 99M-parameter open-source TTS model that runs entirely on-device. It is reported to beat ElevenLabs while running on a Raspberry Pi and to reach 167x faster than real-time on a laptop CPU.
Google Gemma demonstrates Gemma 4 E4B autonomously navigating and driving an iOS simulator using Argent, showcasing on-device automation capabilities.
PhoneDiffusion is launching as a local AI image generator for iPhone, offering sub-5-second generations, privacy, and no account requirement.
Hy-MT2 is a new open-source multilingual translation model from Tencent Hy that supports 33 languages, offers flexible instruction capabilities, and fits under 500MB with 2-bit quantization for on-device deployment.
Tencent Hunyuan released Hy-MT2, a family of translation models up to 30B parameters with MoE, supporting 33 languages and quantized for on-device use.
Microsoft released Fara-7B, a 7-billion-parameter small language model that can autonomously control a computer to perform tasks like clicking, scrolling, and filling forms. It runs on-device and beats larger models like OpenAI's computer-use agent on benchmarks.
Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.
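
The BitCPM item above mentions ternary weights (-1, 0, 1). As a rough illustration of what storing weights that way can look like, here is a minimal Python sketch using absmean rounding, the rule described for BitNet b1.58. This is an assumption made for illustration, not BitCPM's published method, and the function names `ternarize` and `dequantize` are hypothetical.

```python
def ternarize(weights):
    """Map a list of floats to codes in {-1, 0, 1} plus one float scale.

    Assumption: absmean rounding (scale = mean absolute weight), used here
    only as an illustrative stand-in for whatever scheme a real model uses.
    """
    if not weights:
        return [], 0.0
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # All-zero input: every code is 0 and the scale carries no information.
        return [0] * len(weights), 0.0
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate float weights from ternary codes and the scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    codes, scale = ternarize([0.9, -0.05, -1.2, 0.4])
    print(codes, scale)
    print(dequantize(codes, scale))
```

Each weight then needs only about 1.58 bits (log2 of 3 states) plus a shared scale, which is why this style of quantization is attractive for phones; the accuracy cost depends on how the model is trained, which this sketch does not address.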