on-device-inference

#on-device-inference

Ran gemma 4 12b on my 3090 yesterday and I think the local model game just changed

Reddit r/artificial · 3d ago

A user reports running Google's Gemma 4 12B model locally on a single RTX 3090 using a GGUF quantization. They describe strong performance, including a usable 256k context window, multimodal capabilities, and function calling that they say outperforms larger 70B models on coding tasks.

#on-device-inference

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

Hugging Face Daily Papers · 2026-05-28

This paper systematically studies hybrid multi-agent systems combining cloud-based LLMs and on-device SLMs, revealing task-dependent optimal architectures and challenging the assumption that more frontier compute always improves performance.

#on-device-inference

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

arXiv cs.AI · 2026-05-27

MobileExplorer is a new framework that accelerates on-device inference for mobile GUI agents by performing lightweight parallel exploration of UI elements during model inference, reducing reasoning steps and latency by 23% while maintaining or improving task success rates.

#on-device-inference

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

arXiv cs.LG · 2026-05-12

This paper introduces ExecuTorch, a unified PyTorch-native deployment framework designed to run AI models on diverse edge devices without requiring model conversion or reimplementation.

#on-device-inference

Local AI needs to be the norm

Hacker News Top · 2026-05-10

The article argues against relying on cloud-hosted AI APIs due to privacy and reliability concerns, advocating for on-device AI processing as demonstrated by a native iOS app using Apple's local model APIs.

#on-device-inference

Do you think edge AI ends up mattering more for autonomy, robotics, or local private inference?

Reddit r/artificial · 2026-05-08

A discussion post exploring where edge AI will have the greatest impact: autonomy and robotics, low-power vision systems, private local LLMs, or bandwidth-constrained industrial deployments.
