on-device

#on-device

Efficient On-Device Diffusion LLM Inference with Mobile NPU

arXiv cs.LG · 2026-06-15

This paper presents llada.cpp, an NPU-aware inference framework for accelerating diffusion large language models (dLLMs) on smartphones. It introduces three techniques (Multi-Block Speculative Decoding, Dual-Path Progressive Revision, and Swap-Optimized Memory Runtime) to align dLLM inference with mobile NPU characteristics, achieving a 17-42x latency reduction over a CPU baseline.


#on-device

Gemma 12b less than 10 watts 6.5pp 1.3tg

Reddit r/LocalLLaMA · 2026-06-14

Running the Gemma 12B model on a Google Pixel 10 Pro with llama.cpp achieves 6.5 tokens per second for prompt processing and 1.3 tokens per second for generation while drawing under 10 watts, demonstrating efficient on-device AI inference.


#on-device

Show HN: Trace – Offline Mac meeting transcripts you can flag mid-call

Hacker News Top · 2026-06-13

Trace is a Mac app that transcribes meetings locally without uploading audio, allowing users to flag moments mid-call and get clean markdown transcripts.


#on-device

@paulabartabajo_: Advice for AI engineers The best way to learn local AI is to build with local AI. 7 hands-on webinars from the last 7 m…

X AI KOLs Timeline · 2026-06-12

A collection of 7 hands-on, open-source webinars from the past 7 months focused on building with local AI and small language models, all running on-device.


#on-device

Revi

Product Hunt · 2026-06-12

Revi is a voice dictation app that runs on-device without needing cloud services or an account.


#on-device

Local-First Software Is Easier to Scale

Lobsters Hottest · 2026-06-11

This article argues that local-first software, like the Harper grammar checker, avoids scaling issues by running code on-device, making it easier to handle traffic spikes without additional server costs.


#on-device

@atomic_chat_hq: Atomic Chat is now on Hugging Face We're officially a Local App on the world's biggest AI hub. Run 200,000+ open-weight…

X AI KOLs Timeline · 2026-06-11

Atomic Chat is now available as a Local App on Hugging Face, allowing users to run 200,000+ open-weight models privately and locally on their devices.


#on-device

Synopsule

Product Hunt · 2026-06-11

Synopsule is a product that provides on-device, private AI meeting transcripts, ensuring data stays local.


#on-device

VTT for Mac

Product Hunt · 2026-06-11

VTT for Mac is a voice-to-text tool for macOS that offers a fully on-device option for privacy.


#on-device

Tried to benchmark Google’s new on-device dictation models (Eloquent) and basically couldn’t

Reddit r/LocalLLaMA · 2026-06-10

A user attempted to benchmark Google's new on-device dictation app Eloquent, which uses proprietary models, and found that it frequently drops words or returns incomplete transcripts; its accuracy is competitive only when the transcript is complete. The author theorizes that the underlying chat-style model sometimes refuses to transcribe.


#on-device

@akshay_pachaar: Apple finally did it. Its new framework, Core AI, runs models entirely on Apple silicon, so inference happens on the us…

X AI KOLs Following · 2026-06-09

Apple released Core AI, a new framework that runs AI models entirely on Apple silicon devices (iPhone, iPad, Mac, Vision Pro) with zero server calls. It includes a memory-safe Swift API, model export recipes for PyTorch, an optimizer, and debugging tools, supporting models like Qwen, Mistral, and SAM3.


#on-device

ColibotAI

Product Hunt · 2026-06-09

ColibotAI is an on-device AI tool that translates, summarizes, and explains any text without needing an internet connection.


#on-device

Apple announced new on device inference engine for Apple Silicon

Reddit r/LocalLLaMA · 2026-06-09

At WWDC, Apple announced CoreAI, a new on-device inference engine for Apple Silicon that replaces CoreML and supports larger models of up to 20B parameters through optimized inference, with a focus on phones and tablets.


#on-device

@awnihannun: It's very cool that Apple shipped a 20B parameter on-device. You can't put 20B parameters in RAM at any reasonable prec…

X AI KOLs Following · 2026-06-09

Apple shipped a 20B-parameter on-device model using an MoE variant that selects experts once per query, so the full set of weights can stay in NAND storage and inference remains possible despite RAM constraints.


#on-device

Apple Core AI Framework

Hacker News Top · 2026-06-08

Apple introduces the Core AI framework, a new tool for on-device machine learning.


#on-device

Siri AI

Hacker News Top · 2026-06-08

Apple announces the next generation of Apple Intelligence and Siri, featuring on-device AI processing, privacy-focused enhancements like Private Cloud Compute, and new capabilities such as Genmoji and smarter home integration.


#on-device

Tested how long small models hold a fact across a conversation. The memory failure mode is a real problem for agents, and it's not what I expected.

Reddit r/AI_Agents · 2026-06-08

A developer tested how small edge models (LFM2.5, Gemma variants) retain a single fact across conversation turns, finding that models often confidently deny knowing information that remains in context, posing a trust issue for agent architectures and suggesting a trade-off between memory and format discipline.


#on-device

@MaziyarPanahi: 6,000,000+ PyPI downloads in under a year. OpenMed 1.5.5 ships today: batch PII, on-device redaction in 9 languages, pl…

X AI KOLs Timeline · 2026-06-08

OpenMed 1.5.5 ships with batch PII redaction on-device in 9 languages, open-source under Apache 2.0, having surpassed 6 million PyPI downloads in under a year.


#on-device

Signal Recorder SR-7

Product Hunt · 2026-06-07

Signal Recorder SR-7 is an on-device voice recorder that transcribes audio and exports Markdown files.


#on-device

Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Ars Technica · 2026-06-03

Google releases Gemma 4 12B, a compact AI model optimized for local laptop use with only 16GB of RAM, featuring multi-token prediction and streamlined multimodal capabilities for text, audio, and images.

