Google Chrome is automatically downloading a roughly 4GB weights file for its Gemini Nano model to users' devices to power on-device AI features like scam detection and writing assistance, often without clear notification about the storage requirement. Users can disable the On-Device AI toggle in Chrome settings to remove the file and prevent re-downloads.
Garry Tan highlights a model with a 1M token context window and coding agent capabilities running locally on a 128GB MacBook Pro, expressing excitement about the milestone.
atomic.chat has optimized Gemma 4 26B inference in LLaMA.cpp, achieving ~40% faster token generation on MacBook Pro M5 Max using Multi-Token Prediction (MTP) speculative decoding. This is a notable win for local AI users running desktop apps, coding agents, and private on-device assistants.
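Speculative decoding gains like this come from drafting several future tokens with a cheap predictor and verifying them in a single pass with the full model, so accepted tokens cost almost nothing. A minimal greedy sketch of the accept/reject loop, with toy stand-in models rather than LLaMA.cpp's actual MTP implementation:

```python
from typing import Callable, List

def speculative_decode(
    target: Callable[[List[int]], int],   # expensive model: next token for a prefix
    draft: Callable[[List[int]], int],    # cheap draft head: next token for a prefix
    prompt: List[int],
    max_new: int = 8,
    k: int = 3,                           # tokens drafted per verification step
) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the longest prefix
    the target model agrees with, then take one correction/bonus token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft k candidate tokens with the cheap model.
        cand, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            cand.append(t)
            ctx.append(t)
        # 2. Verify: accept drafted tokens while the target agrees.
        for t in cand:
            if len(out) - len(prompt) >= max_new:
                break
            if target(out) == t:
                out.append(t)              # accepted almost for free
            else:
                out.append(target(out))    # target's correction; restart drafting
                break
        else:
            # All k drafts accepted; the target still yields one bonus token.
            if len(out) - len(prompt) < max_new:
                out.append(target(out))
    return out
```

Because every emitted token is checked against the target model, the output is identical to greedy decoding with the target alone; the speedup comes purely from verifying drafted tokens in batches.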
MIT researchers developed a new framework called FTTE that accelerates privacy-preserving federated learning by 81%, enabling efficient AI training on resource-constrained edge devices like smartwatches and sensors.
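For context, the baseline such frameworks accelerate is federated averaging: each device trains locally and only parameter updates are aggregated, so raw data never leaves the device. A minimal sketch of the vanilla FedAvg aggregation step (not FTTE itself, whose details are not described here):

```python
from typing import List

def fedavg(client_weights: List[List[float]],
           client_sizes: List[int]) -> List[float]:
    """Server-side FedAvg step: average client model parameters,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

On edge hardware like smartwatches, the cost of this loop is dominated by local training and update transmission, which is where acceleration efforts typically focus.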
Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a highly compressed 1.25-bit machine translation model supporting 33 languages that fits in 440MB for on-device use. It utilizes the Sherry quantization algorithm to achieve world-class translation quality comparable to much larger models.
Google DeepMind releases Gemma 4, a family of open-weights multimodal models featuring Multi-Token Prediction (MTP) for up to 2x decoding speedups, supporting text, image, video, and audio with enhanced reasoning and coding capabilities.
A user reflects on why more apps don’t run local LLMs directly on phones, noting Gemma 2-4B models already work offline and could eliminate server costs while maintaining near-GPT-4o quality.
Apple is betting that AI’s future hinges on custom hardware and on-device inference via the iPhone’s advanced processors rather than cloud-based LLMs.
K2.6 autonomously downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, implementing and optimizing inference in the niche Zig language to demonstrate the agent model's ability to generalize. Over 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times, raising throughput from ~15 tokens/s to ~193 tokens/s and ultimately achieving inference 20% faster than LM Studio.
Researchers introduce 8M-30M parameter micro language models that instantly generate the first few words on-device before cloud models complete responses, enabling responsive AI on ultra-constrained devices like smartwatches.
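The pattern described, surface the tiny on-device model's opening words immediately while the cloud model finishes the answer, can be sketched with two concurrent tasks. Function names and model interfaces below are illustrative assumptions, not the paper's API:

```python
import asyncio

async def respond(prompt, micro_lm, cloud_lm):
    """Race an on-device micro LM against a cloud LM: show the micro
    model's first words instantly, then return the cloud's full answer.
    micro_lm/cloud_lm are hypothetical async callables returning text."""
    cloud_task = asyncio.create_task(cloud_lm(prompt))  # start cloud call early
    draft = await micro_lm(prompt)        # tiny model: first words in milliseconds
    print(draft, end="", flush=True)      # user perceives an instant response
    full = await cloud_task               # cloud model completes the sentence
    return draft, full
```

In a real system the cloud response would need to be reconciled with the already-displayed draft, e.g. by constraining the cloud model to continue the micro model's prefix.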
A 31B parameter model runs locally on a laptop via Hermes agent at 15 tok/s, using 22.8 GB VRAM and 94 W power, highlighting fully autonomous, private AI inference without cloud dependencies.
A user shares their experience running Qwen3-35B-A3B quantized model on an M2 MacBook Pro with 32GB RAM for coding tasks via opencode and llama.cpp, finding that the 32K context window limit causes critical memory loss during compaction, making complex coding tasks impractical. They conclude that meaningful agentic coding with this model likely requires at least 128K context, exceeding what their hardware can support.
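The memory wall here is largely the KV cache, which grows linearly with context length on top of the quantized weights. A back-of-the-envelope estimator (the architecture numbers in the test are illustrative, not Qwen3's actual configuration):

```python
def kv_cache_gib(ctx_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elt: int = 2) -> float:
    """Approximate KV-cache size in GiB: one K and one V tensor per layer,
    each of shape (ctx_len, n_kv_heads, head_dim), at bytes_per_elt (fp16=2)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt * ctx_len / 2**30
```

Quadrupling the context from 32K to 128K quadruples this figure, which, added to the weights, quickly exceeds a 32GB machine's unified memory.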
Overworld releases Waypoint-1.5, a real-time video world model designed for everyday GPUs, featuring improved visual fidelity and new 360p and 720p tiers for broader hardware accessibility.
Google DeepMind introduces Gemma 4, its most capable family of open models to date, designed for advanced reasoning and agentic workflows with high intelligence-per-parameter efficiency across multiple sizes.
Google DeepMind releases Gemma 4, a frontier multimodal model family available on Hugging Face with Apache 2 licensing, optimized for on-device deployment and supported by various inference libraries.
Unsloth releases GGUF-quantized versions of Google DeepMind's Gemma 4 26B A4B instruction-tuned model, enabling efficient local inference with support for tool-calling and fine-tuning via Unsloth Studio. Gemma 4 is a multimodal MoE model with a 256K context window, supporting text, image, video, and audio inputs.
Google DeepMind announces the full release of Gemma 3n, a mobile-first multimodal AI model optimized for on-device efficiency with MatFormer architecture. The release includes E2B and E4B variants designed for low memory usage while delivering strong performance in reasoning, coding, and multilingual tasks.
Google DeepMind introduces Gemini Robotics On-Device, an efficient VLA model optimized to run locally on robotic devices, enabling low-latency operation and offline capability while maintaining strong dexterous manipulation and task generalization. The model can be fine-tuned with as few as 50-100 demonstrations and comes with an SDK for developers.