edge-ai

#edge-ai

@_lewtun: You can now have an AI researcher running on your laptop 24/7 for free! Running Qwen3-35B-A3B with llama.cpp and a 4-bi…

X AI KOLs Timeline · 2026-05-13

The post highlights running Qwen3-35B-A3B locally on a laptop at no cost, using llama.cpp and Unsloth 4-bit quantization.

#edge-ai

I got a real transformer language model running locally on a stock Game Boy Color!

Reddit r/LocalLLaMA · 2026-05-12

A developer runs a quantized TinyStories transformer model locally on a stock Game Boy Color using a custom ROM and fixed-point math.

#edge-ai

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

arXiv cs.LG · 2026-05-12

This article introduces ExecuTorch, a unified PyTorch-native deployment framework designed to run AI models on diverse edge devices without requiring model conversion or reimplementation.

#edge-ai

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI

arXiv cs.LG · 2026-05-12

This study reveals a 'Smart Pruning Paradox' where activation-aware pruning methods like Wanda preserve perplexity but significantly amplify bias in Large Language Models deployed on edge devices.

#edge-ai

@berryxia: Apple has been betting on on-device models all along! Unified architecture memory is the natural habitat for on-device models! Unified memory means memory is VRAM. We are seeing more and more excellent on-device models emerge. OpenBMB released MiniCPM-V 4.6, a 1.3B multimodal model. After reading it…

X AI KOLs Timeline · 2026-05-12

OpenBMB released MiniCPM-V 4.6, a 1.3B parameter multimodal model. Using high-resolution visual processing and efficient compression, it achieves fast inference on consumer hardware and mobile phones, outperforming larger models. It is fully open-source and supports multiple inference and quantization frameworks.

#edge-ai

Gemma 4 running fully offline on WebGPU with Transformers.js, controlling Reachy Mini over WebSerial.

Reddit r/LocalLLaMA · 2026-05-11

The post demonstrates Gemma 4 running offline in the browser with WebGPU and Transformers.js to control a Reachy Mini robot via WebSerial.

#edge-ai

2.5-D Decomposition for LLM-Based Spatial Construction

arXiv cs.AI · 2026-05-11

This paper introduces a neuro-symbolic pipeline using 2.5-D decomposition to improve LLM-based spatial construction accuracy by offloading vertical coordinate calculation to a deterministic executor, achieving high accuracy on benchmarks and edge hardware.

#edge-ai

@DivyanshT91162: Local LLMs just hit a whole new level This Hugging Face release is actually insane: "gpt-oss-20b-tq3" An official 20B+ …

X AI KOLs Timeline · 2026-05-10

A new 20B+ parameter MoE model from OpenAI, quantized to 3-bit via TurboQuant and optimized with MLX, allows for high-performance local LLM inference on standard 16GB MacBooks.

#edge-ai

@JulianGoldieSEO: Google just made local AI 3x faster for FREE. Gemma 4 now runs fast enough on normal laptops that local AI finally feel…

X AI KOLs Timeline · 2026-05-08

Google released Gemma 4, an open-source AI model optimized for local execution on standard laptops. The post claims 3x faster performance and a 256k context window, with the model available free under an Apache 2.0 license.

#edge-ai

Do you think edge AI ends up mattering more for autonomy, robotics, or local private inference?

Reddit r/artificial · 2026-05-08

A discussion post exploring where edge AI will have the greatest impact: autonomy and robotics, low-power vision systems, private local LLMs, or bandwidth-constrained industrial deployments.

#edge-ai

Are local models becoming “good enough” faster than expected?

Reddit r/LocalLLaMA · 2026-05-07

The post discusses the growing viability of local AI models for everyday tasks, suggesting a shift toward hybrid architectures that optimize for cost and latency rather than relying solely on frontier cloud models.

#edge-ai

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Hugging Face Daily Papers · 2026-04-30

MiniCPM-o 4.5 is a 9B parameter multimodal model featuring Omni-Flow, a framework enabling real-time full-duplex interaction where the model can simultaneously perceive and respond proactively. It achieves state-of-the-art open-source performance comparable to Gemini 2.5 Flash and runs on edge devices with less than 12GB RAM.

#edge-ai

AngelSlim/Hy-MT1.5-1.8B-1.25bit

Hugging Face Models Trending · 2026-04-28

Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a highly compressed 1.25-bit machine translation model supporting 33 languages that fits in 440MB for on-device use. It utilizes the Sherry quantization algorithm to achieve world-class translation quality comparable to much larger models.

#edge-ai

Anker made its own chip to bring AI to all its products

Hacker News Top · 2026-04-22

Anker unveiled its custom Thus AI chip using compute-in-memory architecture to enable local AI on tiny devices, starting with upcoming Soundcore flagship earbuds for superior call noise cancellation.

#edge-ai

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog · 2026-04-22

NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, using local STT/TTS and webcam input.

#edge-ai

Soul Player C64 – A real transformer running on a 1 MHz Commodore 64

Hacker News Top · 2026-04-20

Soul Player C64 implements a real 2-layer decoder-only transformer with ~25,000 int8 parameters in hand-written 6502/6510 assembly, running entirely on an unmodified 1 MHz Commodore 64 loaded from a floppy disk. The project includes training scripts to build and quantize custom models, assemble C64 binaries, and run inference at roughly 60 seconds per token.

#edge-ai

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

Hugging Face Daily Papers · 2026-04-18

An empirical study shows that small language models achieve 100% adversarial robustness with System 1 intuition but collapse under System 2 reasoning when used as edge-native governance firewalls in decentralized autonomous organizations.

#edge-ai

Cactus-Compute/needle

Hugging Face Models Trending · 2026-03-16

Cactus-Compute releases Needle, a 26M-parameter model distilled from Gemini 3.1, using a pure attention architecture optimized for on-device inference and local fine-tuning.

#edge-ai

Frigate with Hailo for object detection on a Raspberry Pi

Jeff Geerling · 2026-02-18

This blog post details how to set up Frigate with a Hailo AI coprocessor on a Raspberry Pi for object detection, including steps to fix a PCIe descriptor page size error. The setup works with the cheaper Hailo-8L and achieves low inference times.

#edge-ai

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

Google DeepMind Blog · 2025-10-23

Google introduces Gemma 3 270M, a compact 270-million-parameter model designed for efficient on-device AI, with strong instruction-following capabilities and extreme energy efficiency (0.75% battery for 25 conversations on Pixel 9 Pro).
