edge-ai

#edge-ai

An Introduction to YOLO26

Hacker News Top · yesterday

YOLO26 is a multi-task computer vision model family released in January 2026. It performs end-to-end detection without Non-Maximum Suppression (NMS), which lowers latency, and is optimized for edge deployment with improved CPU inference and a compact design.

@Oluwaphilemon1: Claude Fable 5 is dead and GPT-5.6 delaying launch… Microsoft has changed the game They've open-sourced bitnet.cpp, a 1…

X AI KOLs Timeline · yesterday

Microsoft open-sourced bitnet.cpp, a 1-bit LLM inference framework that can run 100B-parameter models on local CPUs without GPUs, with up to 6.17x faster inference and up to 82.2% lower energy consumption.

Reverse Engineering the Qualcomm NPU Compiler

Lobsters Hottest · 4d ago

Reverse engineering the Qualcomm NPU compiler reveals undocumented VTCM memory management, MILP-based placement, automatic precision alteration, and a hidden analytical simulator (Hextimate) for edge deployment optimization.

@ciruai: Testing DeepSeek v4 Flash on the AMD Ryzen AI Max+ 395 Strix Halo with 128GB RAM. Getting ~15 TPS over a decently long …

X AI KOLs Timeline · 6d ago

DeepSeek v4 Flash, a 284B MoE model (13B active), runs locally at ~15 TPS on the AMD Ryzen AI Max+ 395 with 128GB RAM. The machine costs $3,000 versus $25,000+ for a datacenter setup, highlighting the feasibility of running large models on consumer hardware.

Le Gros Chaton running on my '84 Corolla Radio

Reddit r/LocalLLaMA · 2026-06-16

A demonstration of running 'Le Gros Chaton' (likely a lightweight AI model) on a 1984 Toyota Corolla radio, showcasing edge AI on vintage hardware.

@cevenif: 90% of machine learning tutorials on the market are actually misleading you—what's the point of just training a model? If it can't go into production, all the earlier effort is wasted. Seriously, I've seen too many people fall into this trap: they follow tutorials and train models like crazy, but when they put them into real-world environments, they immediately break—they don't know how to deploy, can't set up monitoring, and scalability is a mess. Harvard University directly...

X AI KOLs Timeline · 2026-06-16

Harvard University open-sourced the textbook "Machine Learning Systems," which systematically covers practical topics such as ML system design, data engineering, model deployment, MLOps, and edge AI, aiming to help bring AI from research into production. It is freely available on GitHub.

APEX: Adaptive Principle EXtraction A Three-Layer Self-Evolution Framework for Production AI Agents

arXiv cs.AI · 2026-06-16

APEX proposes a three-layer self-evolution framework for production AI agents that simultaneously optimizes the harness, behavioural principles, and workflow topology. Experiments on a production agent show significant improvements in health score and workflow quality with minimal LLM calls.

A satellite just learned to find things on its own — here’s what that means

TechCrunch AI · 2026-06-15

A satellite called Yam-9 used Google DeepMind's Gemma 3 vision-language model in orbit to autonomously identify areas of interest based on natural language queries, marking the first reported use of a VLM in space and signaling a shift toward more autonomous satellite operations.

PrintGuard 2.0 — ShuffleNetV2 + few-shot prototypical network, TFLite via LiteRT, ≈5 MB, runs unmodified in the browser (Pyodide) and on CPython [P]

Reddit r/MachineLearning · 2026-06-15

PrintGuard 2.0 is a major rewrite of a few-shot FDM fault detector built on a ShuffleNetV2 backbone and a prototypical network. A single Python engine now runs unmodified on both CPython and Pyodide in the browser via a platform abstraction layer, enabling per-printer sensitivity tuning and fair inference scheduling.

D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection

arXiv cs.LG · 2026-06-15

D2H-AD is a novel anomaly detection framework using Hyperdimensional Computing (HDC) that combines distance-based and density-aware encoding. It outperforms five baselines across multiple benchmarks, offering lightweight, interpretable, and efficient performance for edge AI and IoT.

🚀 PP-OCRv6 is officially released!

Reddit r/LocalLLaMA · 2026-06-12

PaddleOCR has released PP-OCRv6, a new OCR model series ranging from 1.5M to 34.5M parameters. It offers improved accuracy and faster inference, supports 50 languages, and covers new scenarios such as PCB and CAD drawings. It is released under the Apache 2.0 open-source license.

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

arXiv cs.LG · 2026-06-10

Sigma-Branch restructures pretrained dense networks into a hierarchical binary tree with a shared backbone, routers, and specialized leaves, reducing per-inference active parameters by 58–60% while staying within 1.72 pp of baseline accuracy on CIFAR-100, ImageNet-1K, and ModelNet40.

@danveloper: https://x.com/danveloper/status/2064387956387758206

X AI KOLs Timeline · 2026-06-09

A developer ran DeepSeek-V4-Flash on a Raspberry Pi 5 by streaming model weights from an NVMe SSD, achieving 1.3 tokens/second at 8 watts, demonstrating the feasibility of frontier-adjacent open-weight models on low-cost, offline hardware.

Jetson Orin NX Build for Hermes Agent + Benchmarking

Reddit r/LocalLLaMA · 2026-06-09

A detailed build log and benchmark of a Jetson Orin NX system running Hermes Agent, achieving 14.65 tok/s at 8k context and 10.21 tok/s at 60k context with a quantized Gemma 4 26B model.

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

arXiv cs.LG · 2026-06-09

This paper presents a two-stage methodology for end-to-end LLM deployment on spatial NPUs, progressing from human-guided development to an autonomous agent skill system. The system achieves speedups of 2.2x on prefill and 4.0x on decode for a reference model, and autonomously deploys eight additional LLMs on the AMD XDNA 2 NPU with minimal human guidance.

The GPUless Revolution: How Efficient AI Models Are Democratizing Artificial Intelligence

Reddit r/AI_Agents · 2026-06-08

A quiet revolution is making powerful AI models runnable on consumer hardware without expensive GPUs, thanks to breakthroughs in quantization and optimized implementations like llama.cpp's Gemma4 MTP support, democratizing access for hobbyists, small businesses, and edge computing.

Clustering 3x Jetson Nano Orin Supers

Reddit r/LocalLLaMA · 2026-06-07

The author announces a new blog post on clustering three Jetson Nano Orin Supers for distributed training and inference, continuing a series to help people build small compute clusters with accessible hardware.

Are We Underestimating Small Edge AI Models? [D]

Reddit r/MachineLearning · 2026-06-05

A developer argues that the edge AI community overlooks small, specialized models that can run locally on devices like smartphones, using a self-built offline Morse code recognition feature as an example. The project uses a sub-5 MB AI model with TensorFlow/Keras and LiteRT, and the entire pipeline from data generation to mobile integration was custom-built.

@KanikaBK: Google just dropped an AI bomb! A BILLION DOLLARS Game is on. Gemma 4 12 B runs on your laptop. 16 GB of RAM, that is a…

X AI KOLs Timeline · 2026-06-03

Google released Gemma 4 12B, an open-source multimodal AI model under Apache 2.0 that runs locally on laptops with 16GB RAM, targeting enterprise edge deployment.

@zhixianio: After receiving the new machine, I began an 'ascetic' practice of forcing myself to use local models for common tasks. I thought it would be painful, but both speed and quality greatly exceeded my expectations: Model: Qwen3.6-35B-A3B-oQ6-fp16-mtp, Running: oMLX, with N…

X AI KOLs Timeline · 2026-06-03

The author runs the Qwen3.6-35B-A3B model with oMLX on a new local machine for daily tasks and finds that both speed and quality far exceed expectations, even outperforming remote LLMs in PA and coding scenarios. The result points to a significant improvement in on-device AI capabilities.
