edge-computing

#edge-computing

CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration

arXiv cs.LG · 16h ago

This paper introduces CATS, a cascaded adaptive tree speculation framework designed to accelerate LLM inference on memory-constrained edge devices by optimizing memory usage while maintaining high token acceptance rates.


QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization

arXiv cs.LG · 16h ago

This paper introduces QuIDE, a framework featuring an Intelligence Index to evaluate the trade-offs between compression, accuracy, and latency in quantized neural networks. It demonstrates that optimal bit-widths vary by task, with 4-bit being ideal for LLMs and simple tasks, while 8-bit is better for complex CNNs.


EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints

arXiv cs.CL · yesterday

This paper introduces EdgeFlowerTune, a benchmark for evaluating federated LLM fine-tuning under realistic edge system constraints, demonstrating that accuracy-only metrics can be misleading regarding deployability.


MiniCPM-V 4.6

Product Hunt · yesterday

MiniCPM-V 4.6 is an ultra-efficient 1.3B vision-language model optimized for mobile devices.


Localmaxxing (3 minute read)

TLDR AI · yesterday

The article analyzes the viability of running AI inference locally on a MacBook Pro, comparing a local Qwen 35B model against the cloud-based Claude Opus 4.5. It concludes that local models are 2x faster for routine tasks, making them a practical choice for half of daily workloads despite a slight capability gap.


@AdinaYakup: MiniCPM V4.6 a 1B MLLM that actually runs on your phone, just released by @OpenBMB 1B - Apache2.0 Runs on iOS, Android,…

X AI KOLs Following · 2d ago

OpenBMB has released MiniCPM V4.6, a 1B-parameter multimodal large language model optimized for mobile devices under the Apache 2.0 license. It features mixed visual token compression and claims approximately 1.5x faster throughput than Qwen3.5 0.8B while running natively on iOS, Android, and HarmonyOS.


You can put a data center at your house—but who really pays?

Reddit r/ArtificialInteligence · 2d ago

Nvidia is backing Span's initiative to deploy residential mini-data centers that leverage underutilized home electricity to run distributed AI workloads. The concept aims to bypass grid constraints by placing GPU nodes beside houses, though it remains largely unproven in real-world deployments.


Local AI needs to be the norm

Hacker News Top · 3d ago

The article argues against relying on cloud-hosted AI APIs due to privacy and reliability concerns, advocating for on-device AI processing as demonstrated by a native iOS app using Apple's local model APIs.


How a $15 RISC-V Device Built Its Own Lightning Wallet — and Learned to Pay the Internet

Reddit r/ArtificialInteligence · 4d ago

This tutorial shows how to build an AI agent on a $15 RISC-V device (LicheeRV Nano) that manages its own Lightning Network wallet and makes autonomous Bitcoin payments via Nostr.


Knee Osteoarthritis Severity Grading Using Optimized Deep Learning and LLM-Driven Intelligent AI on Computationally Limited Systems

arXiv cs.AI · 5d ago

This paper presents an automated diagnostic system for grading knee osteoarthritis severity using an optimized ResNet-18 model deployed on edge devices via TensorFlow Lite. It integrates an LLM interface using Gemini 2.0 Flash to provide structured interpretive findings while maintaining offline capability for resource-constrained environments.


Enabling privacy-preserving AI training on everyday devices

MIT News — Artificial Intelligence · 2026-04-29

MIT researchers developed a new framework called FTTE that accelerates privacy-preserving federated learning by 81%, enabling efficient AI training on resource-constrained edge devices like smartwatches and sensors.


Micro Language Models Enable Instant Responses

Hugging Face Daily Papers · 2026-04-21

Researchers introduce 8M-30M parameter micro language models that instantly generate the first few words on-device before cloud models complete responses, enabling responsive AI on ultra-constrained devices like smartwatches.


EdgeDetect: Importance-Aware Gradient Compression with Homomorphic Aggregation for Federated Intrusion Detection

Hugging Face Daily Papers · 2026-04-16

EdgeDetect is a federated intrusion detection system for 6G-IoT environments that combines importance-aware gradient binarization (32× compression) with Paillier homomorphic encryption. It achieves 98% accuracy on CIC-IDS2017 while reducing communication overhead by 96.9%, and it can be deployed on resource-constrained devices such as the Raspberry Pi 4.
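
The two compression figures in the EdgeDetect summary can be sanity-checked with a short sketch. It assumes that "gradient binarization" means keeping only the sign of each 32-bit float gradient value (one bit per value), which the summary does not spell out. Under that assumption the payload shrinks 32×, a saving of 1 - 1/32 ≈ 96.9%, which is consistent with the reported reduction in communication overhead. The function names `binarize` and `unpack_signs` are hypothetical, and the importance-aware selection and Paillier encryption steps are omitted.

```python
import struct


def binarize(gradient):
    """Pack the sign of each value into one bit (1 = non-negative)."""
    packed = bytearray((len(gradient) + 7) // 8)
    for i, g in enumerate(gradient):
        if g >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover +1/-1 signs for the first n entries."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


# Toy gradient of 1024 values (illustrative numbers, not data from the paper).
gradient = [0.5, -1.25, 0.0, -0.01, 2.0, -3.0, 0.7, -0.2] * 128

dense = struct.pack(f"{len(gradient)}f", *gradient)  # float32: 4 bytes per value
packed = binarize(gradient)                          # 1 bit per value

ratio = len(dense) / len(packed)
saving = 1 - len(packed) / len(dense)
print(f"{ratio:.0f}x smaller, {saving:.1%} less data to send")
```

Sign-only packing discards magnitudes, so a real system would need some additional scaling or error-feedback step; that part is outside what the summary describes.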


Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI

OpenAI Blog · 2026-04-13

Cloudflare and OpenAI have partnered to make OpenAI's frontier models, including GPT-5.4, directly accessible within Cloudflare Agent Cloud, enabling enterprises to deploy AI agents for real-world tasks at scale. The integration also includes Codex tools now generally available in Cloudflare Sandboxes and upcoming availability in Workers AI.


From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI

NVIDIA Blog · 2026-04-02

NVIDIA and Google collaborate to optimize Gemma 4 models for local deployment across RTX GPUs, DGX Spark, and Jetson devices, enabling efficient on-device agentic AI with support for reasoning, coding, multimodal capabilities, and 35+ languages.


Introducing Gemma 3n: The developer guide

Google DeepMind Blog · 2025-10-25

Google DeepMind announces the full release of Gemma 3n, a mobile-first multimodal AI model optimized for on-device efficiency with MatFormer architecture. The release includes E2B and E4B variants designed for low memory usage while delivering strong performance in reasoning, coding, and multilingual tasks.


supertone-inc/supertonic

GitHub Trending (daily) · 8h ago

Supertonic is an open-source, on-device text-to-speech system designed for local inference with minimal overhead. Version 3 has been released with support for 31 languages and improved accuracy.


@ZDNET: What AT&T's approach is to AI and leveraging AI to stay competitive, from the stage at MWC.

X AI KOLs Timeline · 2026-04-21

AT&T outlined its AI strategy at MWC, focusing on 5G edge monetization and AI-driven 6G R&D to create new revenue streams and optimize network energy use.


ruvnet/RuView

GitHub Trending (daily) · 2026-04-20

RuView is an open-source WiFi sensing platform that uses Channel State Information (CSI) from low-cost ESP32 sensors to detect people, track movement, measure vital signs, and estimate pose through walls without cameras or wearables. The system runs entirely on edge hardware with cryptographic attestation and uses spiking neural networks for local adaptation.
