on-device

#on-device

@MaziyarPanahi: A year ago, OpenMed didn't exist. Today: 340M model downloads. 1,500+ open medical models, all Apache 2.0. 650+ run on …

X AI KOLs Following · 13h ago

A year after its inception, OpenMed has achieved 340 million model downloads, offering over 1,500 open medical models under Apache 2.0, with 650+ capable of running on-device on iPhones.

Getting real work out of a 4B local model: the distill-on-idle pipeline behind an on-device "memory" assistant

Reddit r/LocalLLaMA · 3d ago

Describes a 'distill-on-idle' pipeline that lets a 4B-parameter local model run effectively as an on-device memory assistant, demonstrating practical use of small models.

AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification

arXiv cs.CL · 3d ago

Introduces AnySimLite, a lightweight similarity encoder for on-device speech-adjacent classification tasks, achieving state-of-the-art or competitive performance while using less than 1/250th the model size of the qLLaMA-LoRA-7B baseline.

Liquid AI Releases Liquid Foundation Models 2.5 230M (3 minute read)

TLDR AI · 4d ago

Liquid AI releases LFM2.5-230M, a lightweight foundation model that runs on devices from cloud GPUs to CPUs and Raspberry Pi, with strong performance on tool use and data extraction tasks.

@TheAhmadOsman: Continual Learning will run locally That's why the big labs aren't talking about it Not your weights, not your model, L…

X AI KOLs Following · 4d ago

A tweet argues that continual learning will run locally, with data staying on device, and claims this is why major AI labs avoid discussing it.

@yoheinakajima: who wants to help eyal poke holes in this approach to run LLM inference... in browser?

X AI KOLs Following · 4d ago

Eyal Toledano built an LLM inference engine using pure WebGPU/WGSL, running on-device in browser and Node without API keys, and is seeking peer review.

@timseyde: Dumbo's first steps — LFM2.5-230M doing multi-step tool-calling over pre-trained skills provided by @nvidia SONIC. Same…

X AI KOLs Following · 4d ago

Liquid AI's LFM2.5-230M model demonstrates multi-step tool-calling capabilities on a Unitree G1 robot, running entirely on-device on an NVIDIA Jetson Orin, acting as a skill-selection layer.

@liquidai: Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic ta…

X AI KOLs Timeline · 4d ago

Liquid AI releases LFM2.5-230M, a 230M-parameter model optimized for fast inference on CPUs, NPUs, and GPUs, targeting agentic tasks on devices like phones and robots.

On-Device Neural Architecture Search

arXiv cs.LG · 4d ago

Proposes a lightweight neural architecture search performed directly on the deployment device for near-sensor computing, validated on sEMG sign language and fault diagnosis datasets, achieving improved accuracy and reduced RAM occupancy.

LiquidAI/LFM2.5-230M

Hugging Face Models Trending · 5d ago

Liquid AI released LFM2.5-230M, a compact 230M-parameter hybrid model optimized for on-device deployment with fast edge inference speeds (213 tok/s on Galaxy S25 Ultra) and built for agentic tasks via reinforcement learning.

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

arXiv cs.LG · 5d ago

A benchmark study comparing traditional machine learning methods (Random Forest, XGBoost, SVM, Logistic Regression) against lightweight transformer variants (DistilBERT, TinyBERT, MobileBERT) for on-device fault detection across three public datasets. Traditional ML offers competitive accuracy at far smaller resource footprints, while TinyBERT-4L is the most deployment-friendly transformer.

FUTO Swipe

Product Hunt · 6d ago

FUTO Swipe is a product offering open models for on-device swipe typing, enabling privacy-focused keyboard input.

@Ex0byt: Update: the road to GLM-5.2: we're getting there, folks! non-quantized, non-pruned DeepSeek-v4-Flash. 11tok/s on a sing…

X AI KOLs Timeline · 6d ago

An update on running a non-quantized, non-pruned DeepSeek-v4-Flash model at 11 tok/s on a single DGX Spark, using SGLang inference and a custom mega-kernel, as progress toward GLM-5.2.

650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Reddit r/LocalLLaMA · 6d ago

A collection of 650+ Apache-2.0 licensed biomedical NER and de-identification models that run on-device via MLX, achieving 30-40x faster inference than PyTorch-CPU on an M3 Max with identical outputs.

Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents

Hugging Face Daily Papers · 2026-06-23

This paper introduces a conversational voice agent system that uses a lightweight on-device 'Talker' model to start responding immediately, then incorporates knowledge from a frontier LLM 'Reasoner' as it becomes available, achieving 7-19x faster time-to-first-response while approaching frontier-level performance on a laptop.

@denziideng: No internet, no cloud, no upload — you can have a private AI Agent right on your iPhone, and the results are amazing... Many AI assistants now upload data to the cloud, causing privacy and response speed issues... There's an open-source project that runs the AI Agent locally on the iPhone, no internet required, a thoughtful personal...

X AI KOLs Timeline · 2026-06-22

PhoneClaw is an open-source project that runs an AI agent entirely locally on the iPhone, using models such as Gemma 4 and MiniCPM-V, with no internet connection or data upload required. It supports on-device operations involving voice, calendar, and health data, which keeps data private and responses fast.

@0x0SojalSec: OpenJarvis : Builds personal AI agents that run locally, - http://github.com/open-jarvis/OpenJarvis…

X AI KOLs Timeline · 2026-06-17

OpenJarvis is an open-source framework for building personal AI agents that run locally on devices, with support for local LLMs and a focus on energy efficiency and privacy.

Reason to run local agents instead #645

Reddit r/LocalLLaMA · 2026-06-15

Explains reasons to run local AI agents instead of cloud-based alternatives, highlighting privacy and control benefits.

Efficient On-Device Diffusion LLM Inference with Mobile NPU

arXiv cs.LG · 2026-06-15

This paper presents llada.cpp, an NPU-aware inference framework for accelerating diffusion large language models (dLLMs) on smartphones. It introduces three techniques (Multi-Block Speculative Decoding, Dual-Path Progressive Revision, and Swap-Optimized Memory Runtime) to align dLLM inference with mobile NPU characteristics, achieving a 17-42x latency reduction over a CPU baseline.

Gemma 12b less than 10 watts 6.5pp 1.3tg

Reddit r/LocalLLaMA · 2026-06-14

Running the Gemma 12B model on a Google Pixel 10 Pro with llama.cpp achieves 6.5 tokens per second for prompt processing and 1.3 tokens per second for generation while drawing under 10 watts, demonstrating efficient on-device AI inference.
