650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Reddit r/LocalLLaMA 06/23/26, 06:09 PM Models

biomedical-ner de-identification apache-2.0 on-device mlx performance open-source

Summary

A collection of 650+ Apache-2.0-licensed biomedical NER and de-identification models that run on-device via MLX. Using the same fp32 weights and producing identical outputs, the clinical NER models run 30-40x faster than PyTorch on CPU on an M3 Max.



Similar Articles

New local model reaching near frontier on PII removal at 9 ms CPU inference

Reddit r/LocalLLaMA

Introduces ScreenLeak, a benchmark for measuring PII redaction in computer-use AI data, and presents two local models (v45_phase3 for text and rfdetr_v8 for images) achieving near-frontier performance at low latency.

I fine-tuned Parakeet 0.6B for medical ASR — open weights, local Mac/CUDA/CPU

Reddit r/LocalLLaMA

The founder of Omi Health fine-tuned NVIDIA's Parakeet TDT 0.6B for medical ASR and released the open-weights model Omi Med STT v1, which achieves a competitive medical WER while running locally on Mac, CUDA, or CPU.

@Modular: .@hippocraticai runs 400B+ parameter models for real-time patient conversations, tens of thousands per day. When they b…

X AI KOLs Following

Hippocratic AI partners with Modular to use the MAX framework for large language model inference, achieving sub-500ms TTFT, ~30% lower P99 latency, and ~22% lower mean latency at scale on NVIDIA B300 GPUs, with portability to AMD.

@AlexJonesax: Two open-source MLX inference servers worth knowing about if you run LLMs on Mac: MTPLX (@youssofal) Uses a model's own…

X AI KOLs Timeline

This post highlights two open-source MLX inference servers for Mac: MTPLX, which speeds up token generation using speculative decoding without a draft model, and oMLX, which improves workflow efficiency for coding agents with persistent KV caches.

@neural_avb: I am working on porting SAM models and harness into Apple silicon. Already seeing 1.25x inference speed increase on mlx…