650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside.

Reddit r/LocalLLaMA Models

Summary

A collection of 650+ Apache-2.0 licensed biomedical NER and de-identification models that run on-device via MLX, achieving 30-40x faster inference than PyTorch-CPU on an M3 Max with identical outputs.

No content available
Original Article

Similar Articles