Tag
A prompt injection classifier was trained using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT. The model was exported to ONNX, quantized to int8 (~65MB), and can be deployed in the browser via Transformers.js v3.
The article highlights the new WebGPU backend in llama.cpp/ggml, which enables GPU-accelerated local inference of AI models in browsers. It was developed by Reese Levine and team at UCSC over the past year and a half.
A developer demonstrates running the Qwen3.6-27B model entirely on WebGPU in a browser, although inference speed is not optimal.