trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Reddit r/LocalLLaMA 05/22/26, 04:41 PM Models

prompt-injection classifier ml-intern deepseek distilbert onnx browser-inference

Summary

Trained a prompt injection classifier using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT, optimized to ONNX int8 (~65MB) and deployable in the browser via Transformers.js v3.

Trained a prompt injection classifier using `ml-intern` \+ DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, \~65 MB, runs in browser with Transformers.js v3. You can try it here: [https://huggingface.co/spaces/av-codes/prompt-injection-detector](https://huggingface.co/spaces/av-codes/prompt-injection-detector) \--- I've been interested in prompt injections and agentic security for a while, and wanted to see how a purpose-built ML agent compares to general-purpose coding agents for this kind of task. Here's roughly how it went: `ml-intern` takes an HF token and supports OpenAI-compatible APIs, so I pointed it at OpenRouter (GPU-poor). The agent found existing datasets, [deepset/prompt-injections](https://huggingface.co/datasets/deepset/prompt-injections) and [Shomi28/prompt-injection-dataset](https://huggingface.co/datasets/Shomi28/prompt-injection-dataset), which simplified things since building the dataset is typically 95% of the work in tasks like this. For v1, I went with DistilBERT targeting CPU inference. After a few parameter sweeps, the agent launched a full run and landed at F1 95.87%. I also tried training an HRM-Text model, but the agent didn't figure it out and set up a TRM run instead (different architecture, no positional encoding). When I steered it back to HRM with the [correct paper](https://arxiv.org/abs/2605.20613), the training script wasn't optimized for my hardware. I spent $20 on HF remote training with a T4, but it fumbled after epoch 1 because agent didn't follow training routine from the paper and used wrong optimiser/params leading to params blowing up. For v2, I found a [larger synthetic dataset](https://huggingface.co/datasets/Bordair/bordair-multimodal) from Bordair and re-trained the DistilBERT. That's the model in the Space above. What surprised me: * DeepSeek v4 Flash via API cost under $5 total for all agent runs * the agent was more hands-off than expected on the happy path * it broke down on non-standard architectures * it naturally leans toward the HF stack, which was fine for this, but worth knowing The obvious gap: the synthetic dataset means the train/test splits might be too similar. Not a proper scientific approach, but it's the most pleasant ML experience I've had with an agentic tool so far. The HRM run is still pending. I'm curious to learn about other people's experiences with these tools. Thank you!

Original Article

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Similar Articles

deepseek-ai/DeepSeek-V4-Flash-DSpark

Semalith v1.4: A Calibrated 184M Safety Classifier Achieving State-of-the-Art Prompt-Injection Detection at 44x Fewer Parameters than Llama-Guard-3-8B

Understanding prompt injections: a frontier security challenge

I have (even faster) DeepSeek V4 Pro at home

deepseek-ai/DeepSeek-V4-Pro

Submit Feedback

Similar Articles

deepseek-ai/DeepSeek-V4-Flash-DSpark

Semalith v1.4: A Calibrated 184M Safety Classifier Achieving State-of-the-Art Prompt-Injection Detection at 44x Fewer Parameters than Llama-Guard-3-8B

Understanding prompt injections: a frontier security challenge

I have (even faster) DeepSeek V4 Pro at home