trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Reddit r/LocalLLaMA Models

Summary

Trained a prompt injection classifier using ml-intern and DeepSeek V4 Flash, achieving 99% F1 with DistilBERT, optimized to ONNX int8 (~65MB) and deployable in the browser via Transformers.js v3.

Trained a prompt injection classifier using `ml-intern` \+ DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, \~65 MB, runs in browser with Transformers.js v3. You can try it here: [https://huggingface.co/spaces/av-codes/prompt-injection-detector](https://huggingface.co/spaces/av-codes/prompt-injection-detector) \--- I've been interested in prompt injections and agentic security for a while, and wanted to see how a purpose-built ML agent compares to general-purpose coding agents for this kind of task. Here's roughly how it went: `ml-intern` takes an HF token and supports OpenAI-compatible APIs, so I pointed it at OpenRouter (GPU-poor). The agent found existing datasets, [deepset/prompt-injections](https://huggingface.co/datasets/deepset/prompt-injections) and [Shomi28/prompt-injection-dataset](https://huggingface.co/datasets/Shomi28/prompt-injection-dataset), which simplified things since building the dataset is typically 95% of the work in tasks like this. For v1, I went with DistilBERT targeting CPU inference. After a few parameter sweeps, the agent launched a full run and landed at F1 95.87%. I also tried training an HRM-Text model, but the agent didn't figure it out and set up a TRM run instead (different architecture, no positional encoding). When I steered it back to HRM with the [correct paper](https://arxiv.org/abs/2605.20613), the training script wasn't optimized for my hardware. I spent $20 on HF remote training with a T4, but it fumbled after epoch 1 because agent didn't follow training routine from the paper and used wrong optimiser/params leading to params blowing up. For v2, I found a [larger synthetic dataset](https://huggingface.co/datasets/Bordair/bordair-multimodal) from Bordair and re-trained the DistilBERT. That's the model in the Space above. What surprised me: * DeepSeek v4 Flash via API cost under $5 total for all agent runs * the agent was more hands-off than expected on the happy path * it broke down on non-standard architectures * it naturally leans toward the HF stack, which was fine for this, but worth knowing The obvious gap: the synthetic dataset means the train/test splits might be too similar. Not a proper scientific approach, but it's the most pleasant ML experience I've had with an agentic tool so far. The HRM run is still pending. I'm curious to learn about other people's experiences with these tools. Thank you!
Original Article

Similar Articles

Understanding prompt injections: a frontier security challenge

OpenAI Blog

OpenAI publishes guidance on prompt injection attacks, a social engineering vulnerability where malicious instructions hidden in web content or documents can trick AI models into unintended actions. The company outlines its multi-layered defense strategy including instruction hierarchy research, automated red-teaming, and AI-powered monitoring systems.

I have (even faster) DeepSeek V4 Pro at home

Reddit r/LocalLLaMA

A user reports successfully running the DeepSeek V4 Pro model locally using ktransformers and sharing detailed benchmark results across various context depths, demonstrating improved inference speeds.

deepseek-ai/DeepSeek-V4-Pro

Hugging Face Models Trending

DeepSeek releases V4-Pro and V4-Flash, Mixture-of-Experts models supporting million-token context with hybrid attention and Muon optimizer.