Tried to make a drop-in version of DeepMind's CaMeL — honest progress and what's still broken
Summary
The author built a lightweight, drop-in security gate that implements DeepMind's CaMeL principle of preventing untrusted data from authoring actions, achieving ~70% auto-inference accuracy on a benchmark and zero silent unsafe misclassifications, but notes gaps in provenance tracking and robustness.
Similar Articles
Introducing CodeMender: an AI agent for code security
Google DeepMind introduces CodeMender, an AI agent that automatically detects and fixes code security vulnerabilities using advanced reasoning and validation techniques. The system has already upstreamed 72 security fixes to open source projects over six months.
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements
This paper audits LangChain, AutoGPT, and OpenAI Agents SDK for architectural safety guarantees and finds no native compliance with containment principles, demonstrating that memory poisoning can cause persistent failures; it introduces lightweight mechanisms to eliminate such attacks.
⚠️ Meta's AI safety filters were stripped in less than 10 minutes
A joint test by the Financial Times and AI safety group Alice reveals that safety filters on Meta's Llama 3.3 and Google's Gemma 4 models can be removed in under 10 minutes using a free tool called Heretic, highlighting the difficulty of regulating open-source AI safety.
AI guardrails stripped from Meta and Google models in minutes
Researchers rapidly removed safety protections from widely deployed AI models, eliciting dangerous outputs and raising concerns about robustness and release practices.
Advancing Gemini's security safeguards
Google DeepMind announces advanced security improvements for Gemini to defend against indirect prompt injection attacks through model hardening, adaptive evaluation, and layered defense mechanisms. The approach combines fine-tuning on adversarial scenarios with system-level guardrails to build inherent resilience while maintaining model performance.