Are model security risks (extraction, poisoning) actually being tested in production? [R]
Summary
A discussion of whether ML teams actually test for model security risks such as extraction and poisoning in production, noting that security review for models lags behind security review for conventional software.
Similar Articles
Are local LLM users testing prompt injection before connecting models to tools?
A discussion of safety practices for local LLMs connected to tools, asking whether prompt injection testing is common before models are given tool access.
Lessons learned on language model safety and misuse
OpenAI shares lessons learned on language model safety and misuse, discussing challenges in measuring risks, the limitations of existing benchmarks, and their development of new evaluation metrics for toxicity and policy violations. The post also highlights concerns about labor market impacts and the need for continued research on measuring social effects of AI deployment at scale.
Most AI security discussions are still focused on “protecting the model.”
This article discusses how AI systems with capabilities like reading internal docs and calling APIs require a new security approach, moving beyond traditional SaaS security to Zero Trust principles for AI agents.
Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy
This paper introduces AI-MASLD, a stress-audit framework for medical LLMs that reveals how benchmark accuracy can hide serious safety failures, and demonstrates that open-weight models can match or exceed proprietary ones on safety dimensions.
For tool-using agents, where do you draw the security boundary?
A discussion of the security risks of tool-using AI agents, focusing on prompt injection as a practical threat in which untrusted text can alter agent behavior, and on the need for repeatable testing before granting permissions.