Adversarial attacks on neural network policies
Summary
OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning: small, imperceptible perturbations to a policy's observations significantly degrade its performance in both white-box and black-box settings.
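A minimal sketch of the kind of perturbation involved, using the fast gradient sign method (FGSM) on a hypothetical linear softmax policy (the toy policy, weights, and dimensions here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def fgsm_perturb(W, x, eps):
    """FGSM on a toy linear softmax policy: nudge the observation x,
    within an L-infinity budget eps, in the direction that increases
    the loss on the action the clean policy would have taken."""
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    a = int(np.argmax(p))             # clean policy's preferred action
    onehot = np.zeros_like(p)
    onehot[a] = 1.0
    grad_x = W.T @ (p - onehot)       # d(cross-entropy)/dx for linear logits
    return x + eps * np.sign(grad_x), a

def action_prob(W, x, a):
    """Probability the policy assigns to action a at observation x."""
    z = W @ x
    p = np.exp(z - z.max())
    return (p / p.sum())[a]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))           # 4 actions, 8-dim observation (toy sizes)
x = rng.normal(size=8)
x_adv, a_clean = fgsm_perturb(W, x, eps=0.5)

# The perturbed observation lowers the probability of the clean action.
print(action_prob(W, x, a_clean), action_prob(W, x_adv, a_clean))
```

Because cross-entropy is convex in the input for a linear policy, the FGSM step provably increases the loss, so the clean action's probability drops; deep policies lack that guarantee, but the same first-order attack works well in practice.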
Similar Articles
Testing robustness against unforeseen adversaries
OpenAI researchers developed a method to evaluate neural network robustness against unforeseen adversarial attacks, introducing a new metric called UAR (Unforeseen Attack Robustness) that assesses model performance against unanticipated distortion types beyond the commonly studied Lp norms.
Robust adversarial inputs
Researchers demonstrated adversarial images that reliably fool neural network classifiers across multiple scales and perspectives, challenging assumptions about the robustness of multi-scale image capture systems used in autonomous vehicles.
Attacking machine learning with adversarial examples
This article examines adversarial attacks on machine learning models and explains why gradient masking (a defensive technique that attempts to deny attackers access to useful gradients) is fundamentally ineffective: attackers can circumvent it by training substitute models that mimic the defended model's behavior, then transferring attacks crafted against the substitute.
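The substitute-model idea can be sketched end to end. In this toy version (all names and models here are illustrative assumptions: the "defended" model is just a hidden linear classifier queried for hard labels), the attacker trains a logistic-regression substitute on the defended model's own answers and uses the substitute's gradients in place of the masked ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "defended" black box: the attacker can only query hard labels,
# so its gradients are effectively masked.
w_true = rng.normal(size=5)
def query_defended(x):
    return int(w_true @ x > 0)

# Step 1: label attacker-chosen inputs with the defended model itself,
# and fit a substitute (plain logistic regression via gradient descent).
X = rng.normal(size=(500, 5))
y = np.array([query_defended(x) for x in X])
w = np.zeros(5)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Step 2: craft an FGSM perturbation against the substitute; its gradient
# stands in for the gradient the defense tried to hide.
x = rng.normal(size=5)
y0 = query_defended(x)
grad = (1.0 / (1.0 + np.exp(-(w @ x))) - y0) * w    # d(log-loss)/dx
x_adv = x + 1.0 * np.sign(grad)

# Step 3: replay the crafted input against the defended model; because
# adversarial examples transfer, the substitute's attack tends to carry over.
print(y0, query_defended(x_adv))
```

The key point the article makes is visible here: masking gradients does not remove the decision boundary, and any model that mimics that boundary recovers usable gradients for the attacker.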
Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation
This paper presents a comprehensive benchmark for evaluating adversarial attacks and defenses in Graph Neural Networks, highlighting the need for standardized and fair experimental protocols.
OpenAI Red Teaming Network
OpenAI launches a Red Teaming Network to crowdsource adversarial testing of AI models from diverse experts and perspectives. The program accepts rolling applications, offers flexible time commitments (as little as 5 hours per year) and compensation, and emphasizes safety expertise and underrepresented backgrounds.