Transfer of adversarial robustness between perturbation types

OpenAI Blog

Summary

Researchers study how adversarial robustness transfers across different perturbation types in deep neural networks, evaluating 32 attacks spanning 5 perturbation types against models adversarially trained on a 100-class subset of ImageNet. Results show that robustness to one perturbation type does not always transfer to others and can sometimes hurt robustness against them.


# Transfer of adversarial robustness between perturbation types

Source: [https://openai.com/index/transfer-of-adversarial-robustness-between-perturbation-types/](https://openai.com/index/transfer-of-adversarial-robustness-between-perturbation-types/)

OpenAI

## Abstract

We study the transfer of adversarial robustness of deep neural networks between different perturbation types. While most work on adversarial examples has focused on $L_\infty$- and $L_2$-bounded perturbations, these do not capture all types of perturbations available to an adversary. The present work evaluates 32 attacks of 5 different types against models adversarially trained on a 100-class subset of ImageNet. Our empirical results suggest that evaluating on a wide range of perturbation sizes is necessary to understand whether adversarial robustness transfers between perturbation types. We further demonstrate that robustness against one perturbation type may not always imply, and may sometimes hurt, robustness against other perturbation types. In light of these results, we recommend that evaluation of adversarial defenses take place on a diverse range of perturbation types and sizes.
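The $L_\infty$ constraint above bounds the largest change to any single pixel, while $L_2$ bounds the Euclidean norm of the whole perturbation. For concreteness, here is a minimal PyTorch sketch of projected gradient descent (PGD) under an $L_\infty$ budget, the standard construction for such attacks. It is not code from the paper: `model`, the `[0, 1]` input range, and the hyperparameter values are illustrative assumptions.

```python
import torch

def pgd_linf(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Find a perturbation delta with ||delta||_inf <= epsilon that
    increases the classification loss of `model` on the batch (x, y).

    Assumes `model` is a PyTorch classifier in eval mode and that
    inputs live in [0, 1]; hyperparameters are illustrative only.
    """
    loss_fn = torch.nn.CrossEntropyLoss()
    delta = torch.zeros_like(x, requires_grad=True)

    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        # Ascend along the sign of the gradient, then project back
        # into the L-infinity ball of radius epsilon.
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon)
        # Keep the perturbed image inside the valid pixel range.
        delta.data = (x + delta.data).clamp(0, 1) - x
        delta.grad.zero_()

    return (x + delta).detach()
```

An $L_2$-bounded variant would replace the sign step and clamp with a gradient-normalized step and a projection onto the $L_2$ ball; the paper's point is that the three other attack types it studies are not captured by either norm ball.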

Similar Articles

Testing robustness against unforeseen adversaries

OpenAI Blog

OpenAI researchers developed a method to evaluate neural network robustness against unforeseen adversarial attacks, introducing a new metric called UAR (Unforeseen Attack Robustness) that assesses model performance against unanticipated distortion types beyond the commonly studied Lp norms.

Robust adversarial inputs

OpenAI Blog

Researchers demonstrated adversarial images that reliably fool neural network classifiers across multiple scales and perspectives, challenging assumptions about the robustness of multi-scale image capture systems used in autonomous vehicles.

Adversarial attacks on neural network policies

OpenAI Blog

OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation even with small imperceptible perturbations in white-box and black-box settings.

Trading inference-time compute for adversarial robustness

OpenAI Blog

OpenAI presents evidence that reasoning models like o1 become more robust to adversarial attacks when given more inference-time compute to think longer. The research demonstrates that increased computation reduces attack success rates across multiple task types including mathematics, factuality, and adversarial images, though significant exceptions remain.