Transfer of adversarial robustness between perturbation types
Summary
Researchers study how adversarial robustness transfers across perturbation types in deep neural networks, evaluating 32 attacks spanning 5 perturbation types on ImageNet models. The results show that robustness to one perturbation type does not reliably transfer to others and can sometimes reduce robustness against them.
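As a rough illustration of this kind of cross-perturbation evaluation (a minimal sketch, not the paper's actual protocol; the PGD variants and all hyperparameters are assumptions), the snippet below crafts adversarial examples under an L-infinity constraint and under an L2 constraint, then measures a model's accuracy against each, so that transfer between the two norms can be compared:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size, n_steps, norm="linf"):
    """Projected gradient descent within an eps-ball of the chosen norm around x."""
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            if norm == "linf":
                # Step along the gradient sign, then clip into the L-inf ball.
                x_adv = x_adv + step_size * grad.sign()
                delta = (x_adv - x).clamp(-eps, eps)
            else:
                # L2: step along the normalized gradient, then project
                # the perturbation back onto the L2 ball of radius eps.
                g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)
                x_adv = x_adv + step_size * grad / g_norm.view(-1, 1, 1, 1)
                delta = x_adv - x
                d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
                delta = delta * (eps / d_norm).clamp(max=1.0).view(-1, 1, 1, 1)
            x_adv = (x + delta).clamp(0, 1).detach()
    return x_adv

def robust_accuracy(model, loader, eps, norm):
    """Accuracy on adversarial examples crafted under the given norm."""
    correct = total = 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps, eps / 4, n_steps=20, norm=norm)
        correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

# A model trained against one norm can then be scored under another, e.g.:
# acc_linf = robust_accuracy(model, test_loader, eps=8/255, norm="linf")
# acc_l2   = robust_accuracy(model, test_loader, eps=1.0,   norm="l2")
```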
Similar Articles
Testing robustness against unforeseen adversaries
OpenAI researchers developed a method to evaluate neural network robustness against unforeseen adversarial attacks, introducing a new metric called UAR (Unforeseen Attack Robustness) that assesses model performance against unanticipated distortion types beyond the commonly studied Lp norms.
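The UAR score is, roughly, the candidate model's adversarial accuracy summed over a calibrated range of distortion sizes and normalized by the accuracy of a model adversarially trained against that same attack. A minimal sketch of that computation (the variable names and toy numbers are ours, not the paper's):

```python
def uar(model_accs, calibrated_accs):
    """Unforeseen Attack Robustness (UAR), roughly: the candidate model's
    adversarial accuracy across calibrated distortion sizes, normalized by
    the accuracy of a model adversarially trained against that attack.

    model_accs, calibrated_accs: accuracies (in [0, 1]) at each distortion
    size, e.g. [acc(eps_1), ..., acc(eps_6)] over the same eps schedule.
    """
    return 100.0 * sum(model_accs) / sum(calibrated_accs)

# Example: a defense evaluated against an unforeseen attack at 6 sizes.
defense = [0.82, 0.71, 0.55, 0.38, 0.21, 0.09]
adv_trained = [0.88, 0.80, 0.69, 0.55, 0.40, 0.26]
print(f"UAR = {uar(defense, adv_trained):.1f}")  # ~77 on these toy numbers
```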
Robust adversarial inputs
Researchers demonstrated adversarial images that reliably fool neural network classifiers across multiple scales and perspectives, challenging the assumption that capturing images at many scales and angles, as autonomous-vehicle cameras do, defeats adversarial examples.
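A minimal sketch of the underlying technique: optimize a perturbation over an ensemble of randomly transformed views so the attack survives changes in scale. The transform set (plain rescaling here) and all hyperparameters are illustrative assumptions, not the post's exact setup:

```python
import torch
import torch.nn.functional as F

def robust_adversary(model, x, target, eps=8/255, steps=200, n_views=8):
    """Optimize a bounded perturbation whose target class survives
    random rescaling, not just one fixed view of the image."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        loss = 0.0
        for _ in range(n_views):
            # Random rescale as a stand-in for scale/perspective changes.
            s = float(torch.empty(1).uniform_(0.7, 1.3))
            view = F.interpolate((x + delta).clamp(0, 1), scale_factor=s,
                                 mode="bilinear", align_corners=False)
            view = F.interpolate(view, size=x.shape[-2:],
                                 mode="bilinear", align_corners=False)
            loss = loss + F.cross_entropy(model(view), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation imperceptible
    return (x + delta).clamp(0, 1).detach()
```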
Adversarial attacks on neural network policies
OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation from small, imperceptible perturbations in both white-box and black-box settings.
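A minimal sketch of the white-box case, assuming the standard FGSM attack applied to a policy's observations; the policy is any module mapping observations to action logits, and eps is illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_on_observation(policy, obs, eps=0.01):
    """Perturb an observation to reduce the probability of the action
    the policy would otherwise prefer."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)
    preferred = logits.argmax(dim=-1)
    # Increase the loss on the preferred action, then step along
    # the gradient sign (fast gradient sign method).
    loss = F.cross_entropy(logits, preferred)
    loss.backward()
    return (obs + eps * obs.grad.sign()).detach()

# In a rollout, the agent then acts on the perturbed observation:
# action = policy(fgsm_on_observation(policy, obs)).argmax(dim=-1)
```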
Trading inference-time compute for adversarial robustness
OpenAI presents evidence that reasoning models like o1 become more robust to adversarial attacks when given more inference-time compute to think longer. The research demonstrates that increased computation reduces attack success rates across multiple task types including mathematics, factuality, and adversarial images, though significant exceptions remain.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
Systematic study shows LLM-based dense retrievers outperform BERT baselines on typos and poisoning but remain vulnerable to semantic perturbations, with embedding geometry predicting robustness.
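A minimal sketch of the kind of stability probe such an analysis might use; the embed function is a placeholder for any dense retriever encoder, and the example perturbations are ours:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def stability(embed, query, perturbed_queries):
    """Mean cosine similarity between a query's embedding and the
    embeddings of its perturbed variants; lower values indicate a
    less stable embedding geometry under that perturbation type."""
    q = embed(query)
    return np.mean([cosine(q, embed(p)) for p in perturbed_queries])

# Comparing a surface-level (typo) and a semantic perturbation:
# stability(embed, "capital of France", ["capitol of France"])
# stability(embed, "capital of France", ["which city governs France"])
```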