Tag
This paper introduces the apothem measure for computing trustworthy robustness certifications in neural networks, proves intractability of volume-optimal certifications, and presents the ParallelepipedoNN system achieving twofold improvement in minimum edge length on MNIST and Fashion MNIST.
Abliteration launches a made-to-order synthetic training data workflow that generates negative, rare, and adversarial examples for classifiers, with schema, real-world facts, labels, provenance, and export to platforms like Hugging Face.
Researchers study how adversarial robustness transfers across different perturbation types in deep neural networks, evaluating 32 attacks of 5 types on ImageNet models. Results show that robustness to one perturbation type doesn't always transfer to others and may sometimes hurt robustness elsewhere.
OpenAI introduces Activation Atlases, a technique for visualizing and understanding the internal representations of neural networks, enabling humans to discover spurious correlations and unexpected behaviors such as fooling image classifiers by adding noodles to images.
Researchers demonstrated adversarial images that reliably fool neural network classifiers across multiple scales and perspectives, challenging assumptions about the robustness of multi-scale image capture systems used in autonomous vehicles.
This article examines adversarial attacks on machine learning models and demonstrates why gradient masking—a defensive technique that attempts to deny attackers access to useful gradients—is fundamentally ineffective. The paper shows that attackers can circumvent gradient masking by training substitute models that mimic the defended model's behavior, making the defense strategy ultimately futile.