Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
Summary
This paper demonstrates that deep neural networks are catastrophically vulnerable to flipping a handful of parameter sign bits, and introduces Deep Neural Lesion (DNL) and its single-pass variant 1P-DNL, which locate the most vulnerable parameters without data or optimization. The vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and language models, with practical implications for model security.
Source: https://huggingface.co/papers/2502.07408
Abstract
Deep neural networks exhibit catastrophic vulnerability to minimal parameter bit flips across multiple domains; the most vulnerable bits can be identified without data or optimization and mitigated through targeted protection strategies.
Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of parameter bits. We introduce Deep Neural Lesion (DNL), a data-free and optimization-free method that locates critical parameters, and an enhanced single-pass variant, 1P-DNL, that refines this selection with one forward and backward pass on random inputs. We show that this vulnerability spans multiple domains, including image classification, object detection, instance segmentation, and reasoning large language models. In image classification, flipping just two sign bits in ResNet-50 on ImageNet reduces accuracy by 99.8%. In object detection and instance segmentation, one or two sign flips in the backbone collapse COCO detection and mask AP for Mask R-CNN and YOLOv8-seg models. In language modeling, two sign flips in different experts reduce Qwen3-30B-A3B-Thinking from 78% to 0% accuracy. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.
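The page does not spell out how DNL or 1P-DNL score parameters, so the following is only a minimal sketch under stated assumptions: it takes one forward and backward pass on random inputs (as 1P-DNL is described as doing), ranks weights by a saliency-style proxy of |weight| times |gradient|, and flips the IEEE-754 sign bit of the top two. The scoring rule, the per-tensor top-1 selection, and the batch size are illustrative assumptions, not the paper's actual criterion; the model is the standard torchvision pretrained ResNet-50. For ordinary (non-NaN) float32 values, toggling bit 31 is equivalent to negating the weight.

import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

# Load the standard pretrained ResNet-50 (downloads ImageNet weights).
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# One forward and one backward pass on random inputs -- no real data needed.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 1000, (8,))
F.cross_entropy(model(x), y).backward()

# Assumed saliency-style score: |weight| * |gradient|, top element per tensor.
candidates = []
for name, p in model.named_parameters():
    if p.grad is None:
        continue
    score = (p.detach().abs() * p.grad.abs()).flatten()
    idx = int(torch.argmax(score))
    candidates.append((score[idx].item(), name, idx))
candidates.sort(reverse=True)

def flip_sign_bit(param, flat_index):
    # Toggle IEEE-754 bit 31 of one float32 weight in place (a literal bit flip).
    flat = param.data.view(-1)
    bits = flat[flat_index:flat_index + 1].view(torch.int32)
    bits ^= torch.tensor(-2**31, dtype=torch.int32)  # bit pattern 0x80000000

params = dict(model.named_parameters())
for _, name, flat_index in candidates[:2]:  # two flips, mirroring the reported setting
    flip_sign_bit(params[name], flat_index)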
View arXiv page (https://arxiv.org/abs/2502.07408) · View PDF (https://arxiv.org/pdf/2502.07408) · Project page (https://mkimhi.github.io/DNL/) · GitHub (https://github.com/IdoGalil/maximal-brain-damage)
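The abstract's closing claim, that protecting a small fraction of vulnerable sign bits is a practical defense, can be sketched as a record-and-restore routine over whichever parameters the scoring above flags as most vulnerable. Which bits to protect, how many, and the restore policy below are illustrative assumptions rather than the paper's actual protection scheme.

import torch

def snapshot_signs(model, protected):
    # Record the sign of each protected (parameter_name, flat_index) weight.
    params = dict(model.named_parameters())
    return {(n, i): float(torch.sign(params[n].data.view(-1)[i]))
            for n, i in protected}

def verify_and_restore(model, signs):
    # Re-impose the recorded signs; return how many flipped weights were repaired.
    params = dict(model.named_parameters())
    repaired = 0
    for (n, i), s in signs.items():
        flat = params[n].data.view(-1)
        if s != 0.0 and float(torch.sign(flat[i])) != s:
            flat[i] = flat[i].abs() * s
            repaired += 1
    return repaired

# Usage sketch: protect the top-scoring entries found above, then check periodically.
# protected = [(name, idx) for _, name, idx in candidates[:1000]]
# signs = snapshot_signs(model, protected)
# ... deploy the model ...
# n_fixed = verify_and_restore(model, signs)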
Similar Articles
Adversarial attacks on neural network policies
OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation even with small imperceptible perturbations in white-box and black-box settings.
Are Flat Minima an Illusion?
This paper challenges the common belief that flat minima cause better generalization in neural networks, arguing that 'weakness'—a reparameterization-invariant measure of function simplicity—is the true driver. Empirical results on MNIST and Fashion-MNIST show that weakness predicts generalization while sharpness anticorrelates, and the large-batch generalization advantage vanishes as training data increases.
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
Researchers identify two distinct failure modes in aggressive LLM quantization—Signal Degradation and Computation Collapse—and show that training-free fixes only remedy the former, indicating structural reconstruction is needed for ultra-low-bit models.
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
MedFocusLeak introduces the first transferable black-box adversarial attack on medical vision-language models, using imperceptible background perturbations to mislead clinical diagnoses across six imaging modalities.
Understanding neural networks through sparse circuits
OpenAI researchers present methods for training sparse neural networks that are easier to interpret by forcing most weights to zero, enabling the discovery of small, disentangled circuits that can explain model behavior while maintaining performance. This work aims to advance mechanistic interpretability as a complement to post-hoc analysis of dense networks and support AI safety goals.