Neural Variability Enhances Artificial Network Robustness
Summary
This paper investigates how correlated noise, inspired by neural variability in the brain, can enhance the robustness of artificial neural networks against adversarial attacks and naturalistic image modifications.
# Neural Variability Enhances Artificial Network Robustness
Source: [https://arxiv.org/html/2606.13801](https://arxiv.org/html/2606.13801)
Praveen Venkatesh \(Allen Institute, Seattle, WA 98195\), Stefan Mihalas \(Allen Institute, Seattle, WA 98195\), Kameron Decker Harris \(Department of Computer Science, Western Washington University, Bellingham, WA 98225\)\. Corresponding author: harri267@wwu\.edu
\(June 11, 2026\)
###### Abstract
Neural responses in cortex exhibit substantial trial\-to\-trial variability in response to repeated stimuli, while peripheral sensory neurons respond far more consistently, leading many to wonder whether stochasticity may carry meaning\. Existing work has argued that noise and signal correlations may be optimized for discrimination in animals, whereas artificial neural network \(ANN\) studies have shown similar benefits of noise in machine learning tasks, although most ANN work has neglected the effects of correlations\. Here we investigate whether correlated noise improves the robustness of artificial neural networks to adversarial attacks and naturalistic image modifications\. Using the covariance of activations under modified versus clean inputs, we find that structured noise may significantly improve network robustness\. Robustness to naturalistic image modifications benefits most from structure, but this structure transfers poorly across modification types\. In contrast, noise structure from adversarial attacks can generalize to other kinds of attacks\. These results suggest that structured noise in ANN activations generally improves robustness, establishing a biologically plausible strategy for creating robust artificial neural networks that only relies on local information\.
## 1 Introduction
The brain is noisy, but whether this noise is beneficial or captures Bayesian notions of uncertainty is a matter of debate \[[14](https://arxiv.org/html/2606.13801#bib.bib2),[11](https://arxiv.org/html/2606.13801#bib.bib7),[18](https://arxiv.org/html/2606.13801#bib.bib8)\]\. At the same time, stochastic artificial neural networks \(ANNs\) have been explored for tractable theory \[[1](https://arxiv.org/html/2606.13801#bib.bib12),[9](https://arxiv.org/html/2606.13801#bib.bib13)\], Bayesian inference \[[16](https://arxiv.org/html/2606.13801#bib.bib15)\], and improved generalization \[[21](https://arxiv.org/html/2606.13801#bib.bib18)\] or robustness \[[25](https://arxiv.org/html/2606.13801#bib.bib19)\]\.
ANNs are sensitive to perturbations that would not fool an animal\. These can be crafted with or without knowledge of the model weights, known as white\-box or black\-box attacks, respectively\. Ref\. \[[25](https://arxiv.org/html/2606.13801#bib.bib19)\] proposed adding Gaussian noise to network inputs to defend against these attacks, and this defense can be certified \[[2](https://arxiv.org/html/2606.13801#bib.bib11)\]\. Parametric noise injection \(PNI\) adds noise to either activations or weights with diagonal noise covariance \[[19](https://arxiv.org/html/2606.13801#bib.bib5)\], while colored noise injection \(CNI\) allows for covariance structure \(low\-rank plus diagonal\) learned via backpropagation and was applied to the weights \[[26](https://arxiv.org/html/2606.13801#bib.bib16)\]\. Ref\. \[[4](https://arxiv.org/html/2606.13801#bib.bib6)\] showed that unlearned V1\-like features combined with stochastic spiking also provided protection in image classification tasks, which could be explained by the interaction of representation geometry and noise \[[3](https://arxiv.org/html/2606.13801#bib.bib20)\]\.
Correlations have been used to explain diverse phenomena in neuroscience \[[5](https://arxiv.org/html/2606.13801#bib.bib21)\], and there are even models of optimal correlations, e\.g\. the sign rule, which states that optimal noise and signal correlations have opposite signs \[[8](https://arxiv.org/html/2606.13801#bib.bib22)\]\. Motivated by recordings of noisy activity in mouse V1, Ref\. \[[23](https://arxiv.org/html/2606.13801#bib.bib1)\] hypothesized that optimal noise would have a covariance that shrinks perpendicular to the optimal decision boundary \(as in the sign rule\), with the greatest variance in task\-irrelevant directions\. This hypothesis was supported by analysis of mouse V1 data\. The authors also implemented a neural network with a noisy layer, injected noise with covariance structure defined by rotations of some images, and observed that it provides robustness to rotations of other images, a result we expand upon here\. We were motivated by this work and its neural relevance to test structured noise derived from other kinds of modifications\.
In this paper, we ask whether injecting structured noise into a neural network yields greater robustness than unstructured noise\. We hypothesize that noise where the covariance structure is derived directly from intermediate layer activations, rather than being learned \(as in PNI/CNI\), may improve robustness \(Fig\.[1](https://arxiv.org/html/2606.13801#S1.F1)\)\. If structured noise improves robustness in ANNs, this lends additional strength to the idea that the structured noise observed in the brain serves a purpose\. As in\[[23](https://arxiv.org/html/2606.13801#bib.bib1)\], we use Gaussian noise with a covariance calculated from the difference in model activations at a given layer with clean vs\. modified inputs\. We selected Gaussian noise because it is a simple model of noise with correlations\. In our approach, noise has the greatest variability in the directions that are most affected by the given modification\. We chose this technique because increasing the variance in directions that the model should be invariant to pads the margin, while lowering variance in directions relevant for classification avoids overlap between representations of different classes\. Training on clean data with these noisy representations adjusts the decision boundary to be invariant to class\-irrelevant data, without creating confusion between classes\. Our method depends on only layer\-specific activations and is biologically plausible, since Hebbian mechanisms may shape noise covariance\[[20](https://arxiv.org/html/2606.13801#bib.bib31),[7](https://arxiv.org/html/2606.13801#bib.bib30)\]\.
Figure 1: Noise structure can improve robustness\. A\) Representation of two classes separated by a learned decision boundary\. The vertical dimension isn’t relevant to the task, but due to limited data the decision boundary varies over this dimension\. B\) The classifier perfectly separates clean training data but is not robust to modifications such as adversarial attacks that move samples across the boundary\. C\) Structured noise \(multivariate Gaussian, depicted as ellipses\) is fit to the adversarial perturbations, and the classifier is retrained on the noisy data\. Noiseless representations are drawn in the center of the ellipses\. D\) The retrained decision boundary has a larger margin and robustness against further modifications\. Unstructured noise, on the other hand, would be circular and could result in a smaller margin and worse overlap of the two classes\.
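The geometric intuition in Fig\. 1 can be checked on a toy problem\. The sketch below is illustrative only and is not the paper's experiment: it assumes two hypothetical 2D class centers separated along the horizontal axis, a fixed boundary at $x = 0$, and two noise covariances with the same variance per dimension\. All names and numeric values are assumptions chosen for illustration\.

```python
import numpy as np

def crossing_rate(points, labels, cov, rng, n_draws=2000):
    """Fraction of noisy copies that land on the wrong side of the boundary x = 0."""
    noise = rng.multivariate_normal(np.zeros(2), cov, size=(n_draws, len(points)))
    noisy = points[None, :, :] + noise            # shape (n_draws, n_points, 2)
    predicted = np.where(noisy[..., 0] > 0, 1, -1)
    return float(np.mean(predicted != labels[None, :]))

rng = np.random.default_rng(0)

# Hypothetical toy data: class -1 sits at x = -1, class +1 at x = +1.
# The vertical coordinate carries no class information.
points = np.array([[-1.0, 0.0], [1.0, 0.0]])
labels = np.array([-1, 1])

# Both covariances have the same variance per dimension (normalized trace of 1).
isotropic = np.eye(2)
structured = np.diag([0.1, 1.9])   # most variance along the task-irrelevant axis

rate_iso = crossing_rate(points, labels, isotropic, rng)
rate_struct = crossing_rate(points, labels, structured, rng)
print(f"isotropic: {rate_iso:.3f}, structured: {rate_struct:.3f}")
```

With equal total variance, the structured noise rarely pushes a sample across the boundary, whereas the isotropic noise does so much more often, which is the class overlap that the caption warns about\.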
## 2 Methods
### 2\.1 Networks and Input Modifications
Here we detail the architecture of our networks and the ways we attack or perturb the input, generally referred to as modifications\.
#### 2\.1\.1 Architecture
Our base model was a standard neural network, based on the classic LeNet architecture \[[10](https://arxiv.org/html/2606.13801#bib.bib23)\], with 3 convolutional layers interleaved with max\-pooling followed by 3 fully\-connected layers\. We trained on the Fashion MNIST \(FMNIST\) dataset \[[24](https://arxiv.org/html/2606.13801#bib.bib24)\] for 10 epochs using $\tanh$ activations between all layers excluding the last, the Adam optimizer with learning rate 0\.001, cross\-entropy loss, and batch size 64\. Layer $\ell$ activations are denoted $x_{\ell} \in \mathbb{R}^{n_{\ell}}$ for $\ell = 0, \ldots, 6$, with $x_{0}$ the input and $x_{6}$ the classifier logits\. In the appendix, we present similar results with a vision transformer \(ViT\) and CIFAR\-10\.
#### 2\.1\.2 Adversarial Attacks
To investigate the model’s robustness, we compared its performance on data that had undergone a variety of modifications, including adversarially attacked data\. We used the Adversarial Robustness Toolbox \(ART\) \[[17](https://arxiv.org/html/2606.13801#bib.bib9)\] to generate attacks and compared a range of attack strengths from $\varepsilon = 0.001$ to $0.2$ for each experiment\. See Table [2](https://arxiv.org/html/2606.13801#A1.T2) in the appendix for a list of attacks\. To compare our results to established methods for defending against adversarial attacks, we used ART to implement Gaussian Data Augmentation \[[25](https://arxiv.org/html/2606.13801#bib.bib19)\] and Adversarial Training \[[22](https://arxiv.org/html/2606.13801#bib.bib28)\]\. Specifically, we used Gaussian augmentation on 100% of the data during both training and evaluation and adversarial training with Projected Gradient Descent \(PGD\)\.
#### 2\.1\.3 Naturalistic Modifications
We used both the imagecorruptions package \[[15](https://arxiv.org/html/2606.13801#bib.bib27)\] and torchvision \[[13](https://arxiv.org/html/2606.13801#bib.bib25),[12](https://arxiv.org/html/2606.13801#bib.bib26)\] to create modified images using a variety of perturbations shown in Tab\. [2](https://arxiv.org/html/2606.13801#A1.T2)\. Gaussian blur was excluded since we found it did not significantly affect the models\.
For image modifications from the imagecorruptions package, corruption severity is set by an integer between 1 and 5\. This library requires that images are at least $32 \times 32$ with values ranging from 0 to 255, whereas FMNIST images are $28 \times 28$ with values in $[0, 1]$\. To resolve this mismatch, we padded images with zeros to make them $32 \times 32$ before processing and converted them to an unsigned byte format\. After processing, we cropped them to $28 \times 28$, converted them to a float format, converted them back to grayscale if necessary, and clamped values to $[0, 1]$\. For torchvision modifications, we selected a base value for the transformation strength and multiplied it by a scaling factor between $0.2$ and $2.0$\. We also developed random obstruction: a black square is placed over a random part of the image, with its side length defined as $0.4 s H$, where $s$ is a scaling factor in the range $0.2 \leq s \leq 2.0$ and $H$ is the height of the \(square\) image\.
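The format conversion and the random obstruction described above can be sketched in NumPy\. This is a minimal illustration, not the authors' code: the function names are hypothetical, and several details are assumptions the text does not specify, namely symmetric padding of 2 pixels per side, channel averaging for the grayscale conversion, rounding of the side length, and keeping the square fully inside the image\. The call to the corruption library itself is omitted\.

```python
import numpy as np

def to_corruption_format(img):
    """Pad a 28x28 float image in [0, 1] to a 32x32 uint8 image in [0, 255]."""
    # Assumption: pad symmetrically with 2 zero pixels on every side.
    padded = np.pad(img, pad_width=2, mode="constant", constant_values=0.0)
    return np.round(padded * 255.0).astype(np.uint8)

def from_corruption_format(img_u8):
    """Crop back to 28x28, convert to float, and clamp to [0, 1]."""
    if img_u8.ndim == 3:
        # Assumption: if channels were added, average them to return to grayscale.
        img_u8 = img_u8.mean(axis=-1)
    cropped = img_u8[2:-2, 2:-2].astype(np.float32) / 255.0
    return np.clip(cropped, 0.0, 1.0)

def random_obstruction(img, s, rng):
    """Place a black square with side length 0.4 * s * H at a random location."""
    h, w = img.shape
    # Assumption: round to whole pixels and never exceed the image size.
    side = min(int(round(0.4 * s * h)), h, w)
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out = img.copy()
    out[top:top + side, left:left + side] = 0.0
    return out
```

A corrupted image would then be obtained by applying the chosen corruption between `to_corruption_format` and `from_corruption_format`\.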
Figure 2: Structured noise improves robustness\. Noise with a covariance derived from the base model’s responses to modified data is injected into the activations of the second convolutional layer \($L = 2$\)\. Noise covariance settings include full covariance, diagonal covariance, identity covariance, and no noise\. \(Left\) Mean test accuracy against AutoPGD attack plotted against attack strength\. \(Center\) Mean test accuracy across a range of motion blur severities\. \(Right\) Mean test accuracy against random obstruction, where a black square of a given size is placed in a random location\.
### 2\.2 Noisy Layer Models
We inject noise into the activations $x_{L}$ of a selected noisy layer $L$ of the network\. This noise can either be unstructured \(identity covariance\) or structured following a multivariate Gaussian model\. We took a fully\-trained base model and extracted the layer activations given unperturbed inputs, $x_{L}$, and modified inputs, $x'_{L}$\. From these, we computed the $n_{L} \times n_{L}$ empirical covariance matrix $C_{\mathrm{full}}$ of the differences $x'_{L} - x_{L}$ over all minibatches of the training set and added $10^{-4}$ to the diagonal for stability\.
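The covariance estimate described above can be sketched as follows\. This is a minimal illustration under stated assumptions: activations are already flattened and stacked into arrays of shape `(num_samples, n_L)`, the covariance is mean\-centered \(as `np.cov` computes it\), and the function name is hypothetical\. Accumulating the estimate minibatch by minibatch, as would be needed for a full training set, is not shown\.

```python
import numpy as np

def difference_covariance(clean_acts, modified_acts, jitter=1e-4):
    """Empirical covariance of (modified - clean) activations plus a diagonal jitter.

    Both inputs have shape (num_samples, n_L): one flattened activation
    vector per input image, in matching order.
    """
    diffs = modified_acts - clean_acts
    # Assumption: mean-centered covariance with rows as observations.
    c_full = np.atleast_2d(np.cov(diffs, rowvar=False))
    # Adding a small constant to the diagonal keeps the matrix positive definite.
    return c_full + jitter * np.eye(c_full.shape[0])
```

The jitter matters in practice because the Cholesky factorization used for sampling requires a strictly positive\-definite matrix\.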
We then add Gaussian noise with covariance $C$ to the activations at layer $L$ using the Cholesky decomposition\. We took either the full covariance $C = a C_{\mathrm{full}}$, diagonal $C = C_{\mathrm{diag}} = a\,\mathrm{diag}(C_{\mathrm{full}})$, or identity $C = aI$\. To ensure the strength of the noise was consistent across conditions, we scaled the covariance matrix by a constant $a$ chosen so that the normalized trace, or variance per dimension,

$$\mathfrak{tr}(C) = \frac{1}{n_{L}} \mathrm{Tr}(C) \tag{1}$$

is fixed at a given scale\. “No $\mathfrak{tr}$” means we didn’t perform trace normalization\. Then the weights of all layers up to and including layer $L$ were frozen, and the later layer weights were optimized to minimize loss on the training set for 10 epochs \(Fig\. [12](https://arxiv.org/html/2606.13801#A1.F12) shows learning curves\) using unperturbed data\.
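The trace normalization in Eq\. \(1\) and the Cholesky\-based sampling can be sketched as below\. The function names and the string labels for the three covariance settings are hypothetical, and this is an illustration of the described procedure rather than the authors' implementation\. For a base matrix $B$ \(full, diagonal, or identity\), the constant is $a = t / \mathfrak{tr}(B)$ for a target scale $t$, and noise is drawn as $Mz$ with $z \sim \mathcal{N}(0, I)$ and $C = M M^{\top}$\.

```python
import numpy as np

def normalized_trace(c):
    """Variance per dimension: Tr(C) / n_L."""
    return np.trace(c) / c.shape[0]

def make_noise_covariance(c_full, kind, scale):
    """Return C = a * base, with a chosen so that normalized_trace(C) == scale."""
    n = c_full.shape[0]
    if kind == "full":
        base = c_full
    elif kind == "diagonal":
        base = np.diag(np.diag(c_full))
    elif kind == "identity":
        base = np.eye(n)
    else:
        raise ValueError(f"unknown covariance kind: {kind}")
    a = scale / normalized_trace(base)
    return a * base

def add_noise(acts, c, rng):
    """Add zero-mean Gaussian noise with covariance C to each row of acts."""
    chol = np.linalg.cholesky(c)             # lower triangular, C = chol @ chol.T
    z = rng.standard_normal(acts.shape)      # i.i.d. standard normal entries
    return acts + z @ chol.T                 # each row receives noise ~ N(0, C)
```

Because all three settings are rescaled to the same normalized trace, any difference in robustness between them reflects the shape of the noise rather than its overall strength\.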
### 2\.3 Experimental Setup
To compare the usefulness of the structured vs\. unstructured noise, we evaluated the robustness of different noisy models to modified inputs\. Noisy models were trained using the procedure described above and then tested by delivering manipulated images from the test set that was unseen during training\. Unless otherwise specified, we match the type and strength of image manipulations \(attacks, spatial transformations, etc\.\) used to create the covarianceCCto those during testing\. We use the second convolutional layer as the default noisy layer, although we do show results for varyingLLbelow\. To capture network and trial variability, we ran each experiment 10 times using models trained from independent initial weights and noise samples\. For experiments requiring the use of non\-deterministically modified data \(e\.g\. adversarial attacks or random perturbations\), a fresh dataset was generated for each trial\. Plot shading indicates bootstrap 95% confidence intervals\.
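A bootstrap confidence interval over repeated runs can be computed as in the sketch below\. The text does not state which bootstrap variant was used, so the percentile bootstrap of the mean is an assumption here, as are the function name, the number of resamples, and the example accuracies, which are invented for illustration and are not results from the paper\.

```python
import numpy as np

def bootstrap_ci(values, n_resamples=10000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample run indices with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return float(values.mean()), float(lo), float(hi)

# Hypothetical test accuracies from 10 independently trained models.
accuracies = [0.54, 0.56, 0.55, 0.53, 0.57, 0.55, 0.54, 0.56, 0.55, 0.55]
mean, lo, hi = bootstrap_ci(accuracies)
print(f"mean = {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```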
## 3 Results
Injecting structured noise into the activations of a neural network improves the model’s robustness against adversarial attacks, partial obstructions, and naturalistic image modifications, as shown in Fig\. [2](https://arxiv.org/html/2606.13801#S2.F2)\. For adversarial attacks, while any noise offers some benefit, full covariance offers significantly better performance than identity or diagonal once $\varepsilon$ is large\. For AutoPGD with $\varepsilon = 0.16$ and trace scale $\mathfrak{tr} = 2.0$, the mean performance of models with full covariance noise was 0\.55, compared to 0\.24 for models with identity covariance noise, 0\.34 for $\mathfrak{tr} = 2.0$ diagonal covariance noise, and 0\.00 for no noise\. We found similar results for other white\-box adversarial attacks \(see Tab\. [1](https://arxiv.org/html/2606.13801#S3.T1)\)\. In contrast, the benefit of full covariance noise is significantly less pronounced for the Square attack, and only appears at the highest $\varepsilon$\.
Full covariance offers the most pronounced benefits for motion blur \(Fig\. [2](https://arxiv.org/html/2606.13801#S2.F2)\)\. At strength 4 and $\mathfrak{tr} = 0.5$, models with full covariance noise had a test accuracy of 0\.64, compared to 0\.29 for both diagonal and identity noise, and 0\.30 for no noise\. In contrast with AutoPGD, identity and diagonal covariance perform the same as the baseline noiseless network\. There is a similar pattern in the effects of structured noise on robustness against a wide range of naturalistic image modifications \(Tab\. [1](https://arxiv.org/html/2606.13801#S3.T1)\)\. Full covariance generally offers the best performance, with identity and diagonal noise offering limited robustness to noisy modifications \(Gaussian and impulse noise, snow\) and little to no benefit for the other modifications and transformations\. One notable exception is the elastic transform, where none of the methods succeed in defending\.
### 3\.1 Optimal Noise Strength for Robustness
Figure 3: Optimal noise strength depends on the image modification\. Noise strength is set by the normalized trace $\mathfrak{tr}$\. \(Top Left\) AutoPGD Attack: $\mathfrak{tr} = 2.0$ outperforms all smaller trace values for medium and high strength attacks\. For no $\mathfrak{tr}$, identity noise performs best, likely because it has the largest trace before normalization\. Low, medium, and high modification strengths correspond to $\varepsilon$ values of $0.04$, $0.1$, and $0.18$ respectively\. \(Top Center\) Motion Blur: $\mathfrak{tr} = 0.25$, $\mathfrak{tr} = 0.5$, and $\mathfrak{tr} = 1.0$ provide similar performance, with no $\mathfrak{tr}$ and $\mathfrak{tr} = 2.0$ underperforming\. Low, medium, and high modification strengths correspond to severity values of $1$, $3$, and $5$ respectively\. \(Top Right\) Random Obstruction: No $\mathfrak{tr}$ and $\mathfrak{tr} = 0.25$ perform slightly better for small and medium modification strengths, with $\mathfrak{tr} = 0.25$ and $\mathfrak{tr} = 0.5$ performing best for high strength modifications\. Low, medium, and high modification strengths correspond to scale values of $0.4$, $1.0$, and $1.8$ respectively\. \(Bottom Row\) Larger values of $\mathfrak{tr}$ yield worse performance on clean data for all image alterations\. See Fig\. [7](https://arxiv.org/html/2606.13801#A1.F7) in the appendix for the effect of noise strength for all modification types\.
The optimal value for the normalized trace $\mathfrak{tr}$ varies depending on the type of image modification used\. As shown in Fig\. [3](https://arxiv.org/html/2606.13801#S3.F3), larger values of $\mathfrak{tr}$ yield better robustness against mid to high $\varepsilon$ white\-box adversarial attacks \(AutoPGD, PGD, FGM\)\. For AutoPGD with $\varepsilon = 0.10$, using full covariance noise with $\mathfrak{tr} = 2.0$ yields a performance of 0\.55 \(95% CI \[0\.54, 0\.56\]\) compared to 0\.45 \[0\.43, 0\.46\] when $\mathfrak{tr} = 0.5$\. However, this trend does not hold for the Square attack or naturalistic image modifications, where lower $\mathfrak{tr}$ values can instead result in better performance\. As an example, for motion blur with severity 3 and full covariance noise, $\mathfrak{tr} = 2.0$ results in a performance of 0\.64 \(95% CI \[0\.62, 0\.66\]\) in comparison to 0\.70 \[0\.69, 0\.71\] when $\mathfrak{tr} = 0.5$\. The effect of $\mathfrak{tr}$ on robustness against all types of image modification is shown in Fig\. [7](https://arxiv.org/html/2606.13801#A1.F7) in the appendix\. Higher values of $\mathfrak{tr}$ result in a significant decrease in performance on clean data \(see Fig\. [3](https://arxiv.org/html/2606.13801#S3.F3)\)\.
### 3\.2 Structure is Most Effective in Early Layers
Figure 4: Noisy layer placement affects performance\. Mean test accuracy for all noisy layers $L$ and noise covariances\. \(Left\) AutoPGD Attack\. $L = 1$ and $L = 2$ offer the best performance\. For $L = 2$, full covariance noise yields the highest accuracy for medium and high strength attacks\. When $L = 1$, the difference in performance between covariances is smaller, with full covariance underperforming compared to diagonal for medium and low strength attacks\. Low, medium, and high modification strengths correspond to $\varepsilon$ values of $0.04$, $0.1$, and $0.18$ respectively\. \(Center\) Motion blur\. $L = 2$ and $L = 3$ provide the highest accuracies, with full covariance noise significantly outperforming diagonal and identity noise for all $L$\. Low, medium, and high modification strengths correspond to severity values of $1$, $3$, and $5$ respectively\. \(Right\) Random obstruction\. For high strength modifications, $L = 1$ and $L = 2$ offer the best performance, whereas for medium strength modifications $L = 2$ and $L = 3$ are preferable\. Low, medium, and high modification strengths correspond to scale values of $0.4$, $1.0$, and $1.8$ respectively\. See Fig\. [8](https://arxiv.org/html/2606.13801#A1.F8) in the appendix for the effect of layer placement for all modification types\.
The position of the noisy layer has a significant impact on performance, with the optimal noisy layer varying depending on the type and strength of image alteration\. As seen in Fig\. [4](https://arxiv.org/html/2606.13801#S3.F4), the optimal noisy layer is always one of the convolutional layers \(1–3\), with the fully connected layers \(4–5\) under\-performing\. The last fully connected layer was excluded from this experiment because adding noise to its activations is equivalent to adding noise directly to the model’s output\.
When using the AutoPGD attack with $\mathfrak{tr} = 2.0$, layers $L = 1$ and $L = 2$ yield the best results\. Full covariance noise outperforms identity and diagonal noise for all $\varepsilon > 0.04$ with $L = 2$\. When $L = 2$ and $\varepsilon = 0.10$, full covariance noise provides a performance of 0\.54 \(95% CI \[0\.53, 0\.55\]\), compared to 0\.37 \[0\.35, 0\.38\] for diagonal and 0\.26 \[0\.23, 0\.29\] for identity\. However, things change when $L = 1$: full covariance is best only at the highest attack strength \($\varepsilon = 0.2$\), with diagonal noise generally yielding the best accuracy when $\varepsilon < 0.2$\. For $L = 1$ and $\varepsilon = 0.10$, full covariance noise yields an accuracy of 0\.49 \(95% CI \[0\.47, 0\.50\]\), diagonal noise 0\.56 \[0\.55, 0\.57\], and identity noise 0\.50 \[0\.44, 0\.55\]\.
For motion blur with $\mathfrak{tr} = 0.5$, $L = 3$ provides the best performance, with $L = 2$ yielding the same or slightly lower accuracies\. As shown in Fig\. [4](https://arxiv.org/html/2606.13801#S3.F4), full covariance noise consistently provides significantly better performance than identity or diagonal noise, regardless of the noisy layer selected\. The effect of $L$ on robustness against all types of image modification is shown in Fig\. [8](https://arxiv.org/html/2606.13801#A1.F8) in the appendix\.
#### 3\.2\.1 Gaussian Augmentation
We compared our method to Gaussian data augmentation \[[25](https://arxiv.org/html/2606.13801#bib.bib19)\], which adds unstructured noise to the input \(equivalent to identity noise at $L = 0$\)\. The standard deviation of the noise is set by $\sigma$\. Note that we only evaluate the effects of Gaussian augmentation against adversarial attacks, excluding naturalistic image modifications\. We found that for low to moderate attack strengths, Gaussian noise with a standard deviation of $0.45$ provided comparable performance to structured noise in later layers; however, full covariance noise at $L = 2$ offered better performance against high strength attacks \(see supplemental Fig\. [9](https://arxiv.org/html/2606.13801#A1.F9)\)\. When augmentation with $\sigma = 0.25$ is combined with full covariance noise with $\mathfrak{tr} = 2.0$ injected at $L = 2$, performance improves compared to either method alone, particularly against high strength attacks\. See supplemental Fig\. [10](https://arxiv.org/html/2606.13801#A1.F10) for details\.
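The equivalence between Gaussian augmentation and identity noise at $L = 0$ can be made concrete: noise with standard deviation $\sigma$ has covariance $\sigma^{2} I$, whose normalized trace is $\sigma^{2}$, so $\sigma = 0.45$ corresponds to $\mathfrak{tr} = 0.2025$ and $\sigma = 0.25$ to $\mathfrak{tr} = 0.0625$\. The sketch below is a plain NumPy illustration with hypothetical function names, not the ART implementation, and the optional clipping to the valid pixel range is an assumption\.

```python
import numpy as np

def gaussian_augment(x, sigma, rng, clip=(0.0, 1.0)):
    """Add i.i.d. N(0, sigma^2) noise to the inputs.

    Assumption: values are optionally clipped back to the pixel range,
    which makes the identity-covariance equivalence only approximate.
    """
    noisy = x + sigma * rng.standard_normal(x.shape)
    return np.clip(noisy, *clip) if clip is not None else noisy

def equivalent_normalized_trace(sigma):
    """Normalized trace of the covariance sigma^2 * I."""
    return sigma ** 2
```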
### 3\.3 Noise Covariance Transferability
Figure 5: Noise covariances from many sources offer similar adversarial robustness\. Each subplot shows how models with a variety of noise covariances perform against a different attack over a range of attack strengths \($\varepsilon$\)\. Across all subplots, the difference in performance between white\-box attack \(AutoPGD, PGD, and FGM\) covariances is negligible\. \(Top Left\) AutoPGD attack\. Full covariance noise provides the best accuracy for all $\varepsilon > 0.04$\. Gaussian noise covariance provides slightly lower accuracies when $\varepsilon < 0.8$, but the highest accuracy by a narrow margin when $\varepsilon > 0.12$\. \(Top Right\) PGD Attack\. Similar to AutoPGD, with a smaller difference in performance between models with a Gaussian noise covariance and white\-box attack covariances\. \(Bottom Left\) FGM Attack\. Gaussian noise covariance causes models to slightly underperform against low $\varepsilon$ attacks and does not provide a significant benefit for larger $\varepsilon$ attacks\. \(Bottom Right\) Square Attack\. For high values of $\varepsilon$, full covariance noise derived from the Square attack yields the best result\. Diagonal noise from any source and identity noise offer better performance for lower values of $\varepsilon$\. Models with Gaussian noise covariance are less robust against this attack\.
The robustness gained from noise with a given covariance is not limited to the attack used to generate the covariance\. Models with noise covariances derived from white\-box attacks \(AutoPGD, PGD, and FGM\) have remarkably similar performance against all tested adversarial attacks, as seen in Fig\. [5](https://arxiv.org/html/2606.13801#S3.F5)\. When evaluating robustness against white\-box attacks, models with a covariance derived from a black\-box attack \(Square attack\) perform similarly for lower values of $\varepsilon$, but at higher values their performance is notably lower\.
Injecting noise with a covariance derived from the base model’s activations in response to images augmented with Gaussian noise provides comparable adversarial robustness, particularly against white\-box attacks\. For white\-box attacks, models with Gaussian noise source covariance slightly underperform compared to adversarial source covariances for lower values of $\varepsilon$\. For AutoPGD with $\varepsilon = 0.04$ and full covariance noise, Gaussian noise covariance results in 0\.69 \(95% CI \[0\.68, 0\.69\]\) accuracy compared to 0\.72 \[0\.71, 0\.72\] for AutoPGD covariance\. The inverse is true for higher values of $\varepsilon$ against AutoPGD and PGD attacks: although the effect is small, Gaussian noise covariance models are the most robust\. For AutoPGD at $\varepsilon = 0.18$ with full covariance noise, Gaussian noise covariance yields an accuracy of 0\.33 \(95% CI \[0\.32, 0\.35\]\) compared to 0\.25 \[0\.23, 0\.26\] for AutoPGD covariance\.
Noise covariances derived from other naturalistic image modifications also provide some level of adversarial robustness, but Gaussian noise is the only one that results in comparable performance to adversarially derived covariances \(not shown\)\. Impulse noise and snow provide lesser benefit, while others provide little to none\.
As shown in Fig\. [5](https://arxiv.org/html/2606.13801#S3.F5), model performance is largely determined by whether full covariance or diagonal covariance was used\. Full covariance provides the best robustness at higher values of $\varepsilon$, but diagonal noise offers the best robustness to low $\varepsilon$ attacks\. Identity and diagonal noise covariances result in very similar performance against low $\varepsilon$ attacks, but as $\varepsilon$ increases, identity noise becomes less effective\.
We evaluated whether robustness gains from structured noise persist when the noise covariance is derived from a different naturalistic image modification than the one used at evaluation time \(Fig\.[6](https://arxiv.org/html/2606.13801#A1.F6)\)\. In contrast to adversarial attacks, robustness to these modifications is largely non\-transferable\. For most naturalistic modifications, mismatched noise covariances provide little to no benefit over models with no injected noise\. Several corruptions exhibit strong task specificity including random obstruction, rotation, motion blur, and perspective transformations\. A limited degree of transferability exists among corruptions that add noise or modify the brightness of the image, with Gaussian and impulse noise the most compatible\. Similarly, brightness\-derived covariance provided the best robustness to changes in brightness, with snow\-derived covariance also offering substantial robustness and all other covariance sources providing little to no benefit\.
Table 1: Typical results for all modifications, moderate image modification strength\. Accuracy is presented as mean and 95% confidence intervals for different noisy models, with the highest mean accuracies in bold\. Generally, the best accuracy results from using the full covariance\.
### 3\.4 Comparison to Adversarial Training
To compare the effectiveness of our model to standard adversarial training, we used AdversarialTrainer with PGD from the ART library\. We found that standard adversarial training offers more adversarial robustness than injecting structured noise, with lower ratios of adversarial examples in the training data resulting in reduced robustness \(Fig\. [11](https://arxiv.org/html/2606.13801#A1.F11)\)\. Against PGD with $\varepsilon = 0.1$, adversarially trained models with ratio 0\.8 had a mean accuracy of $0.74 \pm 0.01$ compared to $0.57 \pm 0.01$ for a full covariance model with $L = 2$ and $\mathfrak{tr} = 2.0$\.
## 4 Discussion
Across a range of experiments on a convolutional network \(and ViT; see appendix\), structured noisy layers yield greater robustness than unstructured noise\. This demonstrates that the geometry of the noise relative to the network’s representations of the data has a significant impact on robustness\.
Full covariance noise generally confers greater robustness than diagonal and identity noise for moderate to high strength image modifications, with the exception of the elastic transformation\. Another exception is when noise is injected into the earliest convolutional layer; here diagonal and identity noise yield better adversarial robustness than full covariance noise for low to moderate strength adversarial attacks\. It’s possible the attacks tend to target individual neurons rather than subspaces, and that we can defend against these vulnerable neurons with larger variance; however, we did not directly test this hypothesis\. By later layers, full covariance noise is best\. For non\-adversarial modifications, full covariance noise still generally yields the best performance, compared to diagonal and identity, across all layer choices\. Many of these modifications \(e\.g\. elastic, perspective\) are nonlinear effects on the image space that could potentially be linearized in later layers and thus more easily defended against in those locations\. Overall, our results show the most significant benefits of noise in the early layers\.
One could make the argument that noise being more beneficial in early layers contradicts observations from biological neural networks, where activity in peripheral neurons is significantly less variable than in cortical neurons\. However, this assumes that early layers of a CNN can be used as an analogue for peripheral neural circuitry\. Perhaps it is more reasonable to view the CNN in its entirety as a cortical neural circuit, where the periphery corresponds to the model input\.
Noise strength, set by $\mathfrak{tr}(C)$, significantly impacts robustness\. Smaller values of $\mathfrak{tr}$ tend to be most effective, with the exception of white\-box adversarial attacks, where a larger $\mathfrak{tr}$ yields better robustness\. This suggests a trade\-off between injecting sufficiently strong noise to counter white\-box adversarial perturbations and preserving task\-relevant signal under more naturalistic distortions\.
When evaluating robustness to adversarial attacks, diagonal and identity noise offer better robustness at low attack strengths\. This may be because the extra parameters contained in a full covariance matrix are unnecessary for robustness to low strength attacks\. It is possible that full covariance noise overfits to low strength attacks, decreasing performance\. Diagonal noise tends to yield significantly greater robustness than identity noise, with exceptions for some modifications at low strengths and for motion blur and rotation at all strengths\. A diagonal covariance enables the noise strength to vary by neuron, whereas identity noise results in the same strength across all neurons\. When diagonal noise does not offer an advantage, it indicates that there is no benefit to allowing the noise to vary by neuron without correlations between neurons, which can only be provided by a full covariance matrix\.
No single covariance source provides robust performance across all modifications, indicating that our technique for generating structured noise does not encode a universal invariance, but instead makes the model resilient to a limited set of related perturbations\.
Compared to Gaussian augmentation, structured noise provides more robustness to high strength adversarial attacks\. When we combine Gaussian augmentation with our method, we see better performance than for either method alone, at least with the tested parameters\. We did not evaluate Gaussian augmentation against naturalistic modifications; however, we expect that it would have little to no benefit, since it is equivalent to adding identity noise to the inputs, and identity noise tends to be ineffective against non\-adversarial modifications\.
There are important limitations to our work\. While adding structured noise is a biologically plausible strategy for improving robustness, our method relies on seeing both clean and modified inputs and knowing their identities\. If structured noise plays a role in perception, its strength and shape are likely the result of an interaction between local learning rules and priors optimized over evolutionary timescales, not an explicit estimation from paired clean and corrupted inputs\. Local learning rules that lead to robust noise structure are an interesting direction for future work\. Our method is outperformed by standard adversarial training \(Fig\. [11](https://arxiv.org/html/2606.13801#A1.F11)\) when evaluating robustness against adversarial attacks\. For low to medium values of $\varepsilon$, Gaussian augmentation also generally provides better protection against attacks\. Our approach should be viewed as a proof of concept that uses a simple mechanism to demonstrate that appropriately structured noise can confer robustness, rather than an optimal strategy for discovering such covariances\. Further, we do not test the effects of multiple noisy layers or a large array of architectures or datasets \(e\.g\. ResNets, although we did try ViT on CIFAR\-10\)\. It would be interesting to consider adding noise to particular elements of the transformer module, for instance mimicking a “stochastic” attention mechanism\. We do not consider realistic neuronal noise such as would be found with stochastic spiking models\. The loss in performance on clean data when structured noise is injected suggests that we have not found the ideal shape, where the variance of the noise is minimized perpendicular to the decision boundary \[[23](https://arxiv.org/html/2606.13801#bib.bib1)\]\.
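To make the notion of an ideal noise shape concrete, consider a simplified setting that is not analyzed in the paper: a locally linear decision boundary with unit normal $w$ at the noisy layer, and injected noise $\xi \sim \mathcal{N}(0, C)$. The symbols $w$ and $\xi$ are assumptions introduced here for illustration. The noise moves an activation toward or away from the boundary only through its projection onto $w$, whose variance is

$$
\operatorname{Var}\left(w^\top \xi\right) = w^\top C\, w, \qquad
\min_{C \succeq 0,\ \operatorname{tr}(C) = \mathfrak{tr}} w^\top C\, w = 0
\quad \text{attained when } C w = 0 .
$$

Under these assumptions, a covariance of fixed trace leaves clean-data decisions unchanged when all of its variance lies in directions orthogonal to $w$, which is one way to read the statement that the variance should be minimized perpendicular to the decision boundary.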
Future work could explore whether effective noise covariances can be obtained via biologically plausible methods such as Hebbian\-style rules combined with stochastic neurons \[[7](https://arxiv.org/html/2606.13801#bib.bib30),[20](https://arxiv.org/html/2606.13801#bib.bib31)\]\. One could also expose the model to a mixture of diverse image modifications, rather than a single type of modification, or consider multiple noisy layers and see if further benefits are possible\. Additionally, evaluating robustness against a broader range of black\-box attacks would help clarify whether the observed behavior of Square attacks is an outlier or indicative of a distinct pattern between white\-box and black\-box attacks\. It would also be interesting to investigate whether the most effective layers for noise injection correspond to those with the largest differences in activations or those where the attack directions can be most separated from signal by subspaces\.
We have shown that structured stochasticity in internal representations can substantially improve robustness in artificial neural networks\. By injecting noise with a covariance derived from differences in activations between clean and modified inputs, we show that structured noise outperforms unstructured noise across a wide range of adversarial attacks and naturalistic image modifications, with its effectiveness depending on noise strength and the layer at which it is applied\. While robustness to naturalistic modifications is largely task\-specific, the strong transferability of adversarially derived covariances suggests that these perturbations share common structure\. Our results support the view that stochasticity in activations encourages robust representations, drawing a computational parallel to the stochasticity observed in the brain\.
## References
- \[1\]L\. Chizat, E\. Oyallon, and F\. Bach\(2018\-12\)On Lazy Training in Differentiable Programming\.arXiv:1812\.07956 \[cs, math\]\.Note:arXiv: 1812\.07956External Links:[Link](http://arxiv.org/abs/1812.07956)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[2\]J\. M\. Cohen, E\. Rosenfeld, and J\. Z\. Kolter\(2019\-02\)Certified Adversarial Robustness via Randomized Smoothing\.arXiv:1902\.02918 \[cs, stat\]\.Note:arXiv: 1902\.02918Comment: ICML 2019External Links:[Link](http://arxiv.org/abs/1902.02918)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p2.1)\.
- \[3\]J\. Dapello, J\. Feather, H\. Le, T\. Marques, D\. D\. Cox, J\. H\. McDermott, J\. J\. DiCarlo, and S\. Chung\(2021\-11\)Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception\.arXiv\.Note:arXiv:2111\.06979 \[q\-bio\]Comment: 35th Conference on Neural Information Processing Systems \(NeurIPS 2021\)External Links:[Link](http://arxiv.org/abs/2111.06979),[Document](https://dx.doi.org/10.48550/arXiv.2111.06979)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p2.1)\.
- \[4\]J\. Dapello, T\. Marques, M\. Schrimpf, F\. Geiger, D\. D\. Cox, and J\. J\. DiCarlo\(2020\-06\-17\)Simulating a primary visual cortex at the front of CNNs improves robustness to image perturbations\.External Links:[Link](http://biorxiv.org/lookup/doi/10.1101/2020.06.16.154542),[Document](https://dx.doi.org/10.1101/2020.06.16.154542)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p2.1)\.
- \[5\]J\. de la Rocha, B\. Doiron, E\. Shea\-Brown, K\. Josić, and A\. Reyes\(2007\-08\)Correlation between neural spike trains increases with firing rate\.Nature448\(7155\),pp\. 802–806\(en\)\.External Links:ISSN 1476\-4687,[Link](https://www.nature.com/articles/nature06028),[Document](https://dx.doi.org/10.1038/nature06028)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p3.1)\.
- \[6\]A\. Dosovitskiy, L\. Beyer, A\. Kolesnikov, D\. Weissenborn, X\. Zhai, T\. Unterthiner, M\. Dehghani, M\. Minderer, G\. Heigold, S\. Gelly, J\. Uszkoreit, and N\. Houlsby\(2021\-06\)An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale\.arXiv\(en\)\.Note:arXiv:2010\.11929 \[cs\]Comment: Fine\-tuning code and pre\-trained models are available at https://github\.com/google\-research/vision\_transformer\. ICLR camera\-ready version with 2 small modifications: 1\) Added a discussion of CLS vs GAP classifier in the appendix, 2\) Fixed an error in exaFLOPs computation in Figure 5 and Table 6 \(relative performance of models is basically not affected\)External Links:[Link](http://arxiv.org/abs/2010.11929),[Document](https://dx.doi.org/10.48550/arXiv.2010.11929)Cited by:[§A\.3\.1](https://arxiv.org/html/2606.13801#A1.SS3.SSS1.p1.1)\.
- \[7\]J\. Eppler, T\. Lai, D\. F\. Aschauer, S\. Rumpel, and M\. Kaschube\(2026\-02\)Representational drift reflects ongoing balancing of stochastic changes by Hebbian learning\.Proceedings of the National Academy of Sciences123\(5\),pp\. e2503046123\.External Links:[Link](https://www.pnas.org/doi/10.1073/pnas.2503046123),[Document](https://dx.doi.org/10.1073/pnas.2503046123)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p4.1),[§4](https://arxiv.org/html/2606.13801#S4.p9.1)\.
- \[8\]Y\. Hu, J\. Zylberberg, and E\. Shea\-Brown\(2014\-02\)The Sign Rule and Beyond: Boundary Effects, Flexibility, and Noise Correlations in Neural Population Codes\.PLoS Computational Biology10\(2\),pp\. e1003469\(en\)\.External Links:ISSN 1553\-7358,[Link](https://dx.plos.org/10.1371/journal.pcbi.1003469),[Document](https://dx.doi.org/10.1371/journal.pcbi.1003469)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p3.1)\.
- \[9\]A\. Jacot, F\. Gabriel, and C\. Hongler\(2018\)Neural Tangent Kernel: Convergence and Generalization in Neural Networks\.InAdvances in Neural Information Processing Systems,Vol\.31\.External Links:[Link](https://papers.nips.cc/paper/2018/hash/5a4be1fa34e62bb8a6ec6b91d2462f5a-Abstract.html)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[10\]Y\. Lecun, L\. Bottou, Y\. Bengio, and P\. Haffner\(1998\-11\)Gradient\-based learning applied to document recognition\.Proceedings of the IEEE86\(11\),pp\. 2278–2324\(en\)\.External Links:ISSN 00189219,[Link](http://ieeexplore.ieee.org/document/726791/),[Document](https://dx.doi.org/10.1109/5.726791)Cited by:[§2\.1\.1](https://arxiv.org/html/2606.13801#S2.SS1.SSS1.p1.6)\.
- \[11\]W\. J\. Ma, J\. M\. Beck, P\. E\. Latham, and A\. Pouget\(2006\-10\)Bayesian inference with probabilistic population codes\.Nature Neuroscience9\(11\),pp\. 1432–1438\.External Links:[Document](https://dx.doi.org/10.1038/nn1790)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[12\]TorchVision: pytorch’s computer vision libraryNote:[https://github\.com/pytorch/vision](https://github.com/pytorch/vision)Cited by:[§2\.1\.3](https://arxiv.org/html/2606.13801#S2.SS1.SSS3.p1.1)\.
- \[13\]S\. Marcel and Y\. Rodriguez\(2010\-10\)Torchvision the machine\-vision package of torch\.InProceedings of the 18th ACM international conference on Multimedia,MM ’10,New York, NY, USA,pp\. 1485–1488\.External Links:ISBN 978\-1\-60558\-933\-6,[Link](https://dl.acm.org/doi/10.1145/1873951.1874254),[Document](https://dx.doi.org/10.1145/1873951.1874254)Cited by:[§2\.1\.3](https://arxiv.org/html/2606.13801#S2.SS1.SSS3.p1.1)\.
- \[14\]M\. D\. McDonnell and D\. Abbott\(2009\-05\-29\)What is stochastic resonance? definitions, misconceptions, debates, and its relevance to biology\.5\(5\),pp\. e1000348\.External Links:ISSN 1553\-7358,[Link](https://dx.plos.org/10.1371/journal.pcbi.1000348),[Document](https://dx.doi.org/10.1371/journal.pcbi.1000348)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[15\]C\. Michaelis, B\. Mitzkus, R\. Geirhos, E\. Rusak, O\. Bringmann, A\. S\. Ecker, M\. Bethge, and W\. Brendel\(2019\)Benchmarking robustness in object detection: autonomous driving when winter is coming\.arXiv preprint arXiv:1907\.07484\.Cited by:[§2\.1\.3](https://arxiv.org/html/2606.13801#S2.SS1.SSS3.p1.1)\.
- \[16\]R\. M\. Neal\(1996\)Bayesian Learning for Neural Networks\.Lecture Notes in Statistics,Springer New York,New York, NY\(en\)\.External Links:ISSN 0930\-0325,[Link](http://link.springer.com/10.1007/978-1-4612-0745-0),[Document](https://dx.doi.org/10.1007/978-1-4612-0745-0)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[17\]M\. Nicolae, M\. Sinn, M\. N\. Tran, B\. Buesser, A\. Rawat, M\. Wistuba, V\. Zantedeschi, N\. Baracaldo, B\. Chen, H\. Ludwig, I\. M\. Molloy, and B\. Edwards\(2019\)Adversarial robustness toolbox v1\.0\.0\.External Links:1807\.01069,[Link](https://arxiv.org/abs/1807.01069)Cited by:[§2\.1\.2](https://arxiv.org/html/2606.13801#S2.SS1.SSS2.p1.1)\.
- \[18\]G\. Orbán, P\. Berkes, J\. Fiser, and M\. Lengyel\(2016\-10\)Neural variability and sampling\-based probabilistic representations in the visual cortex\.Neuron92,pp\. 530–543\.External Links:[Document](https://dx.doi.org/10.1016/j.neuron.2016.09.038)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[19\]A\. S\. Rakin, Z\. He, and D\. Fan\(2018\-11\-22\)Parametric noise injection: trainable randomness to improve deep neural network robustness against adversarial attack\.\(arXiv:1811\.09310\)\.External Links:[Link](http://arxiv.org/abs/1811.09310),[Document](https://dx.doi.org/10.48550/arXiv.1811.09310),1811\.09310 \[cs\]Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p2.1)\.
- \[20\]D\. N\. Scott and M\. J\. Frank\(2021\-11\)Beyond gradients: Noise correlations control Hebbian plasticity to shape credit assignment\.bioRxiv\(en\)\.Note:Pages: 2021\.11\.19\.466943 Section: New ResultsExternal Links:[Link](https://www.biorxiv.org/content/10.1101/2021.11.19.466943v1),[Document](https://dx.doi.org/10.1101/2021.11.19.466943)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p4.1),[§4](https://arxiv.org/html/2606.13801#S4.p9.1)\.
- \[21\]N\. Srivastava, G\. Hinton, A\. Krizhevsky, I\. Sutskever, and R\. SalakhutdinovDropout: A Simple Way to Prevent Neural Networks from Overfitting\.\(en\)\.Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1)\.
- \[22\]F\. Tramèr, A\. Kurakin, N\. Papernot, I\. Goodfellow, D\. Boneh, and P\. McDaniel\(2020\)Ensemble adversarial training: attacks and defenses\.External Links:1705\.07204,[Link](https://arxiv.org/abs/1705.07204)Cited by:[§2\.1\.2](https://arxiv.org/html/2606.13801#S2.SS1.SSS2.p1.1)\.
- \[23\]P\. Venkatesh, J\. Shang, C\. C\. Bennett, S\. Gale, G\. R\. Heller, T\. K\. Ramirez, S\. Durand, E\. T\. SheaBrown, S\. R\. Olsen, and S\. Mihalas\(2024\-11\)The Role of Cortical Varibility in Supporting Few\-shot Generalization: Theory and Empirical Evidence\.\(en\)\.External Links:[Link](https://openreview.net/forum?id=2FWkTBtSWJ)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p3.1),[§1](https://arxiv.org/html/2606.13801#S1.p4.1),[§4](https://arxiv.org/html/2606.13801#S4.p8.1)\.
- \[24\]H\. Xiao, K\. Rasul, and R\. Vollgraf\(2017\-08\-28\)Fashion\-mnist: a novel image dataset for benchmarking machine learning algorithms\(Website\)External Links:cs\.LG/1708\.07747Cited by:[§2\.1\.1](https://arxiv.org/html/2606.13801#S2.SS1.SSS1.p1.6)\.
- \[25\]V\. Zantedeschi, M\. Nicolae, and A\. Rawat\(2017\-11\)Efficient Defenses Against Adversarial Attacks\.InProceedings of the 10th ACM Workshop on Artificial Intelligence and Security,AISec ’17,New York, NY, USA,pp\. 39–49\.External Links:ISBN 978\-1\-4503\-5202\-4,[Link](https://dl.acm.org/doi/10.1145/3128572.3140449),[Document](https://dx.doi.org/10.1145/3128572.3140449)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p1.1),[§1](https://arxiv.org/html/2606.13801#S1.p2.1),[§2\.1\.2](https://arxiv.org/html/2606.13801#S2.SS1.SSS2.p1.1),[§3\.2\.1](https://arxiv.org/html/2606.13801#S3.SS2.SSS1.p1.7)\.
- \[26\]E\. Zheltonozhskii, C\. Baskin, Y\. Nemcovsky, B\. Chmiel, A\. Mendelson, and A\. M\. Bronstein\(2020\-03\)Colored Noise Injection for Training Adversarially Robust Neural Networks\.arXiv\(en\)\.Note:arXiv:2003\.02188 \[cs\]External Links:[Link](http://arxiv.org/abs/2003.02188),[Document](https://dx.doi.org/10.48550/arXiv.2003.02188)Cited by:[§1](https://arxiv.org/html/2606.13801#S1.p2.1)\.
## Appendix A Appendix
### A\.1 Contributions
RP wrote the code, performed experiments, and created the tables and figures\. KDH performed the ViT experiments and analysis\. PV provided code that was used to start the project\. RP, PV, SM, and KDH conceived of the project\. SM edited, and RP and KDH wrote the paper\. KDH supervised the project\.
We are grateful to anonymous CCN 2026 reviewers for helpful suggestions, including the ViT architecture and idea of testing whether the layer with largest deviations under modification is the most effective noisy layer\.
### A\.2 LLM Use
LLM\-enabled software \(Antigravity editor \+ Gemini 3 Flash\) was used to assist with coding the ViT experiments and analysis\.
### A\.3 Supplemental Data and Experiments
Additional figures and tables are provided here\. These are also referenced in the main text, but for completeness we detail them in this appendix\.
Table [2](https://arxiv.org/html/2606.13801#A1.T2) lists the image modifications and their parameters that were used\. We also show an example of the modified images for different attack strengths, corruption severity, and modification scale\.
Fig\. [6](https://arxiv.org/html/2606.13801#A1.F6) shows how accuracy varies when the covariance source does not match the applied modification, excluding adversarial modifications\. The heatmap shows that, aside from a few modifications such as Gaussian and impulse noise, most of the covariances do not transfer well\.
Fig\. [7](https://arxiv.org/html/2606.13801#A1.F7) is a companion to Fig\. [3](https://arxiv.org/html/2606.13801#S3.F3), showing the effect of noise strength $\mathfrak{tr}$ on accuracy for all modifications and including 95% confidence interval shading \($n=10$ for each data point\)\. We see that, for adversarial attacks, larger $\mathfrak{tr}$ leads to higher accuracy at large modification strength, while lower levels of noise are more effective with naturalistic modifications\.
Fig\. [8](https://arxiv.org/html/2606.13801#A1.F8) is a companion to Fig\. [4](https://arxiv.org/html/2606.13801#S3.F4) that shows how noisy layer position \(shown by color\) affects accuracy across modification strength for all modifications\. For the adversarial attacks, earlier layers tend to lead to increased accuracy over later layers\. For other modifications, this effect is less pronounced\.
Fig\. [9](https://arxiv.org/html/2606.13801#A1.F9) compares injecting structured noise into a layer with adding unstructured noise directly to the input image \(known as Gaussian augmentation or randomized smoothing\)\. For simplicity, only adversarial modifications are shown\. At larger modification strengths, structured noise added to the layers \(our method, blue\) leads to higher accuracy\.
Fig\. [10](https://arxiv.org/html/2606.13801#A1.F10) shows how noise injection into later layers can be combined with Gaussian augmentation \($\sigma=0.25$\) at the image layer to defend against adversarial attacks\. While the ordering of the methods depends on attack and noise strengths, we see that combined noise in input and later layers \(solid lines\) may outperform noise in the input alone \(dashed black\) and in a later layer alone \(dash\-dotted blue\)\.
Fig\. [11](https://arxiv.org/html/2606.13801#A1.F11) compares adding noise to later layers with standard adversarial training, where the training examples and attacks were generated with PGD\. We see that adversarial training is generally the most effective way to defend against adversarial attacks\. This is not unexpected, since it tunes all parameters of the model to defend against the adversarial data distribution\. On the other hand, shaping the noise in neural layers is simpler, could rely on less data, and could reflect mechanisms active in the brain that depend on only local information about layer activations\.
Fig\. [12](https://arxiv.org/html/2606.13801#A1.F12) shows learning curves for the base models and noisy models across covariance sources\. These curves indicate that 10 epochs of re\-training is sufficient for the noisy models to converge\. We found that the loss for full covariance, diagonal, and identity noise is remarkably similar when the covariance source is a naturalistic modification; however, models with full covariance noise have significantly higher loss when the covariance is derived from an adversarial attack\. Models with no noise \(noise covariance matrix contains all zeros\) that undergo the post\-“noisy layer” retraining process have a slightly lower loss than the base model\.
#### A\.3\.1 Vision Transformer Experiments
To highlight the effect of neural variability in different model architectures and data, we applied the same methodology to a vision transformer \(ViT\) \[[6](https://arxiv.org/html/2606.13801#bib.bib29)\] classifying the CIFAR\-10 tiny images dataset\. Our code was built from [example code](https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/huggingface_notebook.ipynb) provided by the ART package\. The pretrained WinKawaks/vit\-tiny\-patch16\-224 model was loaded from Huggingface and finetuned on CIFAR\-10 for 2 epochs, reaching accuracy \>98%\. Attacks were generated using the training split, degrading the undefended model accuracy to a few percent depending on attack strength\. We took the output of the first transformer block as the noisy layer \(vit\.encoder\.layer\.0\.output\.dropout\)\. Activation deviations were averaged over the token/patch dimension to compute the covariance matrix, which had its trace rescaled as described in the main text\.
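The covariance estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `deviation_covariance` and the toy array shapes are hypothetical, the deviation is taken as modified minus clean activations, the covariance is mean-centered, and the trace rescaling is assumed to set the trace equal to $\mathfrak{tr}$.

```python
import numpy as np

def deviation_covariance(clean_acts, modified_acts, tr):
    """Covariance of token-averaged activation deviations, rescaled to trace `tr`.

    clean_acts, modified_acts: arrays of shape (n_images, n_tokens, n_features)
    holding noisy-layer activations for paired clean and modified inputs.
    """
    # Average the per-token deviations so each image contributes one vector.
    deviations = (modified_acts - clean_acts).mean(axis=1)
    # Feature-by-feature covariance across images (np.cov centers the data).
    cov = np.cov(deviations, rowvar=False)
    # Assumed convention: normalize so that the trace equals the noise strength.
    return cov * (tr / np.trace(cov))

# Toy data standing in for real activations; the shapes are illustrative only.
rng = np.random.default_rng(0)
clean = rng.normal(size=(256, 197, 16))
modified = clean + 0.1 * rng.normal(size=(256, 197, 16))
cov = deviation_covariance(clean, modified, tr=1.0)
```

In practice the activations would be collected from the chosen layer of the fine-tuned model, for example with a forward hook, before being passed to a function like this one.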
Results are shown in Fig\. [13](https://arxiv.org/html/2606.13801#A1.F13) for two attack strengths \($\varepsilon=4/255$ and $8/255$\) for $\mathfrak{tr}=0.5,1,2,4,8$ using identity, diagonal, and full covariance noise\. As with the convolutional network, structured noise may improve robustness on both clean and noisy data, and the optimal noise strength is larger for larger attack strength\. We did not investigate the effect of noisy layer location, other kinds of image modifications, transferability, or comparisons with adversarial training\.
| Image Modifications from Adversarial Robustness Toolbox | $\bm{\varepsilon}=0.02$ | $\bm{\varepsilon}=0.10$ | $\bm{\varepsilon}=0.20$ |
| --- | --- | --- | --- |
| AutoProjectedGradientDescent\(eps=$\varepsilon$\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/AutoPGD_0.02.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/AutoPGD_0.1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/AutoPGD_0.2.png) |
| FastGradientMethod\(eps=$\varepsilon$\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/FGM_0.02.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/FGM_0.1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/FGM_0.2.png) |
| ProjectedGradientDescent\(eps=$\varepsilon$\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/PGD_0.02.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/PGD_0.1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/PGD_0.2.png) |
| SquareAttack\(eps=$\varepsilon$\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Square_0.02.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Square_0.1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Square_0.2.png) |

| Image Modifications from imagecorruption | severity=1 | severity=3 | severity=5 |
| --- | --- | --- | --- |
| corrupt\(corruption\_name=“brightness”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/brightness_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/brightness_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/brightness_5.png) |
| corrupt\(corruption\_name=“contrast”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/contrast_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/contrast_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/contrast_5.png) |
| corrupt\(corruption\_name=“gaussian\_noise”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/gaussian_noise_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/gaussian_noise_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/gaussian_noise_5.png) |
| corrupt\(corruption\_name=“impulse\_noise”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/impulse_noise_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/impulse_noise_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/impulse_noise_5.png) |
| corrupt\(corruption\_name=“motion\_blur”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/motion_blur_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/motion_blur_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/motion_blur_5.png) |
| corrupt\(corruption\_name=“snow”, severity=severity\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/snow_1.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/snow_3.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/snow_5.png) |

| Image Modifications from torchvision\.transforms\.v2 | scale=0\.2 | scale=1\.0 | scale=2\.0 |
| --- | --- | --- | --- |
| ElasticTransform\(alpha = 35\.0 \* scale\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Elastic_0.2.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Elastic_1.0.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Elastic_2.0.png) |
| RandomPerspective\(distortion\_scale = 0\.25 \* scale\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Perspective_0.2.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Perspective_1.0.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Perspective_2.0.png) |
| RandomRotation\(rotation = 0\.60 \* scale\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Rotate_0.2.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Rotate_1.0.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/Rotate_2.0.png) |

| Image Modifications by the Authors | scale=0\.2 | scale=1\.0 | scale=2\.0 |
| --- | --- | --- | --- |
| Obstruction\(scale, base\_fraction = 0\.40\) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/RandObstructionB_0.2.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/RandObstructionB_1.0.png) | ![[Uncaptioned image]](https://arxiv.org/html/2606.13801v1/examples/RandObstructionB_2.0.png) |

Table 2: Image modification implementations and examples\.

Figure 6: Non\-adversarial noise covariances have limited transferability\. Mean test accuracies for model with full covariance noise \($L=2$, $\mathfrak{tr}=0.5$\) over 10 runs; 95% confidence intervals for all test accuracies are within $\pm 0.05$\. Maximum strength used for all image modifications \(i\.e\. severity = 5 or scale = 2\.0\)\. Matched covariance source and modification \(diagonal blocks\) provides the best performance for all modifications, although elastic, Gaussian noise, and impulse noise have similar performance for other sources\. This is likely because structured noise confers little to no robustness to the elastic transformation and Gaussian noise and impulse noise display a high degree of cross\-compatibility\. Brightness and snow\-derived covariances also transfer well, but matched modification and covariance still provides the best average performance\.

Figure 7: Optimal noise strength depends on the image modification: all modifications\. The trace scale $\mathfrak{tr}$ controls the strength of the noise; no $\mathfrak{tr}$ results in the lowest level\. More noise tends to provide better results for adversarial attacks, whereas for naturalistic modifications, lower levels of noise generally yield the greatest benefit\. Low/medium/high strengths are $\varepsilon$ values of $0.05$/$0.10$/$0.15$ \(adversarial\), $0.5$/$1.0$/$1.5$ \(obstruction, elastic, perspective, and rotate\), or 1/3/4 \(all other modifications\)\.

Figure 8: Early noisy layer placement provides better robustness: all modifications\. The position of the noisy layer $L=1,2,3,4,5$ is shown by the line color\. Shading indicates 95% confidence interval\. Injecting noise in one of the first 3 layers consistently provides better performance than injecting noise into later layers\. For adversarial attacks, low, medium, and high modification strengths correspond to $\varepsilon$ values of $0.05$, $0.10$, and $0.15$ respectively\. For random obstruction and naturalistic image corruptions implemented with Torchvision transforms \(e\.g\. elastic, perspective, and rotate\), low, medium, and high modification strengths correspond to scale values of $0.5$, $1.0$, and $1.5$ respectively\. For all other image modifications, low, medium, and high modification strengths correspond to severity values of $1$, $3$, and $5$ respectively\.

Figure 9: Structured noise injection outperforms Gaussian augmentation at mid to high attack strengths\. Here $\sigma$ sets the standard deviation of the Gaussian noise, with higher values providing greater robustness\. Shading indicates 95% confidence interval\. A model trained with full covariance noise injection and no Gaussian augmentation is shown for comparison, with $\mathfrak{tr}=0.5$ for Square attack and $\mathfrak{tr}=2.0$ for all others\.

Figure 10: Combining noise injection with Gaussian augmentation \($\sigma=0.25$\) provides better adversarial robustness than either method on its own, particularly for high $\varepsilon$ attacks\. However, the difference in performance between structured and unstructured noise is notably smaller, with structured noise only providing a clear benefit for AutoPGD and PGD at high $\varepsilon$\. Shading indicates 95% confidence interval\.

Figure 11: Comparison to standard adversarial training with PGD using AdversarialTrainer from ART\. Higher ratios of adversarial examples during training tend to provide better adversarial robustness, at the cost of worse performance on clean data\. The performance of a model with no adversarial training with full covariance noise injection is shown for comparison in blue\.

Figure 12: Training curves indicate that 10 epochs of re\-training is sufficient\. We train a base model, inject noise with a given covariance into the noisy layer, and retrain all following layers\. For all non\-adversarial modifications, the learning curves for full cov, diagonal, and identity noise are essentially identical\. However for adversarial attacks, models with full covariance noise have a consistently higher loss compared to diagonal and identity\. The noise strength was set to $\mathfrak{tr}=2.0$ for all adversarial attacks and $\mathfrak{tr}=0.5$ for all other modifications\. Shading indicates a 95% confidence interval\.

Figure 13: Our method applied to vision transformer classifying CIFAR\-10 data under PGD attack\. Above: $\varepsilon=4/255$, Below: $\varepsilon=8/255$\. Similar to our convolutional network results, accuracy on clean data degrades with increasing $\mathfrak{tr}$, and there is a sweet spot for adversarial robustness\. For the optimal values of $\mathfrak{tr}$ at either attack strength \($\mathfrak{tr}=1$ for $\varepsilon=4/255$; $\mathfrak{tr}=4$ for $\varepsilon=8/255$\), full covariance leads to significant improvements in adversarial accuracy\. Significance results follow Mann\-Whitney U tests with \* indicating $p<0.05$; all data are $n=4$ independently fine\-tuned runs \(randomness arises from minibatch sampling and noise injection\)\.
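To see what a Mann-Whitney U test can resolve with $n=4$ runs per condition, the exact null distribution can be enumerated directly. The sketch below assumes a two-sided exact test without ties, which the caption does not specify; the function name `mann_whitney_exact` and the accuracy values are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U p-value for small samples without ties."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..n1+n2, assumes no ties
    n1, n2 = len(x), len(y)

    def u_stat(ranks):
        # U statistic of the first sample from its rank sum.
        return sum(ranks) - n1 * (n1 + 1) / 2

    u_obs = u_stat([rank[v] for v in x])
    mean_u = n1 * n2 / 2
    # Under the null, every choice of n1 ranks for the first sample is equally likely.
    all_u = [u_stat(c) for c in combinations(range(1, n1 + n2 + 1), n1)]
    extreme = sum(abs(u - mean_u) >= abs(u_obs - mean_u) for u in all_u)
    return u_obs, extreme / len(all_u)

baseline = [0.41, 0.43, 0.40, 0.42]  # hypothetical accuracies, not from the paper
full_cov = [0.52, 0.55, 0.51, 0.54]  # hypothetical accuracies, not from the paper
u, p = mann_whitney_exact(full_cov, baseline)
```

With four runs per group there are 70 possible rank assignments, so under these assumptions the smallest attainable two-sided p-value is 2/70, roughly 0.029. It is reached only when every run in one condition outperforms every run in the other, which is the case for the toy values above.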