Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics
Summary
This paper investigates adversarial robustness in Fuzzy ARTMAP, a streaming neural architecture, by introducing WB-Softmax as a mechanism-aligned white-box attack surrogate. It evaluates progressive training and selective updating strategies to improve robustness without data replay, while also offering interpretable diagnostics for structural failures.
Cached at: 05/11/26, 07:03 AM
# Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics
Source: [https://arxiv.org/html/2605.06902](https://arxiv.org/html/2605.06902)
###### Abstract
Adversarial robustness has been studied extensively for offline deep networks, but much less is known about how attacks, defenses, and reliability signals behave in neural learners that update through strict single-pass streaming. This paper studies this problem in Fuzzy ARTMAP, an Adaptive Resonance Theory architecture whose decisions are governed by winner-take-all category competition, complement coding, match tracking, and replay-free prototype updates. We introduce WB-Softmax, a differentiable white-box attack surrogate aligned with ARTMAP’s category-competition and map-field prediction mechanism, and we formalize a streaming evaluation principle requiring robustness to be assessed on the final deployed model rather than on stale intermediate states. We further examine replay-free adversarial training under streaming-compatible protocol choices, including offline versus online attack generation, selective updating, and progressive training. Across four image benchmarks, WB-Softmax provides a strong adaptive white-box evaluator, achieving 89–100% attack success on vanilla Fuzzy ARTMAP models. We show that defense rankings can reverse across evaluation protocols: offline adversarial training may appear strong under transfer attacks yet collapse under adaptive white-box evaluation, whereas progressive two-stage selective training achieves the strongest overall replay-free robustness. Finally, we show that ART’s explicit category geometry supports interpretable diagnosis of structural and reliability failures, including *separation collapse*, a failure mode in which categories of different classes overlap increasingly during adversarial adaptation, and a reversal in match-score ordering after selective adversarial training. These results establish a mechanism-aligned, protocol-aware framework for adversarial robustness in streaming prototype-based learners.
###### keywords:
Adaptive Resonance Theory, Fuzzy ARTMAP, adversarial robustness, adversarial training, incremental learning
Journal: Neural Networks

Affiliations:

- \[inst1\] Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA
- \[inst2\] Kummer Institute Center for Artificial Intelligence and Autonomous Systems (KICAIAS), Missouri University of Science and Technology, Rolla, MO, USA
- \[inst3\] Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, RN 59078-900, Brazil
## 1 Introduction
Adversarial robustness is now a central requirement for neural learning systems, yet most existing methodology assumes a fixed model trained offline with repeated access to historical data\. This assumption leaves a major gap for streaming neural architectures that must learn in a single pass, update their decision structure online, and operate without replay\. In such systems, adversarial robustness is not only a question of perturbation size or attack strength; it also depends on when adversarial examples are generated, which evolving model state they target, and whether internal reliability signals remain meaningful after adaptation\.
Since adversarial examples were first documented \[[39](https://arxiv.org/html/2605.06902#bib.bib22)\], a large literature has developed stronger attacks, robust training methods, and evaluation protocols \[[18](https://arxiv.org/html/2605.06902#bib.bib23),[25](https://arxiv.org/html/2605.06902#bib.bib24),[8](https://arxiv.org/html/2605.06902#bib.bib25),[40](https://arxiv.org/html/2605.06902#bib.bib5),[14](https://arxiv.org/html/2605.06902#bib.bib30),[1](https://arxiv.org/html/2605.06902#bib.bib31),[43](https://arxiv.org/html/2605.06902#bib.bib61)\]. This literature has also emphasized pitfalls such as gradient masking and the need for strong adaptive evaluation \[[2](https://arxiv.org/html/2605.06902#bib.bib29),[41](https://arxiv.org/html/2605.06902#bib.bib6)\], while RobustBench and recent surveys have helped standardize empirical robustness reporting and organize the broader adversarial-learning landscape \[[13](https://arxiv.org/html/2605.06902#bib.bib32),[34](https://arxiv.org/html/2605.06902#bib.bib18),[44](https://arxiv.org/html/2605.06902#bib.bib55)\]. However, this body of work remains centered on offline training with repeated access to historical data: both adversarial training and robustness evaluation usually assume either repeated optimization over the same dataset or repeated access to prior samples, and neither assumption holds in strict single-pass streaming.
Recent work has begun examining the intersection of continual learning and adversarial robustness\[[22](https://arxiv.org/html/2605.06902#bib.bib41),[32](https://arxiv.org/html/2605.06902#bib.bib40),[15](https://arxiv.org/html/2605.06902#bib.bib17)\]\. At the same time, much of the broader continual\-learning literature relies on replay, episodic memory, rehearsal, distillation, or other forms of repeated optimization\[[23](https://arxiv.org/html/2605.06902#bib.bib7),[24](https://arxiv.org/html/2605.06902#bib.bib8),[11](https://arxiv.org/html/2605.06902#bib.bib9),[35](https://arxiv.org/html/2605.06902#bib.bib10)\]\. Existing robust continual\-learning methods likewise typically rely on replay, regularization, or memory\-augmented training\[[29](https://arxiv.org/html/2605.06902#bib.bib57),[3](https://arxiv.org/html/2605.06902#bib.bib43)\]\. These mechanisms are valuable in their intended settings, but they are difficult to reconcile with strict single\-pass streaming, where each sample is processed once and then discarded\. Thus, much of the current continual\-robustness literature addresses robust continual learning with memory, rather than robustness under the no\-replay constraints that motivate truly streaming models\.
Adaptive Resonance Theory \(ART\) networks\[[19](https://arxiv.org/html/2605.06902#bib.bib36),[36](https://arxiv.org/html/2605.06902#bib.bib2),[4](https://arxiv.org/html/2605.06902#bib.bib59)\]provide a particularly important setting for studying this gap\. ART networks were designed for stable incremental learning, and Fuzzy ARTMAP remains one of the most established supervised ART architectures for single\-pass classification\. Unlike conventional deep networks, Fuzzy ARTMAP predicts through explicit category competition and map\-field assignment, while learning proceeds through match tracking, category creation, and fast prototype updates\. These mechanisms make it possible to diagnose robustness failures directly from internal category geometry, but they also make standard deep\-network adversarial evaluation insufficient\.
Fuzzy ARTMAP \[[9](https://arxiv.org/html/2605.06902#bib.bib38)\], the supervised extension of Fuzzy ART, is especially relevant because it supports incremental classification through complement coding, match tracking, and explicit map-field supervision. Recent work has expanded the ART ecosystem through modular software implementations \[[28](https://arxiv.org/html/2605.06902#bib.bib54)\], deep hierarchical extensions \[[26](https://arxiv.org/html/2605.06902#bib.bib51)\], gradient-free deep-learning formulations inspired by ART dynamics \[[33](https://arxiv.org/html/2605.06902#bib.bib52)\], and analysis of computational trade-offs induced by match-tracking mechanisms \[[27](https://arxiv.org/html/2605.06902#bib.bib53)\]. Recent robustness work on prototype-based or nonstandard models has also considered hyperspherical prototypes, discriminative prototype learners, and metric-learning perspectives \[[31](https://arxiv.org/html/2605.06902#bib.bib39),[21](https://arxiv.org/html/2605.06902#bib.bib20),[37](https://arxiv.org/html/2605.06902#bib.bib58),[38](https://arxiv.org/html/2605.06902#bib.bib16)\]. However, these studies do not address ARTMAP’s particular combination of complement coding, winner-take-all category competition, fast one-pass category updates, and explicit match-based internal scores. This leaves open a distinct question: how should adversarial robustness be defined, attacked, trained, and interpreted in Fuzzy ARTMAP under its native strict-streaming regime?
A preliminary version of this work appeared in IJCNN 2026\[[7](https://arxiv.org/html/2605.06902#bib.bib4)\]\. This journal version extends it from an empirical robustness study to a broader mechanism\-aligned framework, adding a formal final\-model streaming evaluation principle, reliability analysis of match\-score inversion, geometry\-based diagnosis of separation collapse, separation\-aware training analysis, and derived unconditional robustness results\.
This gap is not only empirical\. Core ARTMAP operations—winner\-take\-all category competition, complement coding, and piecewise fast\-learning updates—make standard gradient\-based white\-box attacks poorly matched to the model’s actual prediction mechanism\. Consequently, robustness conclusions can be misleading unless the attack objective is explicitly aligned with ARTMAP’s competition\-and\-mapping structure\. A second challenge is specific to streaming learning itself\. In strict single\-pass training, category boundaries evolve continuously as categories are created, absorbed, or reset\. Adversarial examples generated against an earlier model snapshot can therefore become stale relative to the final deployed classifier\. In this work, we treat this threat\-model issue as a methodological question rather than only an implementation detail: in streaming learners, robustness should be assessed on the final streamed model using adaptive attacks crafted against that final state, rather than inferred from stale, transfer\-only, or partially aligned perturbations\.
These questions also arise beyond ARTMAP\. Fuzzy ARTMAP is a particularly informative testbed because it combines strict single\-pass adaptation, explicit prototype competition, interpretable internal geometry, and replay\-free learning\. This makes it well suited for studying a broader methodological problem: how adversarial robustness should be evaluated and improved in streaming prototype\-based learners whose decision boundaries evolve continuously during deployment\.
This paper studies adversarial robustness in streaming Fuzzy ARTMAP from three coupled perspectives: evaluation, training, and interpretability\. More specifically, we ask three questions\. First, how should adaptive white\-box attacks be defined for a non\-differentiable winner\-take\-all learner whose predictions are formed through category competition and map\-field assignment? Second, under strict single\-pass no\-replay constraints, what training and evaluation protocol is required to avoid robustness overestimation caused by stale attacks on intermediate model states? Third, can ART’s explicit category geometry be used not only to interpret post hoc behavior, but also to diagnose robustness failure modes online and motivate targeted replay\-free interventions? This final question is also connected to deployment reliability: if internal scores are reused for rejection, abstention, or escalation, then their semantics must remain valid after adversarial training rather than only in the vanilla regime\[[12](https://arxiv.org/html/2605.06902#bib.bib12),[16](https://arxiv.org/html/2605.06902#bib.bib13),[17](https://arxiv.org/html/2605.06902#bib.bib14),[20](https://arxiv.org/html/2605.06902#bib.bib15)\]\.
To answer these questions, we combine three methodological components\. First, we develop WB\-Softmax, a differentiable softmax relaxation that aggregates category\-level choice values into class\-level scores aligned with the ARTMAP map field, enabling strong adaptive white\-box evaluation\. Second, we distinguish offline and online adversarial\-example generation under streaming updates, and compare standard, selective, and progressive two\-stage training rules within the same replay\-free setting\. Third, we exploit ART’s explicit category geometry through iCVI monitoring and overlap\-based diagnostics to identify structural failure modes; we propose a separation\-aware update rule as the first concrete intervention motivated by these diagnostics, and characterize its operational behavior including a structural limitation of overlap\-only gating\. Taken together, these components define a mechanism\-aligned, protocol\-aware, and interpretability\-driven framework for adversarial robustness in strict streaming learners\.
Our contributions are threefold\. First, we establish a mechanism\-aligned evaluation framework for adversarial robustness in streaming prototype\-based neural learners, instantiated in Fuzzy ARTMAP\. The framework includes WB\-Softmax, a white\-box attack objective aligned with ARTMAP’s category competition and map\-field structure, and a final\-model streaming evaluation principle requiring robustness to be assessed on the deployed streamed model rather than on stale intermediate states\. Empirically, we show that WB\-Softmax PGD provides a strong adaptive evaluator, achieving 89–100% attack success on vanilla Fuzzy ARTMAP models across the evaluated benchmarks and consistently exceeding transfer and query\-based baselines at matched budgets\.
Second, we show that replay\-free robustness is a protocol property, not only an attack\-strength property\. Offline adversarial training can appear effective under transfer evaluation yet collapse under adaptive white\-box evaluation, while progressive two\-stage selective training provides the strongest overall replay\-free robustness across USPS, MNIST, Fashion\-MNIST, and EMNIST\-Letters\.
Third, we show that ART’s internal geometry provides more than post-hoc interpretability: it enables online diagnosis of structural and semantic reliability failures. Geometry monitoring reveals *separation collapse*, namely the progressive loss of cross-class geometric separation caused by adversarial adaptation, while match-score analysis uncovers match-score inversion, showing that internal trust signals calibrated on vanilla models may become unreliable after adversarial adaptation.
The remainder of this paper is organized as follows\. Section 2 reviews the Fuzzy ARTMAP background\. Section 3 formalizes the threat model and evaluation protocol for single\-pass streaming robustness\. Section 4 presents the attack suite, including the proposed WB\-Softmax adaptive white\-box attack and complementary black\-box transfer baselines\. Section 5 introduces interpretable diagnostics and replay\-free training rules for streaming ARTMAP, including geometry\-based monitoring, match\-score analysis, and separation\-aware training\. Section 6 describes the experimental setup\. Section 7 reports the results and discussion\. Section 8 concludes the paper and outlines future directions\. The Appendix provides the constructive proof of Proposition 1 and derived unconditional robustness tables corresponding to the conditional clean\-correct evaluation reported in the main text\.
## 2 Background
Fuzzy ART \[[10](https://arxiv.org/html/2605.06902#bib.bib37)\] extends Adaptive Resonance Theory to continuous-valued inputs. Given an input feature vector $\bm{x}\in[0,1]^{d}$, complement coding forms

$$\bm{I}(\bm{x})=[\bm{x};\,\bm{1}-\bm{x}]\in[0,1]^{2d}, \tag{1}$$

so that the coded input $\bm{I}(\bm{x})$ has constant $L^{1}$ norm:

$$|\bm{I}(\bm{x})|=\sum_{i=1}^{d}x_{i}+\sum_{i=1}^{d}(1-x_{i})=d. \tag{2}$$

Here $|\bm{v}|=\|\bm{v}\|_{1}=\sum_{i}v_{i}$ for $\bm{v}\in[0,1]^{m}$. This normalization is important in ART because it reduces sensitivity to raw input magnitude and helps mitigate category proliferation caused by norm variation.
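A minimal NumPy sketch of complement coding, Eq. (1), and the constant-norm property, Eq. (2). The function name `complement_code` and the two example inputs are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def complement_code(x):
    """Return I(x) = [x; 1 - x] for a feature vector x in [0, 1]^d (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    if np.any((x < 0.0) | (x > 1.0)):
        raise ValueError("complement coding expects inputs in [0, 1]")
    return np.concatenate([x, 1.0 - x])

# Two inputs with very different raw magnitudes (d = 4).
dark = np.array([0.0, 0.1, 0.05, 0.0])
bright = np.array([0.9, 1.0, 0.8, 0.95])
for x in (dark, bright):
    I = complement_code(x)
    # Raw L1 norms differ widely; both coded norms equal d = 4 (up to rounding).
    print(x.sum(), I.sum())
```

Because every coded input has the same norm, the denominator of the match function below is the constant $d$ for all inputs.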
Each category $j$ is represented by a weight vector $\bm{w}_{j}\in[0,1]^{2d}$, which defines a hyperbox-like region in complement-coded space. For an input $\bm{I}$, Fuzzy ART computes the *match function*

$$M_{j}(\bm{I})=\frac{|\bm{I}\wedge\bm{w}_{j}|}{|\bm{I}|}, \tag{3}$$

and the *choice function*

$$T_{j}(\bm{I})=\frac{|\bm{I}\wedge\bm{w}_{j}|}{\alpha+|\bm{w}_{j}|}, \tag{4}$$

where $\wedge$ denotes the element-wise minimum and $\alpha>0$ is the choice parameter. Categories compete through winner-take-all selection:

$$J=\arg\max_{j}T_{j}(\bm{I}), \tag{5}$$

and the winning category $J$ is accepted if

$$M_{J}(\bm{I})\geq\rho, \tag{6}$$

where the vigilance $\rho\in[0,1]$ controls category granularity, with larger $\rho$ producing finer partitions. If vigilance fails, mismatch reset inhibits the current winner and the search continues.
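The match and choice functions and the vigilance-gated winner-take-all search of Eqs. (3) to (6) can be sketched as follows. The value `alpha=0.001` is an assumption (the choice parameter is not fixed in this section), and returning `-1` to mean "every category was reset, so a new one would be committed" is an illustrative convention; the helper names are hypothetical.

```python
import numpy as np

def match_and_choice(I, W, alpha=0.001):
    """Match values M_j (Eq. 3) and choice values T_j (Eq. 4) for all categories.

    I: complement-coded input, shape (2d,); W: category weights, shape (n, 2d).
    """
    overlap = np.minimum(I, W).sum(axis=1)        # |I ^ w_j| for every category j
    M = overlap / I.sum()                         # match: share of |I| covered by w_j
    T = overlap / (alpha + W.sum(axis=1))         # choice: overlap relative to |w_j|
    return M, T

def resonant_category(I, W, rho, alpha=0.001):
    """Winner-take-all search with vigilance reset (Eqs. 5-6).

    Categories are visited in decreasing order of T_j; a category whose match
    fails vigilance is reset (inhibited) and the search moves on. Returns the
    first category with M_j >= rho, or -1 if every category is reset.
    """
    M, T = match_and_choice(I, W, alpha)
    for j in np.argsort(-T, kind="stable"):
        if M[j] >= rho:
            return int(j)
    return -1

I = np.array([0.2, 0.7, 0.8, 0.3])                # complement code of x = [0.2, 0.7]
W = np.array([[0.1, 0.6, 0.7, 0.2],               # category 0 (box contains x)
              [0.5, 0.0, 0.5, 0.0]])              # category 1
print(resonant_category(I, W, rho=0.5))           # 0
print(resonant_category(I, W, rho=0.9))           # -1: raising rho rejects both
```

Here category 0 has match 0.8 and category 1 has match 0.35, so the same input resonates at $\rho=0.5$ but triggers category creation at $\rho=0.9$, which is the granularity effect described above.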
Once a categoryJJis accepted, the general Fuzzy ART learning rule is
$$\bm{w}_{J}^{\text{new}}=\beta(\bm{I}\wedge\bm{w}_{J}^{\text{old}})+(1-\beta)\bm{w}_{J}^{\text{old}},\qquad\beta\in(0,1], \tag{7}$$

where $\beta$ is the learning-rate parameter. The fast-learning case used in this paper corresponds to $\beta=1$, for which Eq. ([7](https://arxiv.org/html/2605.06902#S2.E7)) reduces to

$$\bm{w}_{J}^{\text{new}}=\bm{I}\wedge\bm{w}_{J}^{\text{old}}. \tag{8}$$
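A sketch of the learning rule in Eq. (7). With `beta=1` (fast learning, Eq. (8)) the weights can only decrease. Reading a weight vector as $\bm{w}=[\bm{u};\bm{1}-\bm{v}]$, the standard hyperbox interpretation of complement-coded Fuzzy ART categories with lower corner $\bm{u}$ and upper corner $\bm{v}$, this means the category's box grows just enough to contain the new input. The helper names are illustrative.

```python
import numpy as np

def fuzzy_art_update(w_old, I, beta=1.0):
    """Fuzzy ART learning rule (Eq. 7); beta = 1 is the fast-learning case (Eq. 8)."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return beta * np.minimum(I, w_old) + (1.0 - beta) * w_old

def hyperbox(w):
    """Read w = [u; 1 - v] as the box with lower corner u and upper corner v."""
    d = w.size // 2
    return w[:d], 1.0 - w[d:]

w = np.array([0.1, 0.6, 0.7, 0.2])          # box [0.1, 0.3] x [0.6, 0.8]
x = np.array([0.5, 0.7])                    # lies outside the box in the first coordinate
w_new = fuzzy_art_update(w, np.concatenate([x, 1.0 - x]))
u, v = hyperbox(w_new)
print(u, v)                                 # box expands to [0.1, 0.5] x [0.6, 0.8]
```

This monotone shrinking of $\bm{w}_J$ is what makes fast learning a one-way, replay-free operation: once a box has grown, no later update in the same pass can shrink it back.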
Fuzzy ARTMAP \[[9](https://arxiv.org/html/2605.06902#bib.bib38)\] extends this mechanism to supervised learning by coupling an input module $\mathrm{ART}_{a}$ with a label module $\mathrm{ART}_{b}$ through a map field $F^{ab}$ that links input categories to class labels. With map-field vigilance $\rho_{ab}=1.0$, each learned category is mapped to exactly one class label. When a prediction error occurs, *match tracking* raises the $\mathrm{ART}_{a}$ vigilance parameter $\rho_{a}$ just enough to reject the current winning category and trigger a search for, or creation of, a more specific category associated with the correct label. This mechanism allows ARTMAP to learn incrementally while preserving stable category-to-label assignments.
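The following sketch combines Eqs. (3) to (8) with match tracking into one single-pass training step. It assumes $\rho_{ab}=1$ (so the map field reduces to one label per category), a baseline vigilance `rho_bar=0.75`, `alpha=0.001`, and a match-tracking increment `eps=1e-6` to realize "just enough"; these values and the function name are illustrative, not the paper's settings.

```python
import numpy as np

def artmap_train_step(W, labels, x, y, rho_bar=0.75, alpha=0.001, eps=1e-6):
    """One single-pass Fuzzy ARTMAP update with match tracking (sketch).

    W: (n, 2d) ART_a category weights; labels[j] is the class that category j
    maps to in F^ab (rho_ab = 1). Returns (W, labels, index of learning category).
    """
    W = W.copy()
    I = np.concatenate([x, 1.0 - x])                 # complement coding
    overlap = np.minimum(I, W).sum(axis=1)
    M = overlap / I.sum()                            # match values
    T = overlap / (alpha + W.sum(axis=1))            # choice values
    rho = rho_bar                                    # ART_a vigilance starts at baseline
    for j in np.argsort(-T, kind="stable"):          # winner-take-all search order
        if M[j] < rho:
            continue                                 # vigilance reset: inhibit j
        if labels[j] == y:
            W[j] = np.minimum(I, W[j])               # resonance: fast learning (beta = 1)
            return W, labels, int(j)
        rho = M[j] + eps                             # match tracking: reject j, keep searching
    # No category resonated with the correct label: commit a new one at I.
    return np.vstack([W, I]), labels + [y], len(labels)

d = 2
W, labels = np.empty((0, 2 * d)), []
stream = [([0.2, 0.2], 0), ([0.3, 0.25], 0), ([0.25, 0.2], 1), ([0.25, 0.2], 1)]
for x, y in stream:
    W, labels, j = artmap_train_step(W, labels, np.array(x), y)
    print(f"y={y} -> category {j}")                  # categories 0, 0, 1, 1
```

In the third step the input resonates with category 0 (match 0.925) but carries label 1, so match tracking lifts $\rho_a$ above 0.925, no remaining category qualifies, and a new class-1 category is committed.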
These architectural properties make ARTMAP fundamentally different from standard deep classifiers in the context of adversarial robustness. Prediction is determined by winner-take-all competition among explicit categories, while learning proceeds through piecewise category updates in a strictly incremental, single-pass manner. Consequently, both the internal score structure and the effective decision boundaries evolve during training. Standard gradient-based white-box attacks are therefore not directly aligned with ARTMAP’s prediction mechanism, and robustness evaluation must account not only for attack strength but also for boundary evolution induced by streaming updates. This motivates the dedicated threat-model and evaluation formulation introduced in Section [3](https://arxiv.org/html/2605.06902#S3).
A further consequence of ARTMAP’s explicit category structure is that internal scores are interpretable. In particular, the winning-category match value $M_{J}(\bm{I})$ can be viewed as a measure of input compatibility with learned categories, which naturally suggests rejection or abstention rules based on a match threshold. While this intuition is often reasonable for vanilla ARTMAP, adversarial training can reshape post-training match statistics and even change their ordering. This motivates the diagnostic and theoretical analysis developed in Section [5](https://arxiv.org/html/2605.06902#S5).
## 3 Threat Model and Evaluation Protocol
This section formalizes the threat model and evaluation protocol studied throughout the paper\. Unlike conventional adversarial\-robustness settings, which typically assume offline multi\-epoch optimization and evaluate a fixed trained model under adaptive or transfer attacks\[[8](https://arxiv.org/html/2605.06902#bib.bib25),[13](https://arxiv.org/html/2605.06902#bib.bib32)\], we consider a strict single\-pass streaming regime in which the classifier evolves continuously during training and past samples are not revisited\. In this setting, adversarial\-example generation is not merely an implementation detail: it changes the effective attack distribution encountered during training and therefore changes what robustness claims actually mean\.
### 3.1 Streaming Setting and Threat Model
We consider supervised Fuzzy ARTMAP trained in a strict single\-pass incremental setting\. Let
$$f_{t}:\mathcal{X}\rightarrow\mathcal{Y},\qquad t=0,1,\ldots,T, \tag{9}$$

denote the model state after processing $t$ training samples, where $f_{0}$ is the initial model and $f_{T}$ is the final deployed model after the single streamed pass. Here, $\mathcal{X}\subseteq[0,1]^{d}$ denotes the input space and $\mathcal{Y}$ denotes the discrete label space. Each sample is processed exactly once, without replay buffers, rehearsal, or repeated optimization over historical data. This distinguishes our setting from much of the continual-learning literature, where replay, memory, or episodic correction are standard tools \[[22](https://arxiv.org/html/2605.06902#bib.bib41),[32](https://arxiv.org/html/2605.06902#bib.bib40)\]. Because Fuzzy ARTMAP updates categories through match tracking, category creation, reset, and fast learning, both its internal category structure and its effective decision boundary can change substantially over time.
Our evaluation target is therefore the robustness of the final streamed model $f_{T}$ under attacks crafted against $f_{T}$ itself. This is a methodological choice rather than a convenience: in deployed incremental systems, the relevant adversary interacts with the currently deployed model after adaptation has occurred, not with an intermediate checkpoint or a stale pre-update snapshot.
We consider both white-box and black-box adversaries, following standard robustness-evaluation practice \[[8](https://arxiv.org/html/2605.06902#bib.bib25),[13](https://arxiv.org/html/2605.06902#bib.bib32)\] but adapted to the streaming setting. In the white-box setting, the adversary has full access to ARTMAP internals, including the category weights $\{\bm{w}_{j}\}$, the map field $F^{ab}$, and the model hyperparameters; attacks are crafted against the final trained model $f_{T}$, so that the attack objective is aligned with the deployed decision boundary. In the black-box setting, the adversary has no access to ARTMAP internals but does have access to the training set and its ground-truth labels, allowing labeled surrogate models to be trained for transfer attacks.
A central issue in streaming learners is whether adversarial examples used during training are generated against a fixed model snapshot or against the current, evolving model state. To formalize this distinction, let $\mathcal{A}(f,\bm{x},y)$ denote an attack procedure applied to model $f$ at a labeled sample $(\bm{x},y)$, where $\bm{x}\in\mathcal{X}$ is an input and $y\in\mathcal{Y}$ is its ground-truth class label.
Under *offline attack generation*, the adversarial example associated with $(\bm{x},y)$ is crafted against a fixed reference model $f_{\mathrm{ref}}$ (typically a clean-trained or pre-adversarial model):

$$\bm{x}_{\mathrm{adv}}^{\mathrm{off}}=\mathcal{A}(f_{\mathrm{ref}},\bm{x},y), \tag{10}$$

and then used throughout subsequent training:

$$f_{t}=U(f_{t-1};\,\bm{x}_{\mathrm{adv}}^{\mathrm{off}},y), \tag{11}$$

where $U$ denotes one single-pass Fuzzy ARTMAP training update on a labeled input. Because $f_{t}$ changes with $t$, perturbations crafted against $f_{\mathrm{ref}}$ need not remain aligned with the final streamed model $f_{T}$. This mismatch is largely hidden in standard batch settings, where training repeatedly revisits the same data distribution and the evaluated model is often the same model family against which the adversarial examples were generated \[[40](https://arxiv.org/html/2605.06902#bib.bib5)\].
Under *online attack generation*, the adversarial example is regenerated against the current model state at each training step:

$$\bm{x}_{\mathrm{adv},t}=\mathcal{A}(f_{t-1},\bm{x}_{t},y_{t}), \tag{12}$$

$$f_{t}=U(f_{t-1};\,\bm{x}_{\mathrm{adv},t},y_{t}). \tag{13}$$

This keeps attack generation aligned with the evolving boundary and avoids the staleness induced by a fixed reference model.
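Offline and online generation differ only in which model state the attack targets. The loop below is a sketch with hypothetical callables `attack(f, x, y)`, standing in for $\mathcal{A}$, and `update(f, x_adv, y)`, standing in for $U$. The toy "model" merely records which state each perturbation was crafted against, which makes the staleness of offline generation visible without any real ARTMAP code.

```python
def stream_adversarial_training(f0, stream, attack, update, mode="online", f_ref=None):
    """Single-pass adversarial training without replay.

    mode="offline": every x_adv is crafted against the fixed f_ref (Eqs. 10-11).
    mode="online":  x_adv is crafted against the current state f_{t-1} (Eqs. 12-13).
    Returns f_T, the final streamed model that evaluation must target.
    """
    if mode not in ("online", "offline"):
        raise ValueError("mode must be 'online' or 'offline'")
    if mode == "offline" and f_ref is None:
        raise ValueError("offline generation needs a fixed reference model f_ref")
    f = f0
    for x, y in stream:                      # each sample is seen once, then discarded
        target = f if mode == "online" else f_ref
        x_adv = attack(target, x, y)
        f = update(f, x_adv, y)
    return f

# Toy stand-ins: a "model" is (version, history); the "attack" returns the
# version it targeted, and the "update" appends that version to the history.
def toy_attack(f, x, y):
    return f[0]

def toy_update(f, x_adv, y):
    return (f[0] + 1, f[1] + (x_adv,))

data = [(None, 0)] * 4
online = stream_adversarial_training((0, ()), data, toy_attack, toy_update, "online")
offline = stream_adversarial_training((0, ()), data, toy_attack, toy_update,
                                      "offline", f_ref=(0, ()))
print(online[1])    # (0, 1, 2, 3): each example targets the then-current state
print(offline[1])   # (0, 0, 0, 0): every example targets the stale snapshot
```

Both runs end at the same final state index, but under offline generation none of the training perturbations was crafted against any state after $f_{\mathrm{ref}}$.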
In this work, robustness is always evaluated using attacks crafted against the final trained model $f_{T}$, regardless of how adversarial training examples were generated. This choice prevents robustness from being overstated due to boundary-shift artifacts or stale perturbations, and it defines the evaluation object used throughout the remainder of the paper.
**Principle 1 (Offline Distribution Mismatch in Streaming Learners).** In single-pass incremental learners without replay, adversarial examples crafted against a fixed model snapshot need not remain aligned with the final streamed model because the decision boundary evolves during training. Consequently, robustness observed under stale, transfer-based, or otherwise non-adaptive attacks does not imply robustness under adaptive white-box attacks on the final deployed model.
Principle 1 is not a standard theorem inherited from batch adversarial learning; rather, it is the central methodological principle of our streaming setting\. It defines the evaluation mismatch studied throughout the paper and explains why offline adversarial training in streaming learners can appear effective under transfer\-based black\-box evaluation while failing under adaptive white\-box attacks on the final model\.
### 3.2 Attack Budget and Robustness Metrics
We consider untargeted $\ell_{\infty}$-bounded attacks in normalized pixel space $[0,1]^{d}$ with budgets

$$\epsilon\in\{0.05,0.10,\ldots,0.35\}, \tag{14}$$

while enforcing $\bm{x}_{\mathrm{adv}}\in[0,1]^{d}$.
For gradient-based attacks, FGSM \[[18](https://arxiv.org/html/2605.06902#bib.bib23)\] applies the one-step update

$$\bm{x}_{\mathrm{adv}}=\mathrm{clip}_{[0,1]}\!\left(\bm{x}+\epsilon\,\mathrm{sign}(\nabla_{\bm{x}}L)\right), \tag{15}$$

where $L$ denotes the attack loss. PGD \[[25](https://arxiv.org/html/2605.06902#bib.bib24)\] performs $K$ iterative steps with step size $\eta=\epsilon/4$, random initialization within the $\ell_{\infty}$ ball, and projection back to the feasible ball after each step. These attack definitions are standard; what is specific to our work is the streaming-aligned white-box objective introduced in Section 4, together with the requirement that evaluation target the final streamed model.
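A NumPy sketch of Eq. (15) and the PGD variant described above, written against a hypothetical `grad_fn(x)` that returns $\nabla_{\bm{x}}L$ (for example, the WB-Softmax gradient of Section 4). The default `steps=10` is an assumption, since $K$ is not fixed here. Clipping each coordinate to $[\max(0,x_i-\epsilon),\min(1,x_i+\epsilon)]$ is the exact projection onto the intersection of the $\ell_\infty$ ball with $[0,1]^d$.

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """One-step FGSM (Eq. 15)."""
    return np.clip(x + eps * np.sign(grad_fn(x)), 0.0, 1.0)

def pgd_linf(x, grad_fn, eps, steps=10, rng=None):
    """l_inf PGD: step size eps/4, random start in the eps-ball, projection each step."""
    rng = np.random.default_rng(0) if rng is None else rng
    lo = np.clip(x - eps, 0.0, 1.0)              # feasible set is the box [lo, hi]:
    hi = np.clip(x + eps, 0.0, 1.0)              # the eps-ball intersected with [0, 1]^d
    eta = eps / 4.0
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), lo, hi)
    for _ in range(steps):
        x_adv = np.clip(x_adv + eta * np.sign(grad_fn(x_adv)), lo, hi)
    return x_adv

# Sanity check with a linear loss L(x) = w . x, whose gradient is w everywhere.
w = np.array([1.0, -1.0, 1.0])
x = np.array([0.5, 0.02, 0.99])

def grad_fn(z):
    return w

print(fgsm(x, grad_fn, 0.1))        # each coordinate moves to the feasible corner
print(pgd_linf(x, grad_fn, 0.1))    # PGD reaches the same corner within 10 steps
```

For a linear loss both attacks should land on the same corner of the feasible box; this is a cheap instance of validity check 2 below, since PGD should never do worse than FGSM.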
We report four related performance measures. *Clean accuracy* is the standard classification accuracy on unperturbed test inputs. *Adversarial accuracy at budget $\epsilon$*, denoted $\mathrm{Acc}_{\mathrm{adv}}(\epsilon)$, is the classification accuracy measured on adversarially perturbed inputs constrained by that budget. In the main text, when we refer to *robust accuracy* as a function of $\epsilon$, we mean this adversarial-accuracy curve $\mathrm{Acc}_{\mathrm{adv}}(\epsilon)$. We use the term “robust” only in this evaluation sense, not in the sense of a certified guarantee.
To summarize performance across budgets, we report the Area Under the Robust Accuracy Curve \(AURAC\):
$$\mathrm{AURAC}=\frac{1}{\epsilon_{\max}-\epsilon_{\min}}\int_{\epsilon_{\min}}^{\epsilon_{\max}}\mathrm{Acc}_{\mathrm{adv}}(\epsilon)\,d\epsilon, \tag{16}$$

approximated by trapezoidal integration on the grid in ([14](https://arxiv.org/html/2605.06902#S3.E14)). AURAC is not a new metric; we use it because it summarizes adversarial performance across a range of perturbation strengths rather than at a single operating point. This is especially useful in streaming settings, where different training protocols may behave differently at low and high $\epsilon$.
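A sketch of Eq. (16) with an explicit trapezoidal rule on the budget grid of Eq. (14). The accuracy values are made up purely to exercise the function and are not results from the paper.

```python
import numpy as np

def aurac(eps_grid, acc_adv):
    """Normalized area under the robust-accuracy curve (Eq. 16), trapezoidal rule."""
    eps_grid = np.asarray(eps_grid, dtype=float)
    acc_adv = np.asarray(acc_adv, dtype=float)
    widths = np.diff(eps_grid)                        # interval lengths
    heights = 0.5 * (acc_adv[1:] + acc_adv[:-1])      # mean height of each trapezoid
    return float(np.sum(widths * heights) / (eps_grid[-1] - eps_grid[0]))

eps_grid = np.linspace(0.05, 0.35, 7)                 # {0.05, 0.10, ..., 0.35}, Eq. (14)
# Hypothetical accuracies, only to exercise the function:
acc = np.array([0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30])
print(aurac(eps_grid, acc))                           # approximately 0.6 for this linear curve
```

Because of the normalization by $\epsilon_{\max}-\epsilon_{\min}$, AURAC stays on the same $[0,1]$ scale as accuracy: a flat curve at accuracy $a$ has AURAC exactly $a$.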
### 3.3 Sanity Checks and Validity Criteria
Following Athalye et al\.\[[2](https://arxiv.org/html/2605.06902#bib.bib29)\], we apply four validity checks adapted to streaming ART models:
1. single-step FGSM achieves non-trivial attack success;
2. iterative PGD is at least as strong as FGSM;
3. white-box attacks match or exceed black-box transfer attacks; and
4. accuracy degrades smoothly as $\epsilon$ increases.
These checks are not novel in themselves, but they are essential in our setting to verify that the proposed attack\-and\-evaluation pipeline is measuring genuine vulnerability rather than artifacts of non\-smooth winner\-take\-all computation\.
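One way to operationalize the four checks on accuracy curves is sketched below. The thresholds `min_drop` and `tol`, and the reading of "degrades smoothly" as "never rises with $\epsilon$ by more than `tol`", are assumptions for illustration; the curves are hypothetical numbers, not reported results.

```python
import numpy as np

def validity_checks(clean_acc, fgsm_acc, pgd_acc, transfer_acc, min_drop=0.05, tol=0.01):
    """Return one boolean per sanity check.

    fgsm_acc, pgd_acc, transfer_acc: adversarial accuracies on the same
    increasing epsilon grid; clean_acc: scalar clean accuracy.
    """
    fgsm_acc, pgd_acc, transfer_acc = (np.asarray(a, dtype=float)
                                       for a in (fgsm_acc, pgd_acc, transfer_acc))
    return {
        # 1. FGSM alone causes a visible accuracy drop at the largest budget.
        "fgsm_nontrivial": bool(clean_acc - fgsm_acc[-1] >= min_drop),
        # 2. Iterative PGD is at least as strong as FGSM at every budget.
        "pgd_at_least_fgsm": bool(np.all(pgd_acc <= fgsm_acc + tol)),
        # 3. White-box PGD matches or exceeds black-box transfer at every budget.
        "whitebox_at_least_transfer": bool(np.all(pgd_acc <= transfer_acc + tol)),
        # 4. Accuracy never rises with epsilon by more than tol.
        "monotone_in_eps": bool(np.all(np.diff(pgd_acc) <= tol)),
    }

# Hypothetical curves on a 4-point budget grid (illustrative numbers only).
checks = validity_checks(
    clean_acc=0.95,
    fgsm_acc=[0.70, 0.50, 0.35, 0.25],
    pgd_acc=[0.40, 0.15, 0.05, 0.01],
    transfer_acc=[0.85, 0.75, 0.60, 0.50],
)
print(checks)   # all four checks pass for these curves
```

A PGD curve that sits above the FGSM curve, or that rises with $\epsilon$, is the usual symptom of gradient masking from a non-smooth forward pass, which is exactly what these checks are meant to flag.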
This section defines the robustness object studied in the remainder of the paper: not robustness to arbitrary stale perturbations, but robustness of the final streamed ARTMAP model under attacks aligned with both its prediction mechanism and its final post\-training state\.
## 4 White-Box Attacks on Fuzzy ARTMAP
This section instantiates the generic attack loss $L$ introduced in Section [3](https://arxiv.org/html/2605.06902#S3) for adaptive white-box evaluation of Fuzzy ARTMAP. The core difficulty is architectural: ARTMAP predicts through winner-take-all category competition followed by map-field assignment, and this computation contains non-differentiable selection and piecewise operations. As a result, standard gradient-based white-box attacks are not naturally aligned with the mechanism by which ARTMAP makes decisions and may produce weak or misleading optimization signals.
Accordingly, the main contribution of this section is not a new outer optimization routine, but an ARTMAP\-aligned attack objective\. We propose WB\-Softmax, a differentiable surrogate that converts category\-level competition into class\-level attack scores in a manner consistent with the ARTMAP map field\. Rather than replacing the forward model with a generic smooth proxy, WB\-Softmax preserves the original ARTMAP forward computation and introduces differentiability only in the loss used for optimization\.
#### WB\-Softmax: softmax\-relaxed class loss
ARTMAP selects the winning category by hard maximization of the choice values and then predicts through the map field\. Because this hard winner\-take\-all step is non\-differentiable almost everywhere, it blocks direct gradient\-based optimization\. Our key idea is therefore to replace hard winner selection in the attack objective with a differentiable softmax relaxation over category choice values, and then aggregate the resulting probability mass at the class level according to the map field\.
Let $C_{c}$ denote the set of categories mapped to class $c$. We define the class-level probability of class $c$ by summing the softmax mass of all categories assigned to that class:

$$p_{c}=\sum_{j\in C_{c}}\frac{\exp(T_{j}/\tau)}{\sum_{k}\exp(T_{k}/\tau)}, \tag{17}$$

where $\tau>0$ is the softmax temperature. The WB-Softmax attack loss is then the negative log-likelihood of the true class:

$$L_{\mathrm{WB\text{-}Softmax}}=-\log(p_{y}). \tag{18}$$
This construction is specific to ARTMAP\. The attack is driven by category\-level choice values and aggregated according to the class assignments encoded by the map field, so the objective remains aligned with ARTMAP’s actual competition\-and\-mapping structure rather than with an external differentiable proxy\. We use probability\-mass aggregation \(sum over categories\) rather than a hard classwise max so that gradients remain informative even when multiple same\-class categories compete near the decision boundary\.
We set $\tau = 0.01$ based on attack-strength ablations. Lower temperatures concentrate probability mass on near-winning categories while still preserving useful gradient flow. Maximizing the loss (18), which is equivalent to minimizing $p_y$ in ([17](https://arxiv.org/html/2605.06902#S4.E17)), pushes probability mass away from the true class and promotes misclassification.
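A minimal pure-Python sketch of Eqs. (17) and (18), assuming the choice values $T_j$ have already been computed and that the map field is available as a list `category_class` giving the class of each category (a hypothetical representation, not the authors' code). The log-sum-exp shift keeps $\exp(T_j/\tau)$ finite at small temperatures such as $\tau = 0.01$.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_probabilities(T: Sequence[float], category_class: Sequence[int],
                        num_classes: int, tau: float) -> List[float]:
    """Eq. (17): softmax over choice values, summed per mapped class."""
    scaled = [t / tau for t in T]
    log_z = logsumexp(scaled)
    p = [0.0] * num_classes
    for s, c in zip(scaled, category_class):
        p[c] += math.exp(s - log_z)  # softmax mass of one category
    return p


def wb_softmax_loss(T: Sequence[float], category_class: Sequence[int],
                    y: int, tau: float = 0.01) -> float:
    """Eq. (18): -log p_y, computed as a difference of log-sum-exps."""
    scaled = [t / tau for t in T]
    true_scaled = [s for s, c in zip(scaled, category_class) if c == y]
    if not true_scaled:
        return math.inf  # no category maps to y, so p_y = 0
    return -(logsumexp(true_scaled) - logsumexp(scaled))


# Toy example: three categories, the first mapped to class 0, the others to class 1.
T = [0.80, 0.79, 0.50]
category_class = [0, 1, 1]
print(class_probabilities(T, category_class, 2, 0.01))
print(wb_softmax_loss(T, category_class, y=0))
```

In this toy case the two leading categories differ by only 0.01 in choice value, so at $\tau = 0.01$ the class masses split roughly 73/27 rather than collapsing to a hard 1/0, which is what keeps the attack loss informative near the decision boundary.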
#### Smooth surrogate construction under complement coding
WB\-Softmax introduces differentiability only in the aggregation step used to construct the attack objective; the forward ARTMAP computation itself remains unchanged\. This differs from broader BPDA\-style gradient substitutions that modify the forward/backward interface more aggressively\[[2](https://arxiv.org/html/2605.06902#bib.bib29)\]\. Our goal is not to replace ARTMAP with a generic smooth model, but to build a white\-box objective that respects its native category structure\.
Under complement coding,

$$\bm{I}(\bm{x}) = [\bm{x};\, \bm{1} - \bm{x}], \tag{19}$$

let $\bm{I}_{1:d}$ denote the first $d$ components of $\bm{I}(\bm{x})$ and let $\bm{I}_{d+1:2d}$ denote the last $d$ components. By construction,

$$\bm{I}_{1:d} = \bm{x}, \qquad \bm{I}_{d+1:2d} = \bm{1} - \bm{x}. \tag{20}$$

The chain rule therefore gives

$$\nabla_{\bm{x}} L = \nabla_{\bm{I}_{1:d}} L - \nabla_{\bm{I}_{d+1:2d}} L. \tag{21}$$

Equation ([21](https://arxiv.org/html/2605.06902#S4.E21)) ensures that gradients are propagated correctly through the coded representation: perturbations in $\bm{x}$ induce equal-magnitude, opposite-signed changes in the two halves of $\bm{I}(\bm{x})$. Thus, WB-Softmax preserves the original complement-coded forward representation while still enabling gradient-based white-box optimization.
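The sketch below checks Eq. (21) numerically: for an arbitrary smooth function `g` of the coded vector (a hypothetical stand-in chosen only because it is differentiable; the real WB-Softmax loss also involves the piecewise-linear fuzzy AND), the chain-rule formula agrees with a central finite difference taken directly in $\bm{x}$.

```python
import math


def complement_code(x):
    """Eq. (19): I(x) = [x; 1 - x]."""
    return list(x) + [1.0 - v for v in x]


def g(I):
    """Hypothetical smooth loss of the coded input (illustration only)."""
    return sum(math.sin((k + 1) * v) for k, v in enumerate(I))


def grad_g_wrt_I(I):
    """Analytic gradient of g with respect to each coded coordinate."""
    return [(k + 1) * math.cos((k + 1) * v) for k, v in enumerate(I)]


def grad_x_via_eq21(x):
    """Eq. (21): grad_x L = grad_{I[:d]} L - grad_{I[d:]} L."""
    d = len(x)
    gI = grad_g_wrt_I(complement_code(x))
    return [gI[i] - gI[d + i] for i in range(d)]


def grad_x_finite_difference(x, h=1e-6):
    """Central differences of x -> g(I(x)), for comparison."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((g(complement_code(xp)) - g(complement_code(xm))) / (2 * h))
    return out


x = [0.2, 0.7, 0.5]
print(grad_x_via_eq21(x))
print(grad_x_finite_difference(x))
```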
#### WB\-Softmax PGD attack
We instantiate WB-Softmax inside a standard projected gradient descent (PGD) outer loop with random start inside the $\ell_\infty$ ball. The novelty is therefore the ARTMAP-aligned objective, not the PGD shell itself. Algorithm [1](https://arxiv.org/html/2605.06902#alg1) summarizes the complete attack procedure used throughout this paper. We use $\mathcal{U}(-\epsilon,\epsilon)$ to denote element-wise uniform noise in $[-\epsilon,\epsilon]$, and $\Pi_{[\bm{x}-\epsilon,\,\bm{x}+\epsilon]}(\cdot)$ to denote projection onto the $\ell_\infty$ ball around $\bm{x}$ via coordinate-wise clipping.
Algorithm 1 WB-Softmax PGD Attack

**Input:** input $\bm{x}\in[0,1]^d$, ground-truth label $y\in\mathcal{Y}$, perturbation budget $\epsilon$, number of steps $K$, step size $\eta=\epsilon/4$, temperature $\tau$

1. $\bm{x}_{\mathrm{adv}} \leftarrow \mathrm{clip}_{[0,1]}\big(\bm{x} + \mathcal{U}(-\epsilon,\epsilon)\big)$
2. **for** $k = 1$ **to** $K$ **do**
3. $\quad \bm{I} \leftarrow [\bm{x}_{\mathrm{adv}};\, \bm{1} - \bm{x}_{\mathrm{adv}}]$
4. $\quad T_j \leftarrow \dfrac{\lvert \bm{I} \wedge \bm{w}_j \rvert}{\alpha + \lvert \bm{w}_j \rvert} \quad \forall j$
5. $\quad q_j \leftarrow \dfrac{\exp(T_j/\tau)}{\sum_m \exp(T_m/\tau)} \quad \forall j$
6. $\quad p_c \leftarrow \sum_{j \in \mathcal{C}_c} q_j \quad \forall c$ ⊳ class-level probabilities
7. $\quad \mathcal{L} \leftarrow -\log(p_y)$
8. $\quad \bm{g} \leftarrow \nabla_{\bm{x}_{\mathrm{adv}}} \mathcal{L}$ ⊳ autograd (equivalent to Eq. ([21](https://arxiv.org/html/2605.06902#S4.E21)))
9. $\quad \bm{x}_{\mathrm{adv}} \leftarrow \Pi_{[\bm{x}-\epsilon,\, \bm{x}+\epsilon]}\big(\bm{x}_{\mathrm{adv}} + \eta\, \mathrm{sign}(\bm{g})\big)$
10. $\quad \bm{x}_{\mathrm{adv}} \leftarrow \mathrm{clip}_{[0,1]}(\bm{x}_{\mathrm{adv}})$
11. **end for**
12. **return** $\bm{x}_{\mathrm{adv}}$
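The following pure-Python sketch mirrors Algorithm 1 on a toy two-category model. It is not the authors' implementation: the helper names, the list-based `weights` and `category_class`, and the toy values are assumptions. Instead of autograd it writes the subgradient by hand: $\partial L/\partial T_j = (q_j - [\mathcal{C}(j)=y]\,q_j/p_y)/\tau$, the fuzzy AND contributes $1$ wherever $I_i < w_{ji}$ (ties are assigned $0$ here, which may differ from an autograd framework's convention), and Eq. (21) folds the coded gradient back onto $\bm{x}$.

```python
import math
import random


def complement_code(x):
    return list(x) + [1.0 - v for v in x]


def choice_values(I, weights, alpha):
    """T_j = |I ^ w_j| / (alpha + |w_j|), where ^ is the element-wise min."""
    return [sum(min(a, b) for a, b in zip(I, w)) / (alpha + sum(w))
            for w in weights]


def predict(x, weights, category_class, alpha=1e-3):
    """Hard winner-take-all prediction through the map field."""
    T = choice_values(complement_code(x), weights, alpha)
    return category_class[max(range(len(T)), key=T.__getitem__)]


def wb_softmax_loss_and_grad(x, y, weights, category_class, alpha, tau):
    """WB-Softmax loss (Eq. 18) and a subgradient with respect to x."""
    d = len(x)
    I = complement_code(x)
    s = [t / tau for t in choice_values(I, weights, alpha)]
    m = max(s)
    e = [math.exp(v - m) for v in s]
    z = sum(e)
    q = [v / z for v in e]                                   # category softmax
    p_y = sum(qj for qj, c in zip(q, category_class) if c == y)
    if p_y == 0.0:
        raise ValueError("no category with nonzero mass maps to class y")
    loss = -math.log(p_y)
    # dL/dT_j = (q_j - [C(j) == y] * q_j / p_y) / tau
    dT = [(qj - (qj / p_y if c == y else 0.0)) / tau
          for qj, c in zip(q, category_class)]
    # d min(I_i, w_ji) / d I_i = 1 if I_i < w_ji else 0 (ties -> 0 here)
    gI = [0.0] * (2 * d)
    for dTj, w in zip(dT, weights):
        scale = dTj / (alpha + sum(w))
        for i in range(2 * d):
            if I[i] < w[i]:
                gI[i] += scale
    return loss, [gI[i] - gI[d + i] for i in range(d)]      # Eq. (21)


def _sign(v):
    return (v > 0) - (v < 0)


def wb_softmax_pgd(x, y, weights, category_class, eps, steps=20,
                   tau=0.01, alpha=1e-3, rng=None):
    """Algorithm 1: sign-gradient ascent, random start, l_inf projection."""
    rng = rng or random.Random(0)
    eta = eps / 4
    x_adv = [min(1.0, max(0.0, v + rng.uniform(-eps, eps))) for v in x]  # step 1
    for _ in range(steps):
        _, g = wb_softmax_loss_and_grad(x_adv, y, weights, category_class,
                                        alpha, tau)
        stepped = [a + eta * _sign(gi) for a, gi in zip(x_adv, g)]
        projected = [min(xi + eps, max(xi - eps, v))                     # step 9
                     for v, xi in zip(stepped, x)]
        x_adv = [min(1.0, max(0.0, v)) for v in projected]               # step 10
    return x_adv


# Toy model (d = 2): a class-0 point category at 0.2 and a class-1 one at 0.8.
weights = [[0.2, 0.2, 0.8, 0.8], [0.8, 0.8, 0.2, 0.2]]
category_class = [0, 1]
x = [0.4, 0.4]
x_adv = wb_softmax_pgd(x, 0, weights, category_class, eps=0.15)
print(predict(x, weights, category_class), predict(x_adv, weights, category_class))
```

Here the clean input is assigned to class 0 and the attack walks both coordinates to the edge of the $\epsilon = 0.15$ ball, past the midpoint between the two prototypes, so the hard prediction flips to class 1.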
WB-Softmax operationalizes the evaluation principle of Section [3](https://arxiv.org/html/2605.06902#S3): if robustness is to be assessed against adaptive white-box attacks on the final streamed model, then the attack objective itself must be aligned with the mechanism by which that model actually predicts. More broadly, this illustrates a general lesson for non-smooth winner-take-all learners: meaningful white-box evaluation requires a surrogate objective aligned with the model’s native decision rule, rather than with an unrelated differentiable proxy.
#### Black\-box transfer attacks
For black\-box evaluation, we train surrogate classifiers on the same training set and generate transfer attacks with PGD\. Our primary surrogate is a SimpleCNN, and we additionally consider an LRS\-regularized surrogate to strengthen transferability\. These surrogates are used only to generate black\-box transfer attacks; they do not alter the ARTMAP training or inference mechanism\. Full surrogate architectures and training hyperparameters are deferred to Section 6\.
## 5 Interpretable Diagnostics and Reliability-Aware Training Rules
Throughout this section, we use *absorption* to mean that an incoming sample is assigned to an existing accepted category and updates that category under the fast-learning rule, rather than creating a new category.
ART exposes its internal state through explicit category geometry, enabling a diagnosis\-to\-rule workflow that is difficult to realize in black\-box models\. In our setting, this structure is not merely descriptive: it can be monitored online to detect structural failure modes that emerge during adversarial training, and it can be used to design lightweight replay\-free interventions that act directly on category creation and fast\-learning updates of existing categories\. This makes ARTMAP unusual among streaming learners: the same structure that supports incremental learning also supports online interpretability and targeted robustness interventions without replay\.
This capability is also relevant to deployment reliability. In streaming systems, internal quantities such as match may be reused for rejection, abstention, or reliability filtering [[12](https://arxiv.org/html/2605.06902#bib.bib12), [16](https://arxiv.org/html/2605.06902#bib.bib13), [17](https://arxiv.org/html/2605.06902#bib.bib14), [20](https://arxiv.org/html/2605.06902#bib.bib15)]. If adversarial training changes the relationship between such quantities and correctness, then trust signals calibrated on vanilla models may become unsafe after adaptation. Accordingly, this section studies two coupled questions: (i) how explicit category geometry can diagnose robustness failure modes online, and (ii) whether post-training internal scores, especially match-based scores, remain reliable proxies for correctness. We show that both questions admit actionable answers in ARTMAP: geometry monitoring exposes *separation collapse*, a failure mode in which different-class categories become increasingly overlapping during adversarial adaptation, while match-score analysis reveals a reliability failure that can invalidate vanilla-calibrated rejection rules.
### 5.1 Online Structural Diagnostics via iCVIs and Geometry Indicators
Cluster validity indices \(CVIs\) play a central role in assessing clustering quality, and incremental cluster validity indices \(iCVIs\) provide online counterparts that can be updated recursively as samples arrive, including online iCVI formulations for streaming clustering\[[30](https://arxiv.org/html/2605.06902#bib.bib60),[5](https://arxiv.org/html/2605.06902#bib.bib49),[6](https://arxiv.org/html/2605.06902#bib.bib48)\]\. In black\-box deep models, related geometric analyses typically require post\-hoc embeddings or external probes\. In contrast, ART categories have explicit geometric representations, so class\-conditional separation, overlap, and compactness can be computed directly from the current category set during training\.
In this work, we use online structural diagnostics rather than post-hoc visualization. We monitor two geometry indicators that are especially informative during adversarial training: (i) *minimum separation*, the smallest non-overlap margin between different-class categories, and (ii) *overlap risk*, the maximum normalized intersection between wrong-class category pairs. Because these indicators depend only on the current category weights $\{\bm{w}_j\}$, they can be updated online without storing past samples. These indicators are particularly useful for revealing failure modes that are not apparent from accuracy curves or category counts alone, and they provide actionable signals for designing targeted replay-free training rules. Section [7](https://arxiv.org/html/2605.06902#S7) presents a case study illustrating these diagnostics on one of the benchmarks introduced in Section [6](https://arxiv.org/html/2605.06902#S6).
*Geometry indicators.* For the purpose of online overlap diagnostics, we use a diagnostic box representation in complement-coded space induced by the category weights. Let category $j$ be represented by lower and upper bounds $\bm{\ell}_j = \bm{w}_j$ and $\bm{u}_j = \bm{1}$, and let $\mathcal{C}(j)$ denote its mapped class. Here $\bm{\ell}_j, \bm{u}_j \in [0,1]^{2d}$. For two boxes $j$ and $k$, define the $L^1$ intersection length

$$\mathrm{Int}(j,k) = \big\lvert \max\big(\bm{0},\, \min(\bm{u}_j, \bm{u}_k) - \max(\bm{\ell}_j, \bm{\ell}_k)\big) \big\rvert_1, \tag{22}$$

where the $\min(\cdot,\cdot)$ and $\max(\cdot,\cdot)$ operators inside ([22](https://arxiv.org/html/2605.06902#S5.E22)) act element-wise before the final $L^1$ aggregation. We then define the normalized overlap as

$$\mathrm{Ov}(j,k) = \frac{\mathrm{Int}(j,k)}{\min\big(\lvert \bm{u}_j - \bm{\ell}_j \rvert_1,\; \lvert \bm{u}_k - \bm{\ell}_k \rvert_1\big) + \varepsilon_0}, \tag{23}$$

where $\varepsilon_0$ avoids division by zero. We further define *overlap risk* as

$$\mathrm{OR} = \max_{\mathcal{C}(j) \neq \mathcal{C}(k)} \mathrm{Ov}(j,k), \tag{24}$$

and *minimum separation* as the smallest non-overlap margin between different-class boxes,

$$\mathrm{MS} = \min_{\mathcal{C}(j) \neq \mathcal{C}(k)} \Big[\, 1 - \mathrm{Ov}(j,k) \,\Big]. \tag{25}$$

These quantities depend only on the current category set $\{\bm{w}_j\}$ and can therefore be maintained online throughout training.
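A minimal sketch of Eqs. (22) to (25) for categories stored as complement-coded weight lists; `weights`, `category_class`, and the toy values are illustrative assumptions. It compares every pair of categories with different mapped classes, which costs $O(N^2 d)$ for $N$ categories; an online monitor could instead recompute only the pairs involving a category that has just changed.

```python
from itertools import combinations


def box_size(lower, upper):
    """L1 extent |u - l|_1 of a diagnostic box."""
    return sum(up - lo for lo, up in zip(lower, upper))


def intersection_length(box_j, box_k):
    """Eq. (22): L1 length of the element-wise intersection of two boxes."""
    (lo_j, up_j), (lo_k, up_k) = box_j, box_k
    return sum(max(0.0, min(uj, uk) - max(lj, lk))
               for lj, uj, lk, uk in zip(lo_j, up_j, lo_k, up_k))


def normalized_overlap(box_j, box_k, eps0=1e-12):
    """Eq. (23): intersection divided by the smaller box's L1 extent."""
    smaller = min(box_size(*box_j), box_size(*box_k))
    return intersection_length(box_j, box_k) / (smaller + eps0)


def geometry_indicators(weights, category_class):
    """Eqs. (24)-(25): overlap risk (OR) and minimum separation (MS)."""
    # Diagnostic box of category j: lower bound w_j, upper bound 1.
    boxes = [(list(w), [1.0] * len(w)) for w in weights]
    overlaps = [normalized_overlap(boxes[j], boxes[k])
                for j, k in combinations(range(len(boxes)), 2)
                if category_class[j] != category_class[k]]
    if not overlaps:
        raise ValueError("need categories from at least two classes")
    overlap_risk = max(overlaps)
    min_separation = min(1.0 - ov for ov in overlaps)
    return overlap_risk, min_separation


# Toy example with d = 1: each weight [a, 1 - b] encodes the interval [a, b].
weights = [[0.2, 0.7], [0.6, 0.3], [0.25, 0.6]]
category_class = [0, 1, 0]
risk, sep = geometry_indicators(weights, category_class)
print(risk, sep)
```

With these definitions MS = 1 − OR holds identically, since the minimum of $1 - \mathrm{Ov}$ over a set equals one minus its maximum; the toy output (about 0.727 and 0.273) reflects this.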
### 5.2 Streaming Training Protocols, Failure Modes, and Targeted Interventions
We revisit the training-protocol taxonomy of Section [3](https://arxiv.org/html/2605.06902#S3) from a geometric and reliability perspective. We compare replay-free adversarial training along two axes: *when* adversarial examples are generated (offline versus online) and *which* generated examples are used for updating (standard versus selective). Offline variants use perturbations crafted against a fixed clean-trained reference model, whereas online variants regenerate perturbations against the current streamed model state. Standard variants update on all generated adversarial examples, whereas selective variants update only on adversarial examples that induce misclassification, that is, those satisfying $f(\bm{x}_{\mathrm{adv}}) \neq y$. Within standard updates, the absorb-vs-create decision can additionally be gated by predicted hyperbox overlap (Algorithm [3](https://arxiv.org/html/2605.06902#alg3)), without changing whether a sample triggers an update. Algorithm [2](https://arxiv.org/html/2605.06902#alg2) summarizes these protocol choices.
These protocol differences are consequential because adversarial updates in ARTMAP can alter category structure rather than merely adjust a smooth boundary\. Depending on when perturbations are generated and which samples are admitted for updating, adversarial training may trigger mismatch reset, new\-category formation, or cross\-class encroachment\. Consequently, robustness in ARTMAP depends not only on attack strength, but also on how adversarial samples are scheduled and filtered during streaming adaptation\.
A practical consequence is category proliferation. Adversarial training can substantially increase the number of categories, often by roughly a factor of two, thereby increasing memory footprint and evaluation cost. When an adversarial example $\bm{x}_{\mathrm{adv}}$ falls outside existing hyperboxes, learning may create a new category (when no existing category of the correct class passes the vigilance test) instead of absorbing the sample into an existing one. Selective updating mitigates this unnecessary growth by concentrating updates on adversarial examples that reveal genuine model failures.
To improve stability across perturbation strengths, we employ progressive two-stage selective training. Stage 1 samples $\epsilon \in [0.05, 0.15]$, and Stage 2 extends to $\epsilon \in [0.15, 0.35]$, using online selective filtering in both stages. This schedule separates adaptation to moderate perturbations from later adaptation to stronger attacks, reducing the instability that can arise when large-$\epsilon$ perturbations are introduced too early and providing a more controlled path for category evolution.
Algorithm 2 Adversarial Training Protocols

**Input:** training stream $\{(\bm{x}_i, y_i)\}$, attack mode $\in \{\textsc{offline}, \textsc{online}\}$, update rule $\in \{\textsc{standard}, \textsc{selective}\}$

1. **if** attack mode = $\textsc{offline}$ **then**
2. $\quad$Train on all clean samples $\{(\bm{x}_i, y_i)\}$ ⊳ Pass 1
3. $\quad$**for each** $(\bm{x}_i, y_i)$ **do**
4. $\quad\quad$Generate $\bm{x}_{\mathrm{adv},i}$ against the fixed clean-trained model
5. $\quad\quad$**if** update rule = $\textsc{standard}$ **or** $f(\bm{x}_{\mathrm{adv},i}) \neq y_i$ **then**
6. $\quad\quad\quad$Train on $(\bm{x}_{\mathrm{adv},i}, y_i)$
7. $\quad\quad$**end if**
8. $\quad$**end for**
9. **else**
10. $\quad$**for each** $(\bm{x}_i, y_i)$ **do**
11. $\quad\quad$Train on $(\bm{x}_i, y_i)$
12. $\quad\quad$Generate $\bm{x}_{\mathrm{adv},i}$ against the current model
13. $\quad\quad$**if** update rule = $\textsc{standard}$ **or** $f(\bm{x}_{\mathrm{adv},i}) \neq y_i$ **then**
14. $\quad\quad\quad$Train on $(\bm{x}_{\mathrm{adv},i}, y_i)$
15. $\quad\quad$**end if**
16. $\quad$**end for**
17. **end if**
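The protocol grid of Algorithm 2 can be sketched independently of the learner. The `StreamingModel` interface, the `attack` callable, and the memorizing stub below are hypothetical placeholders, not the authors' implementation. Algorithm 2 leaves open which model $f$ denotes in the offline selective test; this sketch assumes it is the frozen clean-trained reference that the attack targeted.

```python
import copy
from typing import Callable, Iterable, Protocol, Sequence, Tuple


class StreamingModel(Protocol):
    """Hypothetical interface for an incremental learner such as Fuzzy ARTMAP."""

    def train_one(self, x: Sequence[float], y: int) -> None: ...

    def predict(self, x: Sequence[float]) -> int: ...


# attack(model_to_attack, x, y) -> x_adv, e.g. a WB-Softmax PGD routine
Attack = Callable[[StreamingModel, Sequence[float], int], Sequence[float]]


def adversarial_training(model: StreamingModel,
                         stream: Iterable[Tuple[Sequence[float], int]],
                         attack: Attack, mode: str = "online",
                         rule: str = "selective") -> int:
    """Sketch of Algorithm 2; returns the number of adversarial updates."""
    if mode not in ("offline", "online") or rule not in ("standard", "selective"):
        raise ValueError("unknown mode or rule")
    adv_updates = 0
    if mode == "offline":
        data = list(stream)                  # offline needs a second pass
        for x, y in data:                    # Pass 1: clean training
            model.train_one(x, y)
        reference = copy.deepcopy(model)     # fixed clean-trained reference
        for x, y in data:
            x_adv = attack(reference, x, y)
            # Assumption: f is the frozen reference the attack was crafted on.
            if rule == "standard" or reference.predict(x_adv) != y:
                model.train_one(x_adv, y)
                adv_updates += 1
    else:
        for x, y in stream:                  # single pass, nothing stored
            model.train_one(x, y)
            x_adv = attack(model, x, y)      # regenerated against current state
            if rule == "standard" or model.predict(x_adv) != y:
                model.train_one(x_adv, y)
                adv_updates += 1
    return adv_updates


class MemorizingStub:
    """Toy stand-in: recalls exact training inputs, predicts -1 otherwise."""

    def __init__(self):
        self.memory = {}

    def train_one(self, x, y):
        self.memory[tuple(x)] = y

    def predict(self, x):
        return self.memory.get(tuple(x), -1)


def toy_attack(model, x, y):
    """Hypothetical attack: leaves class-0 inputs unchanged, shifts class-1."""
    return list(x) if y == 0 else [v + 0.5 for v in x]


stream = [([0.1], 0), ([0.2], 1), ([0.3], 0), ([0.4], 1)]
print(adversarial_training(MemorizingStub(), stream, toy_attack, "online", "selective"))
```

Only the offline branch materializes the stream, because it needs a second pass; the online branch consumes each sample once and keeps nothing, matching the replay-free setting. With the toy attack, the selective rule skips the two unchanged class-0 inputs (still classified correctly) and updates only on the two shifted ones.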
These observations suggest the following failure\-mode taxonomy for streaming ARTMAP under adversarial training\.
*Failure-mode taxonomy (streaming ARTMAP under adversarial training).* We observe two recurring pathology classes that are detectable online from the current category set:
- Separation collapse. *Detection:* minimum separation drops sharply while compactness remains high (Table [3](https://arxiv.org/html/2605.06902#S7.T3)). *Risk:* high-$\epsilon$ robustness degrades because cross-class overlap increases during adversarial updates to existing categories. *Mitigation:* an overlap-based, separation-aware update rule (Algorithm 3) that checks predicted overlap before fast-learning absorption (Section [7](https://arxiv.org/html/2605.06902#S7) reports its empirical behavior).
- Match-score inversion. *Detection:* after selective adversarial training, adversarial samples can attain higher match than clean samples in some regimes. *Risk:* match-threshold rejection calibrated on vanilla models may fail or invert. *Mitigation:* calibrate rejection jointly with the chosen training protocol; do not transfer thresholds across variants.
The second failure mode is semantic rather than purely geometric: even when match remains locally well\-defined, its interpretation as a reliability signal can break after selective adversarial training\.
###### Lemma 1 (Match Preservation under Selective Absorption).
Consider selective adversarial training for streaming Fuzzy ARTMAP, and let an adversarial sample $\bm{x}_{\mathrm{adv}}$ be absorbed into an existing correct-class category $j$ under fast learning. Then the category-level match of that same sample to category $j$ is preserved after the update. Consequently, selective absorption does not reduce the absorbed-category match of updated adversarial samples, even though correctness is determined by global category competition and therefore need not remain monotonic in match.
Lemma [1](https://arxiv.org/html/2605.06902#Thmlemma1) isolates the local effect of selective absorption: for an adversarial sample that is actually used for updating, fast learning preserves its match to the absorbed category rather than decreasing it. This local preservation does not by itself imply correct prediction, because final decisions depend on winner-take-all competition across all categories and classes. More broadly, selective updates, category creation, and competition shifts can reshape the post-training match statistics of adversarial inputs, which motivates the stronger non-monotonicity statement in Proposition [1](https://arxiv.org/html/2605.06902#Thmproposition1).
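Assuming the standard Fuzzy ART match function $M_j(\bm{I}) = \lvert \bm{I} \wedge \bm{w}_j \rvert / \lvert \bm{I} \rvert$ (not restated in this section) and fast learning with $\beta = 1$, so that $\bm{w}_j^{\mathrm{new}} = \bm{I} \wedge \bm{w}_j$, the preservation in Lemma 1 follows from idempotence of the fuzzy AND: $\lvert \bm{I} \wedge (\bm{I} \wedge \bm{w}_j) \rvert = \lvert \bm{I} \wedge \bm{w}_j \rvert$. The sketch below checks this on random values; it does not test vigilance, since only the post-update match of the absorbed sample matters here.

```python
import random


def complement_code(x):
    """I(x) = [x; 1 - x]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Element-wise minimum (fuzzy AND)."""
    return [min(u, v) for u, v in zip(a, b)]


def match(I, w):
    """Standard Fuzzy ART match |I ^ w| / |I| (assumed form of M_j)."""
    return sum(fuzzy_and(I, w)) / sum(I)


def fast_learning_update(I, w):
    """Fast learning with beta = 1: w_new = I ^ w."""
    return fuzzy_and(I, w)


rng = random.Random(7)
d = 5
# An existing category: the fuzzy AND of two coded points is a hyperbox weight.
w_j = fuzzy_and(complement_code([rng.random() for _ in range(d)]),
                complement_code([rng.random() for _ in range(d)]))
# A coded adversarial sample that is absorbed into category j.
I = complement_code([rng.random() for _ in range(d)])

before = match(I, w_j)
after = match(I, fast_learning_update(I, w_j))
print(before, after)  # identical, because min(I, min(I, w)) == min(I, w)
```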
###### Proposition 1 (Non-monotonicity of Match as a Correctness Proxy after Selective Training).
Consider a streaming Fuzzy ARTMAP trained without replay using selective adversarial training, where only misclassified adversarial samples $(\bm{x}_{\mathrm{adv}}, y)$ satisfying $f(\bm{x}_{\mathrm{adv}}) \neq y$ trigger updates. Let $M_j(\bm{I}(\bm{x}))$ denote the post-training category-level match score of input $\bm{x}$ with respect to category $j$. Then, after selective adversarial training, match need not be a monotone indicator of correctness; in particular, there exist labeled inputs $(\bm{x}_1, y_1)$ and $(\bm{x}_2, y_2)$ and corresponding categories $j_1$ and $j_2$ such that

$$\begin{aligned} M_{j_1}(\bm{I}(\bm{x}_1)) &> M_{j_2}(\bm{I}(\bm{x}_2)), \\ f(\bm{x}_1) &\neq y_1, \qquad f(\bm{x}_2) = y_2. \end{aligned} \tag{26}$$

Thus, a larger post-training match score does not necessarily imply a higher likelihood of correct classification. Consequently, match-threshold rejection rules calibrated on vanilla models may fail after selective adversarial training.
A constructive proof is provided in Appendix A\.
Proposition [1](https://arxiv.org/html/2605.06902#Thmproposition1) explains why a fixed match-threshold detector that performs well on vanilla models can fail after selective adversarial training: the training process can reshape adversarial match statistics without preserving a monotone relationship between match and correctness. Consequently, thresholds calibrated on vanilla models should not be transferred across training protocols without validation, since AUC can drop sharply and may even fall below chance. Importantly, the proposition does not assert that selective absorption strictly increases match in the absorbed-category case. Instead, it makes the more conservative claim that, after selective adversarial training, post-training match need not remain a monotone proxy for correctness. This result matters beyond adversarial evaluation: in online deployment, any rejection, abstention, or human-escalation mechanism that relies on match as a trust signal must be revalidated after adversarial training. In other words, the issue is not only robustness of prediction, but also robustness of the internal confidence proxy used for downstream decision support.
*Separation-aware training rule.* Motivated by separation collapse, we propose separation-aware training, which modifies the *form* of the adversarial update (absorption vs. new-category creation), not whether one occurs. For each adversarial example $\bm{x}_{\mathrm{adv}}$, we (i) identify the best-matching correct-class category $j$, (ii) simulate the post-update hyperbox $\bm{w}_j^{\mathrm{new}} = \bm{w}_j \wedge \bm{I}(\bm{x}_{\mathrm{adv}})$, and (iii) evaluate its overlap with wrong-class categories. If the predicted overlap exceeds a threshold $\theta$, we create a new category instead of absorbing $\bm{x}_{\mathrm{adv}}$ into the existing one, that is, instead of updating the accepted category under fast learning. Overlap is computed as the normalized $L^1$ intersection of hyperbox bounds, directly penalizing expansions that intrude into wrong-class regions. This rule preserves class separation while still incorporating informative adversarial samples.
Algorithm 3 Separation-Aware Update Rule for Adversarial Samples

**Input:** adversarial sample $\bm{x}_{\mathrm{adv}}$, ground-truth label $y$, current categories $\{\bm{w}_j\}$, class mapping $\mathcal{C}(j)$, threshold $\theta$

1. $\bm{I} \leftarrow [\bm{x}_{\mathrm{adv}};\, \bm{1} - \bm{x}_{\mathrm{adv}}]$
2. $j^\star \leftarrow \arg\max_{j:\,\mathcal{C}(j) = y} T_j(\bm{I})$ ⊳ highest-choice correct-class category
3. $\bm{w}^{\mathrm{new}} \leftarrow \bm{w}_{j^\star} \wedge \bm{I}$ ⊳ simulated fast-learning update
4. $\Delta \leftarrow \max_{k:\,\mathcal{C}(k) \neq y}\; \mathrm{Ov}\big((\bm{\ell}^{\mathrm{new}}, \bm{u}^{\mathrm{new}}), (\bm{\ell}_k, \bm{u}_k)\big)$
5. $\quad$where $\bm{\ell}^{\mathrm{new}} = \bm{w}^{\mathrm{new}},\ \bm{u}^{\mathrm{new}} = \bm{1},\ \bm{\ell}_k = \bm{w}_k,\ \bm{u}_k = \bm{1}$
6. **if** $\Delta > \theta$ **then**
7. $\quad$Create a new category for class $y$ with $\bm{w} \leftarrow \bm{I}$
8. **else**
9. $\quad$Update the existing category: $\bm{w}_{j^\star} \leftarrow \bm{w}^{\mathrm{new}}$
10. **end if**
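A sketch of Algorithm 3 for categories held as plain complement-coded lists. Because the diagnostic boxes have upper bound $\bm{1}$, the overlap of Eqs. (22) and (23) reduces to sums over $1 - \max(\cdot,\cdot)$ and $1 - \ell$. The function names, the in-place list updates, and the fallback used when class $y$ has no category yet are assumptions made for illustration.

```python
def normalized_overlap(lo_a, lo_b, eps0=1e-12):
    """Eqs. (22)-(23) for diagnostic boxes whose upper bounds are all 1."""
    inter = sum(max(0.0, 1.0 - max(a, b)) for a, b in zip(lo_a, lo_b))
    size_a = sum(1.0 - a for a in lo_a)
    size_b = sum(1.0 - b for b in lo_b)
    return inter / (min(size_a, size_b) + eps0)


def choice_value(I, w, alpha):
    """T_j = |I ^ w_j| / (alpha + |w_j|)."""
    return sum(min(a, b) for a, b in zip(I, w)) / (alpha + sum(w))


def separation_aware_update(x_adv, y, weights, category_class, theta,
                            alpha=1e-3):
    """Algorithm 3 (sketch). Mutates weights and category_class in place."""
    I = list(x_adv) + [1.0 - v for v in x_adv]                    # step 1
    correct = [j for j, c in enumerate(category_class) if c == y]
    if not correct:
        # Assumption (not in Algorithm 3): no class-y category yet, so create one.
        weights.append(I)
        category_class.append(y)
        return "created"
    j_star = max(correct, key=lambda j: choice_value(I, weights[j], alpha))  # step 2
    w_new = [min(a, b) for a, b in zip(weights[j_star], I)]                 # step 3
    delta = max((normalized_overlap(w_new, weights[k])                      # steps 4-5
                 for k, c in enumerate(category_class) if c != y),
                default=0.0)
    if delta > theta:                                              # steps 6-7
        weights.append(I)
        category_class.append(y)
        return "created"
    weights[j_star] = w_new                                        # step 9
    return "absorbed"


def fresh_model():
    """Toy 1-D model: a class-0 point category at 0.2, a class-1 one at 0.8."""
    return [[0.2, 0.8], [0.8, 0.2]], [0, 1]


w_strict, c_strict = fresh_model()
strict = separation_aware_update([0.45], 0, w_strict, c_strict, theta=0.5)
w_loose, c_loose = fresh_model()
lenient = separation_aware_update([0.45], 0, w_loose, c_loose, theta=0.7)
print(strict, lenient)
```

In the toy example the simulated absorption would raise the wrong-class overlap from 0.4 to about 0.65, so $\theta = 0.5$ triggers category creation while $\theta = 0.7$ allows absorption.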
Match-score statistics highlight a caveat for rejection-based defenses. On vanilla models, adversarial examples tend to produce lower match scores than clean samples, suggesting that a simple threshold on match could reject many adversarial inputs. However, after selective adversarial training this relationship can reverse: adversarial examples can attain higher match scores than clean samples, and the corresponding match-threshold detector can degrade below chance. We compute this detector's AUC by using the match score as a scalar detector score and sweeping a single threshold; Section [7](https://arxiv.org/html/2605.06902#S7) reports the empirical AUC collapse on a representative dataset. Practically, rejection thresholds must be calibrated for the specific trained variant and validated jointly with the chosen training protocol, rather than transferred from a vanilla model [[12](https://arxiv.org/html/2605.06902#bib.bib12), [16](https://arxiv.org/html/2605.06902#bib.bib13), [17](https://arxiv.org/html/2605.06902#bib.bib14), [20](https://arxiv.org/html/2605.06902#bib.bib15)]. This reversal is consistent with selective training preserving absorbed-category match while reshaping post-training attained-match statistics through selective updates, category creation, and winner competition. More broadly, it shows that interpretable internal scores in streaming models should not be assumed to remain trustworthy after adversarial training merely because they were reliable in the vanilla regime.
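The AUC described above can be sketched by treating low match as evidence of an adversarial input (the direction that holds on vanilla models), sweeping one threshold over all observed scores, and integrating the ROC curve with the trapezoid rule. The match values below are made-up toy numbers, not results from the paper; they only illustrate how an inverted match ordering pushes the AUC below 0.5.

```python
def roc_auc_by_threshold_sweep(clean_match, adv_match):
    """AUC of the detector 'flag as adversarial if match <= t', sweeping t.

    Assumes low match indicates an adversarial input (positives = adversarial).
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(clean_match) | set(adv_match)):
        tpr = sum(m <= t for m in adv_match) / len(adv_match)      # adversarial flagged
        fpr = sum(m <= t for m in clean_match) / len(clean_match)  # clean flagged
        points.append((fpr, tpr))
    # Trapezoid rule over the ROC curve; tied scores receive half credit.
    return sum((f1 - f0) * (r0 + r1) / 2
               for (f0, r0), (f1, r1) in zip(points, points[1:]))


def pairwise_auc(clean_match, adv_match):
    """Equivalent rank form: P(adv match < clean match) + 0.5 * P(tie)."""
    wins = sum((a < c) + 0.5 * (a == c) for a in adv_match for c in clean_match)
    return wins / (len(adv_match) * len(clean_match))


clean = [0.90, 0.85, 0.80]          # toy match scores of clean inputs
adv_vanilla = [0.60, 0.70, 0.82]    # adversarial inputs match less: AUC > 0.5
adv_inverted = [0.95, 0.92, 0.70]   # ordering inverted: AUC < 0.5

auc_vanilla = roc_auc_by_threshold_sweep(clean, adv_vanilla)
auc_inverted = roc_auc_by_threshold_sweep(clean, adv_inverted)
print(auc_vanilla, auc_inverted)
```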
## 6 Experimental Setup
We evaluate on four image-classification benchmarks: USPS (7,291 train / 2,007 test, $16\times16$), MNIST (60,000 / 10,000, $28\times28$), Fashion-MNIST (60,000 / 10,000, $28\times28$), and EMNIST-Letters (124,800 / 20,800, 26 classes, $28\times28$). These benchmarks are not intended to define a large-scale vision leaderboard; rather, they serve as controlled streaming testbeds in which ARTMAP category growth, final-model attack alignment, and replay-free adversarial-training protocols can be compared systematically across different input dimensions and class structures. All images are normalized to $[0,1]$ and flattened before training. Unless otherwise noted, we use Fuzzy ARTMAP with $\alpha = 10^{-3}$, $\beta = 1.0$, and $\rho_{ab} = 1.0$.
For vanilla models, we sweep the input-module vigilance parameter $\rho_a \in \{0.0, 0.1, \ldots, 0.9\}$ to characterize the clean-accuracy/robustness tradeoff. Figure 1 reports representative vulnerability results for $\rho_a$ from 0.5 to 0.9, since lower-vigilance settings yield poor clean accuracy and are not competitive in practice. All defense methods are evaluated at $\rho_a = 0.9$, which provides the strongest overall robustness in our setting.
For white-box attacks, we use WB-Softmax with temperature $\tau = 0.01$ and perturbation budgets $\epsilon \in \{0.05, 0.10, \ldots, 0.35\}$. We verified the WB-Softmax hyperparameters through attack-strength ablations: we evaluated $\tau \in \{0.005, 0.01, 0.02, 0.05, 0.10\}$, spanning the range $[0.005, 0.1]$, and varied the number of PGD steps over $\{1, 5, 10, 20\}$. Across datasets, $\tau = 0.01$ and PGD-20 consistently produced the strongest and most stable attacks, and we use this configuration throughout the main evaluation.
For black-box transfer robustness, we use two surrogate families. The first is a SimpleCNN trained separately on each dataset. It contains two convolutional blocks, each composed of convolution, nonlinear activation, and spatial downsampling, followed by a fully connected classifier that outputs class logits. The second is an LRS-regularized surrogate [[42](https://arxiv.org/html/2605.06902#bib.bib27)], trained for 10 epochs using Adam ($\mathrm{lr} = 10^{-3}$) with gradient-norm regularization $\lambda = 2500$. We selected $\lambda = 2500$ from a brief empirical sweep over $\lambda \in \{500, 1000, 2000, 2500\}$ and observed that transfer-attack strength, measured by AURAC, is relatively insensitive within this range.
We compare seven training variants (Table [1](https://arxiv.org/html/2605.06902#S7.T1)). Apart from the Vanilla baseline, AdvTrain (off), AdvTrain (on), and Sep-Aware are non-selective, meaning every adversarial example triggers an update, whereas Selective (off), Selective (on), and Two-Stage Sel. are selective, meaning only adversarial examples that cause misclassification trigger updates:
- Vanilla: no-defense baseline.
- AdvTrain (off): offline adversarial training that updates on all generated adversarial examples.
- AdvTrain (on): online adversarial training that regenerates adversarial examples during streaming updates and updates on all of them.
- Sep-Aware: online adversarial training (every adversarial example triggers an update) in which an overlap-gated decision rule chooses between fast-learning absorption into the best-matching correct-class category and creation of a new category (Algorithm 3, threshold $\theta$).
- Selective (off): offline adversarial training with updates restricted to adversarial examples that induce misclassification.
- Selective (on): online adversarial training with the same selective-update rule.
- Two-Stage Sel.: progressive two-stage selective training with moderate-$\epsilon$ adaptation in Stage 1 and stronger-$\epsilon$ adaptation in Stage 2.
For online variants, adversarial examples are generated on the fly with WB-Softmax and discarded immediately after use, preserving the strict single-pass streaming regime. Separation-aware training is motivated by the separation-collapse pattern discussed in Section [5](https://arxiv.org/html/2605.06902#S5); Tables [1](https://arxiv.org/html/2605.06902#S7.T1) and [2](https://arxiv.org/html/2605.06902#S7.T2) report its cross-dataset behavior, and Section [7](https://arxiv.org/html/2605.06902#S7) reports the threshold ablation.
To keep adaptive PGD-20 evaluation tractable for large-category models (up to 228K categories), we evaluate adversarial accuracy on a fixed subset of $n = 1000$ test samples that are correctly classified at $\epsilon = 0$. Unless otherwise noted, the reported robust-accuracy curves and AURAC values are therefore conditional on correct clean prediction. In figures, we abbreviate this as “Cond. robust accuracy”. This conditional evaluation avoids conflating adversarial failure with clean misclassification and supports consistent comparison across defense methods with different clean accuracies. Clean accuracy is reported separately in all main defense tables so that robustness–utility tradeoffs remain visible.
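This section does not restate how AURAC is computed, so the sketch below assumes it is the trapezoidal area under the conditional robust-accuracy curve over the $\epsilon$ grid, normalized by the grid's width; the paper's exact normalization and grid may differ. Conditional accuracy follows the definition above: only samples classified correctly at $\epsilon = 0$ enter the denominator. All predictions and accuracies shown are toy values.

```python
def conditional_robust_accuracy(clean_pred, adv_pred, labels):
    """Robust accuracy restricted to samples that are correct at eps = 0."""
    kept = [i for i, (p, y) in enumerate(zip(clean_pred, labels)) if p == y]
    if not kept:
        raise ValueError("no clean-correct samples to evaluate")
    return sum(adv_pred[i] == labels[i] for i in kept) / len(kept)


def aurac(eps_grid, accuracies, normalize=True):
    """Assumed AURAC: trapezoidal area under accuracy vs. eps."""
    area = sum((e1 - e0) * (a0 + a1) / 2
               for e0, e1, a0, a1 in zip(eps_grid, eps_grid[1:],
                                         accuracies, accuracies[1:]))
    return area / (eps_grid[-1] - eps_grid[0]) if normalize else area


labels = [0, 1, 1, 0]
clean_pred = [0, 1, 0, 0]  # sample 2 is wrong at eps = 0, so it is excluded
adv_pred = [1, 1, 1, 0]
cra = conditional_robust_accuracy(clean_pred, adv_pred, labels)  # 2 of 3 kept
area = aurac([0.0, 0.1, 0.2], [1.0, 0.5, 0.0])
print(cra, area)
```

Excluding clean errors means sample 2 does not count as a robust success even though its adversarial prediction happens to match the label, which is the conflation the conditional protocol is meant to avoid.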
White-box robustness is evaluated under WB-Softmax PGD, our strongest differentiable attack. Black-box robustness is evaluated primarily under CNN-transfer PGD and LRS-transfer PGD; Square attack [[1](https://arxiv.org/html/2605.06902#bib.bib31)] is used as an additional black-box baseline for vanilla models (Table [7](https://arxiv.org/html/2605.06902#S7.T7)). All adversarial training methods also use WB-Softmax attacks during training, so each defense is trained against the same adaptive white-box attacker that is used to evaluate it.
Code will be released upon publication\. Additional theoretical details, including the constructive proof of Proposition 1 and auxiliary robustness summaries, are provided in Appendix A and Appendix B\.
## 7 Results and Discussion
This section reports four main findings. First, vanilla Fuzzy ARTMAP is highly vulnerable under an adaptive white-box threat once the attack objective is aligned with ARTMAP’s native decision rule. Second, defense rankings depend strongly on evaluation protocol: offline adversarial training can appear strong under transfer attacks yet collapse under adaptive white-box evaluation, directly supporting Principle 1. Third, among replay-free defenses, progressive two-stage selective training provides the strongest overall robustness across datasets. Separation-aware training, the geometric intervention motivated by separation collapse, matches standard online adversarial training in robustness; the overlap constraint does not fire in the recommended operating regime, and we report its behavior across $\theta$ later in this section. Fourth, post-training match statistics can become semantically unreliable after selective adversarial training, indicating that rejection rules calibrated on vanilla models should not be transferred across training protocols without revalidation.
### 7.1 Baseline Vulnerability
Figure [1](https://arxiv.org/html/2605.06902#S7.F1) characterizes the vulnerability of vanilla Fuzzy ARTMAP across vigilance levels under both adaptive white-box WB-Softmax PGD and black-box transfer PGD. Across all four datasets, WB-Softmax is consistently stronger than black-box transfer, showing that once the attack objective is aligned with ARTMAP’s category competition and map-field structure, white-box optimization yields substantial attack success. This result is methodologically important: it confirms that ART models are not inherently protected by their non-smooth winner-take-all structure, and that adaptive white-box evaluation is both feasible and necessary.
Higher vigilance improves robustness under both attack types, consistent with the interpretation that finer category structure can reduce vulnerability to broader perturbation regions\. At the same time, the gap between WB\-Softmax and transfer attacks is especially informative: black\-box transfer alone would materially underestimate the vulnerability of the deployed ARTMAP model\. This observation motivates the remainder of the section, where all defenses are compared under the stronger adaptive white\-box protocol\.
Figure 1: Baseline vulnerability of vanilla Fuzzy ARTMAP ($\epsilon\leq 0.35$). Top: white-box Softmax PGD (WB-Softmax) attack with temperature $\tau=0.01$. Bottom: black-box transfer PGD attack. The y-axis label “Cond. robust accuracy” denotes conditional robust accuracy, i.e., accuracy measured on the clean-correct subset.
### 7.2 Defense Comparison Under WB-Softmax
Table [1](https://arxiv.org/html/2605.06902#S7.T1) compares all defenses at $\rho_a=0.9$ under the adaptive white-box WB-Softmax threat model, while Figure [2](https://arxiv.org/html/2605.06902#S7.F2) shows representative accuracy–$\epsilon$ curves on Fashion-MNIST. We report clean accuracy, adversarial accuracy at $\epsilon=0.30$, AURAC, and category count. Separation-aware training is included for completeness across datasets, while Table [4](https://arxiv.org/html/2605.06902#S7.T4) highlights its targeted high-$\epsilon$ effect on USPS.
Tables [1](https://arxiv.org/html/2605.06902#S7.T1) and [2](https://arxiv.org/html/2605.06902#S7.T2) should be read jointly. Table [1](https://arxiv.org/html/2605.06902#S7.T1) reports robustness under the adaptive white-box WB-Softmax threat model, whereas Table [2](https://arxiv.org/html/2605.06902#S7.T2) reports transfer-based black-box robustness. The resulting defense rankings differ sharply, and that reversal is itself one of the main findings of the paper.
Table 1: Defense comparison under WB-Softmax PGD-20 ($\rho=0.9$). Mean ± std over 3 seeds.

| Method | USPS Clean (%) | USPS $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | USPS AURAC | USPS # Categories | MNIST Clean (%) | MNIST $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | MNIST AURAC | MNIST # Categories |
|---|---|---|---|---|---|---|---|---|
| Vanilla | 92.3 ± 0.6 | 0.5 ± 0.1 | 21.7 ± 0.2 | 6K | 94.9 ± 0.2 | 31.7 ± 0.6 | 61.5 ± 0.1 | 56K |
| AdvTrain (off) | 92.4 ± 0.1 | 0.3 ± 0.1 | 12.6 ± 0.4 | 12K | 94.5 ± 0.1 | 0.0 ± 0.0 | 14.8 ± 0.2 | 103K |
| AdvTrain (on) | 92.9 ± 0.4 | 10.6 ± 1.2 | 25.9 ± 0.6 | 13K | 94.1 ± 0.1 | 21.1 ± 4.4 | 51.2 ± 0.6 | 111K |
| Selective (off) | 92.2 ± 0.0 | 3.0 ± 0.9 | 25.4 ± 0.7 | 11K | 94.5 ± 0.0 | 26.3 ± 0.6 | 53.6 ± 0.1 | 94K |
| Selective (on) | 92.3 ± 0.8 | 2.6 ± 0.9 | 24.7 ± 0.4 | 12K | 94.9 ± 0.2 | 26.7 ± 1.1 | 53.0 ± 0.6 | 95K |
| Sep-Aware | 92.8 ± 0.4 | 9.7 ± 1.7 | 26.2 ± 0.9 | 13K | 94.2 ± 0.1 | 25.1 ± 2.3 | 51.9 ± 0.2 | 111K |
| Two-Stage Sel. | 92.1 ± 0.1 | 5.4 ± 0.6 | **28.2 ± 0.2** | 14K | 94.5 ± 0.1 | 45.8 ± 1.6 | **64.5 ± 0.7** | 89K |

| Method | Fashion-MNIST Clean (%) | Fashion-MNIST $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | Fashion-MNIST AURAC | Fashion-MNIST # Categories | EMNIST-Letters Clean (%) | EMNIST-Letters $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | EMNIST-Letters AURAC | EMNIST-Letters # Categories |
|---|---|---|---|---|---|---|---|---|
| Vanilla | 80.7 ± 0.5 | 0.9 ± 0.2 | 24.4 ± 0.6 | 57K | 80.8 ± 0.6 | 12.2 ± 0.1 | 37.1 ± 1.0 | 114K |
| AdvTrain (off) | 82.2 ± 0.2 | 1.1 ± 0.2 | 14.9 ± 0.2 | 109K | 81.4 ± 0.2 | 0.0 ± 0.1 | 13.0 ± 0.6 | 212K |
| AdvTrain (on) | 82.8 ± 1.0 | 20.7 ± 0.5 | 34.0 ± 0.2 | 111K | 80.4 ± 0.7 | 5.8 ± 1.2 | 26.9 ± 0.3 | 226K |
| Selective (off) | 82.4 ± 0.1 | 15.3 ± 1.2 | 35.1 ± 0.5 | 101K | 80.7 ± 0.0 | 7.8 ± 2.0 | 27.2 ± 0.9 | 209K |
| Selective (on) | 81.1 ± 0.4 | 17.0 ± 0.8 | 34.8 ± 0.1 | 101K | 80.7 ± 0.7 | 6.4 ± 0.7 | 25.8 ± 0.6 | 210K |
| Sep-Aware | 82.9 ± 0.9 | 20.5 ± 1.0 | 34.7 ± 0.5 | 111K | 80.3 ± 0.5 | 5.1 ± 1.8 | 26.8 ± 0.7 | 226K |
| Two-Stage Sel. | 82.4 ± 0.1 | 23.0 ± 1.0 | **41.3 ± 0.2** | 121K | 80.7 ± 0.0 | 22.5 ± 1.0 | **39.8 ± 0.1** | 228K |

$\mathrm{Acc}_{\mathrm{adv}}(0.30)$ = adversarial accuracy (%) at $\epsilon=0.30$. AURAC (%) = Area Under the Robust Accuracy Curve ($\epsilon\in[0.05,0.35]$). # Categories = category count (K = thousands). Bold = best per dataset.

Table 2: Defense comparison under black-box transfer PGD-20 ($\rho=0.9$). Mean ± std over 3 seeds.

| Method | USPS Clean (%) | USPS $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | USPS AURAC | USPS # Categories | MNIST Clean (%) | MNIST $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | MNIST AURAC | MNIST # Categories |
|---|---|---|---|---|---|---|---|---|
| Vanilla | 92.2 ± 0.1 | 9.1 ± 0.3 | 48.4 ± 0.2 | 6K | 94.5 ± 0.1 | 85.8 ± 0.4 | 92.1 ± 0.2 | 56K |
| AdvTrain (off) | 92.4 ± 0.2 | 41.5 ± 1.9 | **63.8 ± 0.8** | 12K | 94.5 ± 0.1 | 81.1 ± 0.6 | 89.2 ± 0.3 | 103K |
| AdvTrain (on) | 92.9 ± 0.5 | 25.7 ± 1.1 | 57.3 ± 0.6 | 13K | 94.1 ± 0.2 | 79.4 ± 1.6 | 88.2 ± 1.0 | 111K |
| Selective (off) | 92.1 ± 0.1 | 29.2 ± 0.4 | 58.3 ± 0.2 | 11K | 94.5 ± 0.0 | 88.7 ± 1.2 | 93.3 ± 0.2 | 94K |
| Selective (on) | 92.5 ± 0.2 | 24.0 ± 0.5 | 56.3 ± 0.3 | 12K | 94.5 ± 0.1 | 89.2 ± 0.3 | 93.5 ± 0.2 | 95K |
| Sep-Aware | 92.8 ± 0.5 | 25.8 ± 0.3 | 57.8 ± 0.3 | 13K | 94.2 ± 0.2 | 78.0 ± 1.9 | 87.5 ± 0.7 | 111K |
| Two-Stage Sel. | 92.1 ± 0.1 | 30.0 ± 0.7 | 59.4 ± 0.2 | 14K | 94.5 ± 0.0 | 89.4 ± 0.4 | **93.7 ± 0.2** | 89K |

| Method | Fashion-MNIST Clean (%) | Fashion-MNIST $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | Fashion-MNIST AURAC | Fashion-MNIST # Categories | EMNIST-Letters Clean (%) | EMNIST-Letters $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ | EMNIST-Letters AURAC | EMNIST-Letters # Categories |
|---|---|---|---|---|---|---|---|---|
| Vanilla | 80.8 ± 0.2 | 67.8 ± 0.5 | 79.8 ± 0.3 | 57K | 80.7 ± 0.0 | 89.1 ± 1.0 | **92.8 ± 0.3** | 114K |
| AdvTrain (off) | 82.2 ± 0.3 | 72.5 ± 0.2 | **82.1 ± 0.1** | 109K | 81.4 ± 0.2 | 70.1 ± 2.0 | 80.5 ± 0.6 | 212K |
| AdvTrain (on) | 82.8 ± 1.2 | 68.6 ± 1.6 | 79.9 ± 0.9 | 111K | 80.4 ± 0.7 | 70.0 ± 1.5 | 79.5 ± 0.5 | 226K |
| Selective (off) | 82.4 ± 0.1 | 67.5 ± 0.2 | 79.9 ± 0.2 | 101K | 80.7 ± 0.0 | 82.3 ± 2.7 | 89.9 ± 1.1 | 209K |
| Selective (on) | 81.7 ± 0.2 | 68.5 ± 0.4 | 80.4 ± 0.2 | 101K | 80.3 ± 0.1 | 83.7 ± 1.4 | 90.4 ± 0.4 | 210K |
| Sep-Aware | 82.9 ± 1.2 | 68.3 ± 2.2 | 79.7 ± 1.1 | 111K | 80.3 ± 0.5 | 72.8 ± 1.2 | 80.8 ± 1.0 | 226K |
| Two-Stage Sel. | 82.4 ± 0.2 | 69.0 ± 0.4 | 80.5 ± 0.2 | 121K | 80.7 ± 0.0 | 85.5 ± 1.1 | 90.7 ± 0.8 | 228K |

Transfer attacks use the SimpleCNN surrogate trained on each dataset. $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ = adversarial accuracy (%) at $\epsilon=0.30$. AURAC (%) = Area Under the Robust Accuracy Curve ($\epsilon\in[0.05,0.35]$). # Categories = category count (K = thousands). Bold = best per dataset.
Figure 2: Defense comparison on Fashion-MNIST under WB-Softmax PGD-20 ($\rho=0.9$). Two-stage selective training achieves the highest robust accuracy, while offline adversarial training collapses below vanilla. Mean over 3 seeds. For readability, the main text shows representative accuracy–$\epsilon$ curves for Fashion-MNIST; full tabular results for all datasets are reported in Tables [1](https://arxiv.org/html/2605.06902#S7.T1)–[2](https://arxiv.org/html/2605.06902#S7.T2).

Figure 3: Defense comparison on Fashion-MNIST under LRS transfer PGD-20 ($\rho=0.9$). Offline adversarial training achieves the highest accuracy under black-box transfer attacks, reversing its poor white-box performance. Mean over 3 seeds.

The clearest example is offline adversarial training. Under adaptive white-box evaluation, it collapses on all four datasets despite substantial category growth: its white-box AURAC falls below vanilla everywhere (e.g., 14.8% vs. 61.5% on MNIST). Under transfer-based black-box evaluation, however, its behavior is dataset-dependent. On USPS and Fashion-MNIST, offline training attains the best black-box AURAC (63.8% and 82.1%), exceeding vanilla (48.4% and 79.8%), markedly so on USPS. By contrast, on MNIST and EMNIST-Letters, where vanilla already attains high black-box AURAC (92.1% and 92.8%), offline adversarial training lowers rather than improves transfer robustness (89.2% and 80.5%). The USPS and Fashion-MNIST reversal directly illustrates Principle 1: in single-pass incremental learners, adversarial examples crafted once against a fixed clean-trained reference model can become stale after the classifier is further modified by adversarial training, because the final deployed decision boundary is no longer the one against which those perturbations were generated. As a result, offline adversarial training may learn perturbation patterns that transfer across models without conferring genuine robustness to adaptive attacks on the final adversarially trained model.
On EMNIST-Letters in particular, all six adversarial-training variants reduce black-box AURAC by 2–13 percentage points relative to vanilla (92.8%), suggesting that in regimes where the vanilla classifier already has high native black-box robustness, protocol choices that increase category count may leave the model more, not less, exposed to the perturbation directions favored by the CNN surrogate.
Online adversarial training removes the stale-attack mismatch of offline training, but its effect remains dataset-dependent: relative to vanilla, it raises white-box AURAC on USPS (25.9% vs. 21.7%) and Fashion-MNIST (34.0% vs. 24.4%) but lowers it on MNIST (51.2% vs. 61.5%) and EMNIST-Letters (26.9% vs. 37.1%). Regenerating perturbations on the fly is therefore not sufficient; robustness also depends on which adversarial samples are learned and when they are introduced during streaming adaptation.
The most consistent replay-free gains come from two-stage selective training, which achieves the best white-box AURAC on all four datasets (28.2%, 64.5%, 41.3%, and 39.8% on USPS, MNIST, Fashion-MNIST, and EMNIST-Letters; Table [1](https://arxiv.org/html/2605.06902#S7.T1)). This supports the broader conclusion that streaming robustness depends not only on attack strength but also on protocol design. A plausible reading is that progressive $\epsilon$ scheduling stabilizes adaptation, while selective filtering concentrates updates on genuinely vulnerable regions.
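The protocol ingredients named in this paragraph (online regeneration of attacks against the current model, selective filtering, and a two-stage $\epsilon$ schedule) can be combined into one replay-free loop. The sketch below is a hypothetical illustration rather than the paper's algorithm: the `learn`/`predict` interface, the filtering rule "learn an adversarial example only if it still fools the current model", the switch point `switch_at`, and the stage $\epsilon$ values are all assumptions.

```python
from typing import Callable, Iterable, Protocol, Tuple

import numpy as np


class StreamingLearner(Protocol):
    """Hypothetical interface for a single-pass learner such as Fuzzy ARTMAP."""

    def predict(self, x: np.ndarray) -> int: ...

    def learn(self, x: np.ndarray, y: int) -> None: ...


# attack(model, x, y, eps) -> perturbed input; assumed to target the model it receives.
Attack = Callable[[StreamingLearner, np.ndarray, int, float], np.ndarray]


def two_stage_selective_train(
    model: StreamingLearner,
    stream: Iterable[Tuple[np.ndarray, int]],
    attack: Attack,
    switch_at: int,
    stage_eps: Tuple[float, float] = (0.1, 0.3),  # assumed small-then-large schedule
) -> int:
    """One pass over the stream with no replay buffer; returns #adversarial updates."""
    n_adv_updates = 0
    for t, (x, y) in enumerate(stream):
        model.learn(x, y)  # clean update: each sample is seen exactly once
        eps = stage_eps[0] if t < switch_at else stage_eps[1]
        # Online generation: attack the *current* model rather than a frozen
        # reference, so perturbations cannot go stale (contrast: offline training).
        x_adv = attack(model, x, y, eps)
        # Selective filtering: learn only adversarial examples that still fool
        # the current model; the rest trigger no update.
        if model.predict(x_adv) != y:
            model.learn(x_adv, y)
            n_adv_updates += 1
    return n_adv_updates
```

Switching to an offline variant would amount to passing a frozen clean-trained reference model to `attack` instead of `model`, which is exactly the stale-perturbation setting that Principle 1 warns against.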
USPS provides an instructive case study where the diagnostics introduced in Section [5](https://arxiv.org/html/2605.06902#S5) are particularly visible. Geometry monitoring on USPS reveals a clear separation-collapse pattern: under selective adversarial training, minimum separation drops sharply while compactness remains high (Table [3](https://arxiv.org/html/2605.06902#S7.T3)), indicating cross-class encroachment during adversarial updates to existing categories. This structural failure primarily harms high-$\epsilon$ robustness and motivates the separation-aware update rule in Algorithm [3](https://arxiv.org/html/2605.06902#alg3).
Table 3: Separation-collapse indicators on USPS ($\rho_a=0.9$). “# Categories” denotes the number of learned categories. “MS” denotes the minimum-separation indicator in Eq. ([25](https://arxiv.org/html/2605.06902#S5.E25)). “Compactness” denotes the online iCVI compactness statistic computed from the current partition of learned categories/classes [[30](https://arxiv.org/html/2605.06902#bib.bib60), [5](https://arxiv.org/html/2605.06902#bib.bib49), [6](https://arxiv.org/html/2605.06902#bib.bib48)]. For this compactness statistic, smaller values indicate more concentrated within-partition structure, so the table should be interpreted jointly with MS rather than as an absolute stand-alone quality score.

Separation-aware training, motivated by the separation-collapse diagnosis, matches AdvTrain (on) on USPS: adversarial accuracy at $\epsilon=0.30$ is 9.7 ± 1.7% vs. 10.6 ± 1.2%, both models have 13K categories, and AURAC is 26.2 ± 0.9% vs. 25.9 ± 0.6%. At $\epsilon=0.30$, both exceed the selective family (5.4 ± 0.6% for Two-Stage Sel. and 2.6 ± 0.9% for Selective (on); Table [4](https://arxiv.org/html/2605.06902#S7.T4)), although Two-Stage Sel. retains the highest USPS AURAC (28.2%). The $\theta$ ablation (Table [5](https://arxiv.org/html/2605.06902#S7.T5)) explains the match: across $\theta\in[0.001,0.1]$, the overlap check rejects every absorption candidate, so the algorithm reduces to AdvTrain (on); larger $\theta$ admits cross-class hyperbox expansions and triggers separation collapse. Overlap-only gating therefore admits no operating point that improves on AdvTrain (on).
Table 4: USPS high-$\epsilon$ robustness (mean ± std over 3 seeds). $\mathrm{Acc}_{\mathrm{adv}}(0.25)$ and $\mathrm{Acc}_{\mathrm{adv}}(0.30)$ denote adversarial accuracy (%) at $\epsilon=0.25$ and $\epsilon=0.30$, respectively. Sep-Aware uses $\theta=0.01$. AdvTrain (on), shown in Table [1](https://arxiv.org/html/2605.06902#S7.T1), achieves 10.6 ± 1.2% at $\epsilon=0.30$, statistically tied with Sep-Aware.
Table 5: Separation-aware threshold $\theta$ ablation (USPS, $\rho=0.9$). Mean ± std over 3 seeds. *Adv@0.25* and *Adv@0.30* denote adversarial accuracy (%) at $\epsilon=0.25$ and $\epsilon=0.30$, respectively. AURAC (%) = Area Under the Robust Accuracy Curve. Bold = default setting. Values for $\theta\in[0.001,0.1]$ are bit-identical: the overlap check rejects every absorption candidate, so the update reduces to new-category creation (the AdvTrain-on path). At $\theta=0.5$ the check admits roughly 23% of absorptions and induces separation collapse; at $\theta=1.0$ it is effectively disabled.
Table [5](https://arxiv.org/html/2605.06902#S7.T5) characterizes the constraint across $\theta$. Across $\theta\in[0.001,0.1]$, the overlap check rejects every absorption candidate, so the update reduces to the new-category creation path used by standard online adversarial training. Loosening to $\theta=0.5$ admits roughly 23% of absorptions and triggers separation collapse (AURAC drops to 11.9%); $\theta=1.0$ effectively disables the check ($\sim 83\%$ absorption) and yields 15.9% AURAC. Across $\theta$, no operating point yields robustness beyond AdvTrain (on); in this form, overlap-only absorption gating does not improve on AdvTrain (on) for streaming Fuzzy ARTMAP.
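To make the gating rule concrete, the sketch below shows one way an overlap-gated absorption step could look for complement-coded Fuzzy ART weights. Only the fast-learning update $w \leftarrow I \wedge w$ and the reading of a complement-coded weight $[u, 1-v]$ as the hyperbox $[u, v]$ are standard Fuzzy ART; the overlap measure (smallest per-dimension intersection length against any other-class hyperbox) and the function names are hypothetical stand-ins, because Algorithm 3's exact criterion is not reproduced in this section. Vigilance and match tracking are omitted.

```python
import numpy as np


def hyperbox(w):
    """Read a complement-coded weight w = [u, 1 - v] as the hyperbox [u, v]."""
    d = w.shape[0] // 2
    return w[:d], 1.0 - w[d:]


def overlap_depth(u1, v1, u2, v2):
    """Smallest per-dimension intersection length of two boxes.

    Positive only if the boxes intersect in every dimension (hypothetical measure).
    """
    return float(np.min(np.minimum(v1, v2) - np.maximum(u1, u2)))


def gated_absorb(x, w, other_class_weights, theta):
    """Overlap-gated fast-learning absorption of input x into category w.

    Returns the updated weight, or None if the expanded hyperbox would overlap
    an other-class hyperbox by more than theta; the caller would then create a
    new category instead (the AdvTrain (on) path).
    """
    I = np.concatenate([x, 1.0 - x])
    w_new = np.minimum(I, w)  # Fuzzy ART fast learning (beta = 1): w <- I ^ w
    u, v = hyperbox(w_new)
    for w_other in other_class_weights:
        u_o, v_o = hyperbox(w_other)
        if overlap_depth(u, v, u_o, v_o) > theta:
            return None
    return w_new
```

With a class-0 box $[0.2, 0.4]^2$ and a class-1 box $[0.6, 0.8]^2$, absorbing $x=(0.7, 0.7)$ into the class-0 category would stretch it into the class-1 box, so a small $\theta$ rejects it; raising $\theta$ admits exactly this kind of cross-class expansion.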
The second main empirical phenomenon is match-score ordering reversal. On vanilla models, adversarial examples tend to produce lower match scores than clean samples on USPS (0.987 vs. 0.997), which suggests that match could be used as a rejection signal. After selective adversarial training, however, that ordering can invert (adversarial 0.983 vs. clean 0.968), and the corresponding match-threshold detector degrades sharply: its AUC falls from 0.72 to 0.38, below the 0.5 chance level. This observation is consistent with the theoretical development in Section [5](https://arxiv.org/html/2605.06902#S5). Lemma [1](https://arxiv.org/html/2605.06902#Thmlemma1) establishes that selective absorption preserves the absorbed-category match locally, while Proposition [1](https://arxiv.org/html/2605.06902#Thmproposition1) explains why post-training match need not remain monotonic with correctness globally. Empirically, the USPS inversion shows that this is not merely a theoretical possibility: internal scores that are reliable in the vanilla regime can become unreliable after adversarial training. In deployment terms, rejection, abstention, or escalation rules that treat match as a trust signal must therefore be recalibrated jointly with the training protocol rather than transferred unchanged from the vanilla model.
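The detector AUC quoted above can be read as a pairwise probability. The sketch below is a hypothetical helper applied to toy scores, not the paper's data: it computes the AUC of a "flag as adversarial when match is low" rule in its Mann–Whitney form, i.e., the probability that a random adversarial input has lower match than a random clean input, with ties counted as one half.

```python
def match_detector_auc(match_clean, match_adv):
    """AUC of the rule 'flag as adversarial when the match score is low'.

    Equals P(match_adv < match_clean) over all clean/adversarial pairs,
    with ties counted as 1/2 (Mann-Whitney form of the ROC AUC).
    """
    wins = 0.0
    for a in match_adv:
        for c in match_clean:
            if a < c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(match_adv) * len(match_clean))


# Toy scores (illustrative only): adversarial matches sit below clean ones.
clean = [0.99, 0.995, 0.998]
adv = [0.98, 0.985, 0.99]
auc_vanilla_like = match_detector_auc(clean, adv)
# Reversed ordering: the same fixed rule now ranks the wrong population as suspicious.
auc_inverted = match_detector_auc(adv, clean)
```

Because the two AUCs sum to one, any reversal of the match ordering pushes a fixed threshold rule below chance, which is the pattern observed on USPS after selective training.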
Because the main robustness results in Tables [1](https://arxiv.org/html/2605.06902#S7.T1)–[2](https://arxiv.org/html/2605.06902#S7.T2) are reported on the clean-correct subset, Appendix B provides a derived unconditional view by converting the reported clean accuracy and conditional robustness into unconditional robust accuracy and unconditional AURAC. The results show that the main qualitative conclusions remain unchanged: two-stage selective training remains the strongest overall replay-free defense, and separation-aware training remains statistically tied with standard online adversarial training at high $\epsilon$ on USPS. Thus, the main ranking claims are not merely an artifact of conditioning on correct clean prediction.
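Under the simplest reading of this conversion, unconditional robust accuracy is clean accuracy multiplied by conditional robust accuracy. The sketch below assumes that reading (clean-misclassified samples count as non-robust, and the $n=1000$ subset is representative of all clean-correct samples); Appendix B's exact procedure is not reproduced in this section, and the function name is hypothetical. Since AURAC averages robust accuracy over $\epsilon$, the same scaling applies to it.

```python
def unconditional_from_conditional(clean_acc_pct, conditional_pct):
    """Convert a clean-correct-subset metric (%) into an unconditional one (%).

    Assumptions: samples misclassified at epsilon = 0 count as non-robust, and
    the clean-correct subset is representative. Linear in the conditional
    metric, so it applies equally to robust accuracy and to AURAC.
    """
    return clean_acc_pct * conditional_pct / 100.0


# Fashion-MNIST rows of Table 1 (clean accuracy %, conditional AURAC %).
two_stage = unconditional_from_conditional(82.4, 41.3)
vanilla = unconditional_from_conditional(80.7, 24.4)
```

Under this assumption, two-stage selective training on Fashion-MNIST drops from 41.3% to about 34.0% unconditional AURAC and vanilla from 24.4% to about 19.7%, so their ordering is unchanged.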
### 7.3 Additional Analyses and Verification
Table [7](https://arxiv.org/html/2605.06902#S7.T7) further verifies that the proposed WB-Softmax threat model is a strong adaptive white-box evaluation protocol. On vanilla models at $\rho_a=0.9$, WB-Softmax consistently matches or exceeds the strongest black-box baselines. Transfer remains relatively strong on USPS, but is much weaker on MNIST and EMNIST-Letters and only partially competitive on Fashion-MNIST. These results support the interpretation that the white-box findings above reflect genuine vulnerability rather than weak optimization.
**WB-Softmax ablations.** Across datasets, increasing the number of PGD steps monotonically strengthens the attack, and the gain largely saturates by 20 steps, which motivates PGD-20 as the default evaluation setting. The temperature $\tau$ controls the tradeoff between gradient smoothness and winner concentration. Table [6](https://arxiv.org/html/2605.06902#S7.T6) reports attack success as a function of $\tau$ on USPS. Smaller $\tau$ values sharpen the softmax distribution and strengthen the attack; at the same time, very small temperatures can create numerical instability. At $\tau=0.01$, the attack achieves 99.4% success at $\epsilon=0.30$, essentially matching the strongest settings while remaining numerically stable. This supports our choice of $\tau=0.01$ as a robust default.
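To show how a temperature softmax can make winner-take-all competition differentiable, the sketch below implements one plausible form of such a surrogate under explicit assumptions: the standard Fuzzy ART choice $T_j = |I \wedge w_j| / (\alpha + |w_j|)$ on complement-coded input, a softmax over $T_j/\tau$, class probabilities obtained by summing the probabilities of the categories that the map field assigns to each class, and cross-entropy maximized by $\ell_\infty$ PGD with step size $\eta=\epsilon/4$ and no random start. The function names, the value of $\alpha$, and the subgradient convention for $\min$ at ties are our assumptions, not the paper's definition of WB-Softmax.

```python
import numpy as np


def complement_code(x):
    """Fuzzy ART complement coding: I = [x, 1 - x]."""
    return np.concatenate([x, 1.0 - x])


def choice_values(x, W, alpha=1e-3):
    """Category choice T_j = |I ^ w_j| / (alpha + |w_j|) for each row w_j of W."""
    I = complement_code(x)
    return np.minimum(I, W).sum(axis=1) / (alpha + W.sum(axis=1))


def predict(x, W, labels, alpha=1e-3):
    """Hard winner-take-all prediction: map-field label of the winning category."""
    return labels[np.argmax(choice_values(x, W, alpha))]


def wb_softmax_grad(x, y, W, labels, tau=0.01, alpha=1e-3):
    """Gradient w.r.t. x of -log p(y), where p(class) sums softmax(T / tau)
    over the categories mapped to that class (min() uses a subgradient)."""
    d = x.shape[0]
    I = complement_code(x)
    denom = alpha + W.sum(axis=1)
    z = (np.minimum(I, W).sum(axis=1) / denom) / tau
    p = np.exp(z - z.max())  # shift by the max for numerical stability
    p /= p.sum()
    in_y = labels == y
    q_y = max(p[in_y].sum(), np.finfo(float).tiny)
    dL_dz = p - np.where(in_y, p / q_y, 0.0)  # d(-log q_y) / dz_j
    dT_dI = (I < W) / denom[:, None]  # dT_j / dI_k where I_k is the min
    dT_dx = dT_dI[:, :d] - dT_dI[:, d:]  # chain rule through [x, 1 - x]
    return (dL_dz / tau) @ dT_dx


def wb_softmax_pgd(x0, y, W, labels, eps=0.35, steps=20, tau=0.01, alpha=1e-3):
    """L-infinity PGD (step eps / 4, no random start) on the surrogate loss."""
    eta = eps / 4.0
    x = x0.copy()
    for _ in range(steps):
        g = wb_softmax_grad(x, y, W, labels, tau, alpha)
        x = x + eta * np.sign(g)
        x = np.clip(x, x0 - eps, x0 + eps)  # project onto the eps-ball
        x = np.clip(x, 0.0, 1.0)  # keep inputs in the valid range
    return x
```

In this sketch, when $\tau$ is much smaller than the gap between the winning and runner-up choice values, the non-winner probabilities underflow to zero in floating point and the gradient vanishes exactly, which is one concrete form of the numerical instability noted above.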
Table 6: WB-Softmax attack strength vs. temperature $\tau$ (USPS, PGD-20). Attack success rate (%) on $n=500$ clean-correct samples. Lower $\tau$ concentrates probability on the winner, strengthening the attack. Bold = default setting.
Table 7: Attack strength comparison (vanilla models, $\rho=0.9$, $\epsilon=0.35$). Attack success rate (%) on $n=1000$ clean-correct samples. PGD uses 20 steps with step size $\eta=\epsilon/4$. Square Attack uses 5000 queries per sample.
Category growth alone does not explain robustness. All adversarial training variants increase category count, but the resulting robustness varies sharply. The clearest counterexample is offline adversarial training on MNIST: it creates 103K categories, nearly twice the 56K of the vanilla model, yet yields the worst MNIST AURAC (14.8%). By contrast, two-stage selective training achieves the best MNIST AURAC (64.5%) with fewer categories (89K). This indicates that, in ARTMAP, robustness depends more on how categories are created, updated, and scheduled than on how many categories are ultimately formed.
**Robustness-per-compute tradeoff.** Table [8](https://arxiv.org/html/2605.06902#S7.T8) compares training cost (wall-clock time and category count) against robustness (AURAC under FGSM and PGD-20 evaluation) for selective training with varying attack strengths during training. Stronger training attacks increase both computational cost and resulting robustness. PGD-20 selective training attains the highest AURAC against PGD-20 (22.1%) at 78.9 s, whereas PGD-10 attains 19.4% at 49.7 s and FGSM attains 19.0% at only 17.3 s. These results suggest that cheaper training attacks (FGSM, or moderate settings such as PGD-5 to PGD-10) may offer a favorable efficiency–robustness tradeoff in deployment scenarios where training cost is constrained, while PGD-20 remains the stronger choice when robustness is the priority.
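The tradeoff can be quantified directly from the three configurations quoted above. The short calculation below uses only those reported means (PGD-20 AURAC and training time); the variable and function names are illustrative.

```python
# Selective-training attack -> (AURAC % against PGD-20, training wall-clock seconds),
# using the USPS means quoted in the text.
results = {
    "FGSM": (19.0, 17.3),
    "PGD-10": (19.4, 49.7),
    "PGD-20": (22.1, 78.9),
}


def aurac_per_second(results):
    """AURAC points obtained per second of training for each configuration."""
    return {name: acc / seconds for name, (acc, seconds) in results.items()}


def marginal_gain(results, cheaper, stronger):
    """Extra AURAC points per extra training second when moving to a stronger attack."""
    (acc0, t0), (acc1, t1) = results[cheaper], results[stronger]
    return (acc1 - acc0) / (t1 - t0)


efficiency = aurac_per_second(results)
gain_fgsm_to_pgd10 = marginal_gain(results, "FGSM", "PGD-10")
gain_pgd10_to_pgd20 = marginal_gain(results, "PGD-10", "PGD-20")
```

FGSM yields roughly 1.10 AURAC points per training second versus about 0.39 for PGD-10 and 0.28 for PGD-20. Between these three points, the step from PGD-10 to PGD-20 buys about 0.09 points per extra second, compared with about 0.01 for the step from FGSM to PGD-10.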
Table 8: Robustness-per-compute tradeoff (USPS, selective training). Mean over 3 seeds. Time = training wall-clock time. #Cat. = category count. AURAC is evaluated against FGSM and PGD-20 attacks over $\epsilon\in[0.05,0.35]$.
Finally, Table [9](https://arxiv.org/html/2605.06902#S7.T9) verifies that the reported robustness is not an artifact of gradient masking. All four adapted checks are satisfied on all four datasets: iterative PGD is at least as strong as FGSM, adaptive white-box attacks match or exceed transfer attacks, the smooth WB-Softmax surrogate is stronger than hard winner-take-all optimization, and accuracy degrades smoothly as $\epsilon$ increases. Together with the attack-strength results above, these checks support the validity of the empirical conclusions in this section.
Table 9: Anti-gradient-masking sanity checks ($\rho=0.9$). Evaluated on vanilla models at $\epsilon=0.35$. WB = WB-Softmax, BB = black-box.
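Once robust accuracies are available, the four sanity checks reduce to simple comparisons. The sketch below encodes them as they are stated above; the input keys and the choice of strict versus non-strict comparisons are our reading of the text, and "degrades smoothly" is approximated by monotonicity in $\epsilon$, which is weaker than smoothness.

```python
def gradient_masking_checks(acc):
    """Four sanity checks on robust accuracies (%); lower accuracy = stronger attack.

    Hypothetical input keys:
      "fgsm"       : single-step FGSM on the WB-Softmax surrogate
      "wb_softmax" : iterative WB-Softmax PGD (the adaptive white-box attack)
      "transfer"   : strongest black-box transfer attack
      "hard_wta"   : PGD on a hard winner-take-all objective
      "curve"      : WB-Softmax robust accuracy on an increasing epsilon grid
    """
    curve = acc["curve"]
    return {
        "pgd_at_least_as_strong_as_fgsm": acc["wb_softmax"] <= acc["fgsm"],
        "white_box_at_least_as_strong_as_transfer": acc["wb_softmax"] <= acc["transfer"],
        "softmax_stronger_than_hard_wta": acc["wb_softmax"] < acc["hard_wta"],
        # Smooth degradation is approximated here by monotonicity only.
        "monotone_in_epsilon": all(a >= b for a, b in zip(curve, curve[1:])),
    }
```

A failed check, such as iterative PGD being weaker than FGSM or accuracy rising with $\epsilon$, is the usual warning sign that gradients are masked rather than that the model is robust.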
## 8 Conclusion
We presented a systematic study of adversarial robustness in Fuzzy ARTMAP under strict single\-pass streaming constraints\. The central message is that robustness in this regime cannot be studied by simply transplanting the methodology of offline deep networks\. Because ARTMAP predicts through winner\-take\-all category competition and evolves continuously through replay\-free updates, both attack construction and robustness evaluation must be aligned with the model’s native mechanism and final streamed state\.
Our results support this claim along three dimensions. First, WB-Softmax provides a strong mechanism-aligned adaptive white-box evaluator, showing that meaningful gradients and severe vulnerabilities remain present despite ARTMAP’s non-smooth competition structure. Second, robustness outcomes depend critically on protocol design in streaming learners: offline adversarial training can appear effective under transfer-based evaluation while collapsing under adaptive white-box attacks on the final deployed model, whereas progressive two-stage selective training yields the strongest overall replay-free robustness across USPS, MNIST, Fashion-MNIST, and EMNIST-Letters. Third, ART’s explicit category geometry is not only interpretable but operationally useful for diagnostics: geometry monitoring reveals separation collapse as a structural failure mode, and match-score analysis exposes a distinct semantic reliability failure (match-score inversion). The geometric diagnosis admits a natural absorption-gating rule (Sep-Aware), and we characterize its behavior fully: at the recommended $\theta$, the rule’s update path coincides with AdvTrain (on) because adversarial examples lie in regions where absorption-driven hyperbox expansion crosses class boundaries. The diagnostic framework therefore makes the structural constraint on overlap-only gating explicit and motivates the alternative gating criteria discussed below.
Taken together, these findings show that adversarial robustness in streaming prototype\-based learners is simultaneously a problem of attack alignment, protocol design, and post\-training reliability\. More broadly, the framework developed here suggests that interpretable internal structure can play a dual role in streaming robustness research: it can improve diagnosis of failure modes and also support targeted replay\-free interventions that would be difficult to design in black\-box models\.
Several directions remain open. First, our study focuses on $\ell_{\infty}$ attacks and controlled small-to-medium image benchmarks; extending the analysis to larger datasets, higher-dimensional inputs, real-world streams, and additional modalities is an important next step. Another important direction is cross-architecture transfer from ART-generated adversarial examples to offline deep models. The present work focuses on the inverse direction (using deep surrogate transfer attacks as black-box evaluators for ARTMAP) and on adaptive white-box evaluation of the final streamed ARTMAP model. Studying whether adversarial examples generated from ART category geometry transfer to offline convolutional architectures would further clarify the interaction between prototype-based streaming learners and mainstream deep models. Second, certified robustness, distributionally robust guarantees, and tighter theoretical characterizations of geometry evolution in ART remain largely unexplored. Third, the structural limitation of overlap-only gating motivates richer absorption criteria: margin-based gating, delta-overlap constraints (rejecting only absorptions that *increase* cross-class overlap), and non-geometric criteria such as match-score statistics are natural next interventions within the diagnosis-to-rule framework. Finally, the observed match-score inversion suggests a broader research direction at the intersection of streaming robustness and deployment reliability: understanding when interpretable internal scores remain valid after adversarial adaptation, and how they should be recalibrated when they do not.
## Acknowledgment
This research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF\-22\-2\-0209\. This research was supported by NSF grant 2420248 and by the Kummer Institute, Mary Finley Endowment, and Intelligent Systems Center of the Missouri University of Science and Technology\.
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U\.S\. Government\. The U\.S\. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation herein\.
The computation for this work was performed on the high-performance computing infrastructure provided by Research Support Solutions at Missouri University of Science and Technology (https://doi.org/10.71674/PH64-N397).
## References
- [1] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein (2020). Square attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision (ECCV), pp. 484–501.
- [2] A. Athalye, N. Carlini, and D. Wagner (2018). Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), pp. 274–283.
- [3] J. Bang, H. Koh, S. Park, H. Song, J. Ha, and J. Choi (2022). Online continual learning on a contaminated data stream with blurry task boundaries. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9265–9274.
- [4] L. E. Brito da Silva, I. Elnabarawy, and D. C. Wunsch II (December 2019). A survey of adaptive resonance theory neural network models for engineering applications. Neural Networks 120, pp. 167–203. doi: [10.1016/j.neunet.2019.09.012](https://dx.doi.org/10.1016/j.neunet.2019.09.012).
- [5] L. E. Brito da Silva, N. M. Melton, and D. C. Wunsch II (2020). Incremental cluster validity indices for online learning of hard partitions: extensions and comparative study. IEEE Access 8, pp. 22025–22047. doi: [10.1109/ACCESS.2020.2969849](https://dx.doi.org/10.1109/ACCESS.2020.2969849).
- [6] L. E. Brito da Silva, N. Rayapati, and D. C. Wunsch II (2023). ICVI-ARTMAP: using incremental cluster validity indices and adaptive resonance theory reset mechanism to accelerate validation and achieve multiprototype unsupervised representations. IEEE Transactions on Neural Networks and Learning Systems 34(12), pp. 9757–9770. doi: [10.1109/TNNLS.2022.3160381](https://dx.doi.org/10.1109/TNNLS.2022.3160381).
- [7] S. Cairns, L. E. Brito da Silva, S. Petrenko, D. C. Wunsch II, and J. Liu (2026). Robustness of Fuzzy ARTMAP to Adversarial Attacks and Progressive Adversarial Training for Streaming Learning. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). Note: Accepted for presentation.
- [8] N. Carlini and D. Wagner (2017). Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 39–57.
- [9] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen (1992). Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks 3(5), pp. 698–713.
- [10] G. A. Carpenter, S. Grossberg, and D. B. Rosen (1991). Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks 4(6), pp. 759–771.
- [11] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny (2019). Efficient lifelong learning with A-GEM. In International Conference on Learning Representations (ICLR).
- [12] C. K. Chow (1970). On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory 16(1), pp. 41–46. doi: [10.1109/TIT.1970.1054406](https://dx.doi.org/10.1109/TIT.1970.1054406).
- [13] F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, and M. Hein (2021). RobustBench: a standardized adversarial robustness benchmark. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track.
- [14] F. Croce and M. Hein (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning (ICML), pp. 2206–2216.
- [15] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars (2022). A continual learning survey: defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(7), pp. 3366–3385. doi: [10.1109/TPAMI.2021.3057446](https://dx.doi.org/10.1109/TPAMI.2021.3057446).
- [16] Y. Geifman and R. El-Yaniv (2017). Selective classification for deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS).
- [17] Y. Geifman and R. El-Yaniv (2019). SelectiveNet: a deep neural network with an integrated reject option. In International Conference on Machine Learning (ICML), pp. 2151–2159.
- [18] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
- [19] S. Grossberg (2013). Adaptive resonance theory: how a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks 37, pp. 1–47.
- [20] D. Hendrycks and K. Gimpel (2017). A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations (ICLR).
- [21] M. Huai, X. Li, C. Miao, L. Sun, and A. Zhang (2022). On the robustness of metric learning: an adversarial perspective. ACM Transactions on Knowledge Discovery from Data 16(5). doi: [10.1145/3502726](https://dx.doi.org/10.1145/3502726).
- [22] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114(13), pp. 3521–3526. doi: [10.1073/pnas.1611835114](https://dx.doi.org/10.1073/pnas.1611835114).
- [23] Z. Li and D. Hoiem (2018). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(12), pp. 2935–2947. doi: [10.1109/TPAMI.2017.2773081](https://dx.doi.org/10.1109/TPAMI.2017.2773081).
- [24] D. Lopez-Paz and M. Ranzato (2017). Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems (NeurIPS).
- [25] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018). Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR).
- [26] N. M. Melton, L. E. Brito da Silva, S. Petrenko, and D. C. Wunsch II (2025). Deep ARTMAP: generalized hierarchical learning with adaptive resonance theory. arXiv preprint arXiv:2503.07641.
- [27] N. M. Melton, L. E. Brito da Silva, and D. C. Wunsch II (2025). An extensive analysis of match-tracking methods for ARTMAP. In 2025 IEEE Symposium on Computational Intelligence in Health and Medicine (CIHM), pp. 1–8. doi: [10.1109/CIHM64979.2025.10969482](https://dx.doi.org/10.1109/CIHM64979.2025.10969482).
- \[28\]N\. M\. Melton, D\. Tanksley, and D\. C\. Wunsch II\(2025\)Adaptive resonance lib: a Python package for adaptive resonance theory \(ART\) models\.Journal of Open Source Software10\(114\),pp\. 7764\.External Links:[Document](https://dx.doi.org/10.21105/joss.07764)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p5.1)\.
- \[29\]X\. Mi, F\. Tang, Z\. Yang, D\. Wang, J\. Cao, P\. Li, and Y\. Liu\(2025\)Adversarial robust memory\-based continual learner\.InIEEE/CVF International Conference on Computer Vision \(ICCV\),Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p3.1)\.
- \[30\]M\. Moshtaghi, J\. C\. Bezdek, S\. M\. Erfani, C\. Leckie, and J\. Bailey\(2019\-Apr\.\)Online cluster validity indices for performance monitoring of streaming data clustering\.International Journal of Intelligent Systems34\(4\),pp\. 541–563\.Cited by:[§5\.1](https://arxiv.org/html/2605.06902#S5.SS1.p1.1),[Table 3](https://arxiv.org/html/2605.06902#S7.T3),[Table 3](https://arxiv.org/html/2605.06902#S7.T3.2.1)\.
- \[31\]V\. Mygdalis, A\. Iosifidis, A\. Tefas, and I\. Pitas\(2022\)Hyperspherical class prototypes for adversarial robustness\.Pattern Recognition125,pp\. 108527\.External Links:[Document](https://dx.doi.org/10.1016/j.patcog.2022.108527)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p5.1)\.
- \[32\]G\. I\. Parisi, R\. Kemker, J\. L\. Part, C\. Kanan, and S\. Wermter\(2019\)Continual lifelong learning with neural networks: a review\.Neural Networks113,pp\. 54–71\.Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p3.1),[§3\.1](https://arxiv.org/html/2605.06902#S3.SS1.p1.5)\.
- \[33\]S\. Petrenko, L\. E\. Brito da Silva, and D\. C\. Wunsch II\(2025\)DeepART: deep gradient\-free local learning with adaptive resonance\.Neural Networks190,pp\. 107580\.External Links:[Document](https://dx.doi.org/10.1016/j.neunet.2025.107580)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p5.1)\.
- \[34\]Z\. Qian, K\. Huang, Q\. Wang, and X\. Zhang\(2022\)A survey of robust adversarial training in pattern recognition: fundamental, theory, and methodologies\.Pattern Recognition131,pp\. 108889\.External Links:[Document](https://dx.doi.org/10.1016/j.patcog.2022.108889)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1)\.
- \[35\]S\. Rebuffi, A\. Kolesnikov, G\. Sperl, and C\. H\. Lampert\(2017\)iCaRL: incremental classifier and representation learning\.InIEEE/CVF Conference on Computer Vision and Pattern Recognition \(CVPR\),pp\. 5533–5542\.External Links:[Document](https://dx.doi.org/10.1109/CVPR.2017.587)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p3.1)\.
- \[36\]S\. Grossberg\(2021\)Conscious mind, resonant brain: how each brain makes a mind\.Oxford University Press,New York\.External Links:ISBN 9780190070557Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p4.1)\.
- \[37\]R\. Z\. Sabzevar, H\. Mohammadzadeh, T\. Tavakoli, and A\. Harati\(2025\)Deep positive\-negative prototypes for adversarially robust discriminative prototypical learning\.arXiv preprint arXiv:2504\.03782\.Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p5.1)\.
- \[38\]J\. Snell, K\. Swersky, and R\. S\. Zemel\(2017\)Prototypical networks for few\-shot learning\.InAdvances in Neural Information Processing Systems \(NeurIPS\),pp\. 4077–4087\.Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p5.1)\.
- \[39\]C\. Szegedy, W\. Zaremba, I\. Sutskever, J\. Bruna, D\. Erhan, I\. Goodfellow, and R\. Fergus\(2014\)Intriguing properties of neural networks\.InInternational Conference on Learning Representations \(ICLR\),Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1)\.
- \[40\]F\. Tramèr, A\. Kurakin, N\. Papernot, I\. Goodfellow, D\. Boneh, and P\. McDaniel\(2018\)Ensemble adversarial training: attacks and defenses\.InInternational Conference on Learning Representations \(ICLR\),Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1),[§3\.1](https://arxiv.org/html/2605.06902#S3.SS1.p5.6)\.
- \[41\]J\. Uesato, B\. O’Donoghue, A\. van den Oord, and P\. Kohli\(2018\)Adversarial risk and the dangers of evaluating against weak attacks\.InInternational Conference on Machine Learning \(ICML\),pp\. 5025–5034\.Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1)\.
- \[42\]T\. Wu, T\. Luo, and D\. C\. Wunsch II\(2024\)LRS: enhancing adversarial transferability through Lipschitz regularized surrogate\.InProceedings of the AAAI Conference on Artificial Intelligence \(AAAI\),Vol\.38,pp\. 6135–6143\.External Links:[Document](https://dx.doi.org/10.1609/aaai.v38i6.28430)Cited by:[§6](https://arxiv.org/html/2605.06902#S6.p4.4)\.
- \[43\]D\. Zhao, H\. Li, Q\. Luo, and W\. Hu\(2025\)Hölder Network for Improved Adversarial Robustness\.Neural Networks,pp\. 108145\.External Links:[Document](https://dx.doi.org/10.1016/j.neunet.2025.108145)Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1)\.
- \[44\]M\. Zhao, L\. Zhang, J\. Ye, H\. Lu, B\. Yin, and X\. Wang\(2024\)Adversarial training: a survey\.arXiv preprint arXiv:2410\.15042\.Cited by:[§1](https://arxiv.org/html/2605.06902#S1.p2.1)\.
## Appendix
This appendix provides additional theoretical and empirical details referenced in the main paper. Specifically, it includes (i) a constructive proof of Proposition 1 and (ii) auxiliary unconditional robustness tables derived from the conditional (clean-correct) evaluation reported in the main text.
## Appendix A: Proof of Proposition 1
Proof. We provide a constructive argument for why selective adversarial training can break the monotone relationship between post-training match score and correctness.
Setup. Let

$$I(\bm{x}) = [\bm{x};\, 1-\bm{x}] \tag{27}$$

denote the complement-coded input, and let category $j$ have weight vector $\bm{w}_j$. The ARTMAP category-level match function is

$$M_j(I) = \frac{|I \wedge \bm{w}_j|}{|I|}, \tag{28}$$

where $\wedge$ is the component-wise minimum (fuzzy AND) and $|\cdot|$ is the $L_1$ norm. Under complement coding, $|I(\bm{x})|$ is constant for a fixed input dimension: for $\bm{x} \in [0,1]^d$, $|I(\bm{x})| = \sum_i x_i + \sum_i (1 - x_i) = d$. Comparisons of category-level match therefore reduce to comparisons of the numerator $|I \wedge \bm{w}_j|$.
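As a quick numerical check of Eqs. (27)–(28), the sketch below implements complement coding and the category-level match in NumPy. The helper names `complement_code` and `match` are illustrative, the two weight vectors are made up, and inputs are assumed to be pre-scaled to $[0,1]$.

```python
import numpy as np


def complement_code(x):
    """Complement coding I(x) = [x; 1 - x] (Eq. 27); assumes x is scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


def match(I, w):
    """Category-level match M_j(I) = |I ^ w_j| / |I| (Eq. 28).

    ^ is the component-wise minimum; |.| is the L1 norm (all entries are non-negative).
    """
    return np.minimum(I, w).sum() / I.sum()


x = np.array([0.25, 0.5, 0.75])
I = complement_code(x)
print(I.sum())  # 3.0, i.e. |I(x)| equals the input dimension d = 3

# Two hypothetical category weights: because |I| is fixed, ordering categories
# by match gives the same ordering as comparing the numerators |I ^ w_j|.
w_a = np.array([0.1, 0.4, 0.6, 0.5, 0.4, 0.2])
w_b = np.array([0.3, 0.2, 0.9, 0.1, 0.6, 0.3])
print(match(I, w_a) > match(I, w_b))
```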
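Under fast learning, when a sample with code $I$ is absorbed by category $j$, the update is

$$\bm{w}_j^{\mathrm{new}} = \bm{w}_j \wedge I, \tag{29}$$

which can only decrease the weight vector component-wise. Geometrically, writing $\bm{w}_j = [\bm{u}_j;\, 1-\bm{v}_j]$, category $j$ corresponds to the hyperbox $[\bm{u}_j, \bm{v}_j]$ in input space, and the update enlarges this box just enough to enclose the presented sample.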
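To make the geometry of Eq. (29) concrete, the sketch below applies one fast-learning step to a 1-D category and decodes the complement-coded weight $\bm{w} = [\bm{u};\, 1-\bm{v}]$ into its hyperbox $[\bm{u}, \bm{v}]$. The helpers `fast_learn` and `to_hyperbox` are illustrative names, and the numbers are arbitrary.

```python
import numpy as np


def fast_learn(w, I):
    """Fast-learning update w_new = w ^ I (Eq. 29), with ^ the component-wise minimum."""
    return np.minimum(w, I)


def to_hyperbox(w):
    """Decode a complement-coded weight w = [u; 1 - v] into hyperbox corners (u, v)."""
    d = w.size // 2
    return w[:d], 1.0 - w[d:]


w = np.array([0.25, 0.5])   # 1-D category whose box is [0.25, 0.5]
I = np.array([0.75, 0.25])  # complement code of the sample x = 0.75

w_new = fast_learn(w, I)
print(to_hyperbox(w))       # before the update: u = 0.25, v = 0.5
print(to_hyperbox(w_new))   # after the update:  u = 0.25, v = 0.75
```

The weight vector decreases component-wise (from $[0.25, 0.5]$ to $[0.25, 0.25]$), while the decoded box grows from $[0.25, 0.5]$ to $[0.25, 0.75]$, the smallest box containing both the old box and the sample.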
Step 1: Selective training preferentially updates on misclassified adversarial samples. By definition of selective adversarial training, only adversarial samples $\bm{x}'$ with

$$f(\bm{x}') \neq y \tag{30}$$

trigger an update using label $y$, either by absorption into an existing correct-class category or by creation of a new correct-class category. Clean samples that are already correctly classified do not trigger comparable updates under the same rule.
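Step 1 describes a rule rather than a formula, so a small executable model may help. The sketch below is a hypothetical, minimal Fuzzy ARTMAP: the class `TinyFuzzyARTMAP` and the function `selective_update` are illustrative names, not the authors' code and not the API of any ART library. It assumes the standard Fuzzy ART choice function $T_j = |I \wedge \bm{w}_j| / (\alpha + |\bm{w}_j|)$, a baseline vigilance $\rho$, match tracking with a small increment, and fast learning as in Eq. (29); all parameter values are placeholders. Each sample is seen once and nothing is replayed.

```python
import numpy as np


class TinyFuzzyARTMAP:
    """Hypothetical minimal Fuzzy ARTMAP (illustrative, not the paper's implementation).

    Assumes the standard choice function T_j = |I ^ w_j| / (alpha + |w_j|),
    vigilance rho, match tracking with increment eps, and fast learning (Eq. 29).
    """

    def __init__(self, rho=0.5, alpha=1e-3, eps=1e-6):
        self.rho, self.alpha, self.eps = rho, alpha, eps
        self.W, self.labels = [], []  # category weights and their map-field classes

    @staticmethod
    def code(x):
        x = np.asarray(x, dtype=float)
        return np.concatenate([x, 1.0 - x])  # complement coding, Eq. (27)

    def _ranked(self, I):
        # Category indices in decreasing order of choice value (winner first)
        T = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.W]
        return np.argsort(T)[::-1]

    def predict(self, x):
        if not self.W:
            return None
        return self.labels[self._ranked(self.code(x))[0]]

    def learn(self, x, y):
        I = self.code(x)
        rho = self.rho  # per-sample vigilance, raised by match tracking
        for j in self._ranked(I):
            m = np.minimum(I, self.W[j]).sum() / I.sum()  # match, Eq. (28)
            if m < rho:
                continue  # vigilance failure: reset and try the next category
            if self.labels[j] == y:
                self.W[j] = np.minimum(self.W[j], I)  # resonance: fast learning, Eq. (29)
                return
            rho = m + self.eps  # map-field mismatch: match tracking
        self.W.append(I.copy())  # no category accepted: create one at I, Eq. (34)
        self.labels.append(y)


def selective_update(model, x_adv, y):
    """Selective adversarial update: learn (x_adv, y) only if x_adv is misclassified (Eq. 30)."""
    if model.predict(x_adv) != y:
        model.learn(x_adv, y)
        return True
    return False


model = TinyFuzzyARTMAP(rho=0.5)
model.learn([0.2], 0)
model.learn([0.8], 1)
print(selective_update(model, [0.45], 0))   # False: already predicted as 0, no update
print(selective_update(model, [0.45], 1))   # True: misclassified, a class-1 category is created
print(len(model.W), model.predict([0.45]))  # 3 1
```

In this toy run, the input at 0.45 first resonates with the class-0 category, match tracking then raises vigilance above that category's match, the class-1 category fails the raised vigilance, and a new class-1 category is created at the input itself.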
Step 2: In the absorbed-category case, match is preserved. Consider a misclassified adversarial sample $\bm{x}'$ that is absorbed into an existing correct-class category $j$. For that same adversarial code $I(\bm{x}')$, and because $\wedge$ is associative, commutative, and idempotent, after the update we have

$$I(\bm{x}') \wedge \bm{w}_j^{\mathrm{new}} = I(\bm{x}') \wedge \big(\bm{w}_j \wedge I(\bm{x}')\big) = I(\bm{x}') \wedge \bm{w}_j. \tag{31}$$

Therefore,

$$\begin{aligned} M_j^{\mathrm{new}}(I(\bm{x}')) &= \frac{|I(\bm{x}') \wedge \bm{w}_j^{\mathrm{new}}|}{|I(\bm{x}')|} \\ &= \frac{|I(\bm{x}') \wedge \bm{w}_j|}{|I(\bm{x}')|} = M_j^{\mathrm{old}}(I(\bm{x}')). \end{aligned} \tag{32}$$

Hence, in the absorbed-category case, fast learning preserves the category-level match of the updated adversarial sample exactly:

$$M_j^{\mathrm{new}}(I(\bm{x}')) = M_j^{\mathrm{old}}(I(\bm{x}')). \tag{33}$$
In the special case of category creation, if a new correct-class category is initialized at

$$\bm{w}_{j_{\mathrm{new}}}^{\mathrm{new}} = I(\bm{x}'), \tag{34}$$

then

$$M_{j_{\mathrm{new}}}(I(\bm{x}')) = 1. \tag{35}$$

Thus, while absorbed-category match is preserved, higher attained match values can arise through category creation rather than through ordinary absorption.
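Eqs. (31)–(35) can be checked numerically on random vectors. The sketch below uses an arbitrary weight vector in $[0,1]^{2d}$ and a random adversarial code; the helper names and the dimension $d = 16$ are illustrative choices. As a side observation, it also checks that absorption can only lower, never raise, the match of other inputs to the same category, since $\bm{w}_j^{\mathrm{new}} \le \bm{w}_j$ component-wise.

```python
import numpy as np


def complement_code(x):
    return np.concatenate([x, 1.0 - x])


def match(I, w):
    # M_j(I) = |I ^ w_j| / |I| with ^ = component-wise min and |.| = L1 norm
    return np.minimum(I, w).sum() / I.sum()


rng = np.random.default_rng(0)
d = 16
w_j = rng.uniform(size=2 * d)                   # arbitrary existing category weight
I_adv = complement_code(rng.uniform(size=d))    # code of the absorbed adversarial sample
I_other = complement_code(rng.uniform(size=d))  # some other input

w_new = np.minimum(w_j, I_adv)                  # absorption by fast learning, Eq. (29)

# Eq. (33): the absorbed sample's match is unchanged (exactly, since min is idempotent)
assert match(I_adv, w_new) == match(I_adv, w_j)

# Eqs. (34)-(35): a category created at I_adv has match exactly 1 for that sample
assert match(I_adv, I_adv.copy()) == 1.0

# Side effect: other inputs can only lose match to the updated category
assert match(I_other, w_new) <= match(I_other, w_j)
```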
Step 3: Preserved or high match does not imply correctness. Match is a geometric quantity defined relative to category structure, whereas the final prediction $f(\cdot)$ is determined by winner-take-all competition together with the map field. Consequently, correctness is not determined by category-level match alone.
In particular, after selective adversarial training, there can exist regimes in which:

1. an input $\bm{x}_1$ attains relatively high post-training match to an absorbed or newly created category, yet is still misclassified because a competing category wins the global competition or maps to a different class; while
2. another input $\bm{x}_2$ attains lower post-training match, yet is correctly classified because the winning category and its map-field assignment are favorable.
Therefore, it is possible to have

$$M_{j_1}(I(\bm{x}_1)) > M_{j_2}(I(\bm{x}_2)), \qquad f(\bm{x}_1) \neq y_1, \qquad f(\bm{x}_2) = y_2.$$
Combining Steps 1–3 yields the claim of Proposition 1: after selective adversarial training, post-training match need not remain a monotone proxy for correctness. Consequently, match-threshold rejection rules calibrated on vanilla models need not remain valid after selective adversarial training. $\square$
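The inequality in Step 3 can be instantiated with a hand-built 1-D example. The state below is hypothetical and not derived from the paper's experiments: category A maps to class 0, and a small category B, of the kind that might be created from adversarial samples, maps to class 1. The sketch assumes the standard Fuzzy ART choice function $T_j = |I \wedge \bm{w}_j| / (\alpha + |\bm{w}_j|)$ with an assumed $\alpha = 10^{-3}$, and takes $M$ to be the match of the winning category, the quantity a match-threshold rejection rule would inspect.

```python
import numpy as np

ALPHA = 1e-3  # choice parameter (assumed value)


def code(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.concatenate([x, 1.0 - x])


def winner(I, W):
    # Winner-take-all over the (assumed) standard choice function
    T = [np.minimum(I, w).sum() / (ALPHA + w.sum()) for w in W]
    return int(np.argmax(T))


def match(I, w):
    return np.minimum(I, w).sum() / I.sum()


# Hypothetical post-training state: two categories in complement-coded form [u; 1 - v]
W = [np.array([0.1, 0.6]),    # category A: box [0.1, 0.4], maps to class 0
     np.array([0.45, 0.45])]  # category B: box [0.45, 0.55], maps to class 1
labels = [0, 1]


def predict_with_match(x):
    I = code(x)
    j = winner(I, W)
    return labels[j], match(I, W[j])


(f1, m1), y1 = predict_with_match(0.5), 0  # x1 = 0.5, true class 0
(f2, m2), y2 = predict_with_match(0.0), 0  # x2 = 0.0, true class 0
print(f1, round(m1, 2), f2, round(m2, 2))  # 1 0.9 0 0.6
```

Here $\bm{x}_1 = 0.5$ lies inside B's box and is assigned class 1 with match 0.9, while $\bm{x}_2 = 0.0$ lies outside A's box but is still won by A, giving the correct class with the lower match 0.6. A rejection threshold between 0.6 and 0.9 would keep the error and reject the correct prediction.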
## Appendix B: Derived Unconditional Robustness Tables
The main robustness results in Tables [1](https://arxiv.org/html/2605.06902#S7.T1)–[2](https://arxiv.org/html/2605.06902#S7.T2) of the main paper are reported on the clean-correct subset. To provide an additional reference view, we derive unconditional robust accuracy and unconditional AURAC directly from the reported clean accuracy and conditional robustness values.
Let

$$\mathrm{CondRob}(\epsilon) = P\!\left(f(\bm{x}^{\mathrm{adv}}_{\epsilon}) = y \mid f(\bm{x}) = y\right) \tag{36}$$

denote the conditional robust accuracy at perturbation level $\epsilon$, and let

$$\mathrm{CleanAcc} = P(f(\bm{x}) = y) \tag{37}$$

denote the clean accuracy. A sample then counts as unconditionally robust only if it is classified correctly both before and after the attack, so by the chain rule of probability the corresponding unconditional robust accuracy is

$$\begin{aligned} \mathrm{UncondRob}(\epsilon) &= P\!\big(f(\bm{x}) = y,\ f(\bm{x}^{\mathrm{adv}}_{\epsilon}) = y\big) \\ &= \mathrm{CleanAcc} \times \mathrm{CondRob}(\epsilon). \end{aligned} \tag{38}$$
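Eq. (38) also holds exactly for empirical frequencies on a fixed test set, because $\#\text{both correct}/n = (\#\text{clean correct}/n) \times (\#\text{both correct}/\#\text{clean correct})$. The sketch below checks this on simulated correctness indicators; the data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Simulated per-sample outcomes (illustrative only, not experimental data)
clean_ok = rng.random(n) < 0.9  # f(x) == y
adv_ok = rng.random(n) < 0.5    # f(x_adv) == y

clean_acc = clean_ok.mean()               # Eq. (37)
cond_rob = adv_ok[clean_ok].mean()        # Eq. (36): restricted to clean-correct samples
uncond_rob = (clean_ok & adv_ok).mean()   # joint event in Eq. (38)

assert np.isclose(uncond_rob, clean_acc * cond_rob)
```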
Accordingly, for the reported point metric at $\epsilon = 0.30$,

$$\mathrm{Uncond.Adv@0.30} = \frac{\mathrm{CleanAcc} \times \mathrm{Adv@0.30}}{100}, \tag{39}$$

where all quantities are expressed in percent.
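Because both factors in Eq. (39) are percentages, their product is divided by 100 once. A minimal helper, with the hypothetical name `uncond_pct` and illustrative inputs that are not taken from Tables 1–2:

```python
def uncond_pct(clean_acc_pct: float, cond_pct: float) -> float:
    """Unconditional metric (%) from clean accuracy (%) and a conditional metric (%), as in Eq. (39)."""
    if not (0.0 <= clean_acc_pct <= 100.0 and 0.0 <= cond_pct <= 100.0):
        raise ValueError("both inputs must be percentages in [0, 100]")
    return clean_acc_pct * cond_pct / 100.0


# Illustrative numbers only: 90% clean accuracy, 50% conditional Adv@0.30
print(uncond_pct(90.0, 50.0))  # 45.0
```

The derived value can never exceed either input, which is a quick sanity check when reading the derived tables.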
Similarly, the reported AURAC is computed from the robust-accuracy curve over $\epsilon \in [0.05, 0.35]$. Because $\mathrm{CleanAcc}$ does not depend on $\epsilon$, scaling every point of the curve by $\mathrm{CleanAcc}/100$ scales the area by the same factor, so the corresponding unconditional AURAC is

$$\mathrm{Uncond.AURAC} = \frac{\mathrm{CleanAcc} \times \mathrm{AURAC}}{100}. \tag{40}$$
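The integration rule behind AURAC is not restated in this appendix. The sketch below assumes trapezoidal integration over an evenly spaced $\epsilon$ grid, normalized by the width of the range, and uses a made-up conditional curve; the identity in Eq. (40) holds for any integration rule that is linear in the curve values. The function name `aurac` is illustrative.

```python
import numpy as np


def aurac(eps, robust_acc):
    """Area under the robust-accuracy curve, normalized by the epsilon range (assumed definition)."""
    eps = np.asarray(eps, dtype=float)
    acc = np.asarray(robust_acc, dtype=float)
    area = np.sum((acc[1:] + acc[:-1]) * np.diff(eps)) / 2.0  # trapezoidal rule
    return area / (eps[-1] - eps[0])


eps = np.linspace(0.05, 0.35, 7)                                   # 0.05, 0.10, ..., 0.35
cond_curve = np.array([95, 88, 76, 60, 45, 30, 20], dtype=float)   # illustrative conditional curve (%)
clean = 92.0                                                       # illustrative clean accuracy (%)

lhs = aurac(eps, clean * cond_curve / 100)  # AURAC of the unconditional curve
rhs = clean * aurac(eps, cond_curve) / 100  # Eq. (40)
assert np.isclose(lhs, rhs)
```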
The tables below are therefore auxiliary indicators derived from the reported means in Tables [1](https://arxiv.org/html/2605.06902#S7.T1)–[2](https://arxiv.org/html/2605.06902#S7.T2) of the main paper. Because the product of two means generally differs from the mean of the per-run products, they should be interpreted as a complementary unconditional view rather than as re-evaluated experimental measurements.
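The caveat about working from reported means can be illustrated with two hypothetical runs (the numbers are made up): when clean accuracy and conditional robustness co-vary across runs, the product of their means differs from the mean of the per-run unconditional values by their population covariance divided by 100.

```python
import numpy as np

# Hypothetical per-run results (%), not taken from the paper
clean = np.array([90.0, 80.0])
cond_adv = np.array([40.0, 60.0])

per_run_uncond = clean * cond_adv / 100                  # [36.0, 48.0]
mean_of_products = per_run_uncond.mean()                 # 42.0
product_of_means = clean.mean() * cond_adv.mean() / 100  # 85 * 50 / 100 = 42.5

# The gap equals the population covariance of the two metrics, divided by 100
gap = mean_of_products - product_of_means
assert np.isclose(gap, np.cov(clean, cond_adv, bias=True)[0, 1] / 100)
```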
Table 10: Derived Unconditional White-Box Robustness. Derived from Table [1](https://arxiv.org/html/2605.06902#S7.T1) of the main paper using $\mathrm{Uncond.Adv@0.30} = \mathrm{Clean} \times \mathrm{Adv@0.30}/100$ and $\mathrm{Uncond.AURAC} = \mathrm{Clean} \times \mathrm{AURAC}/100$. Values are computed from reported means and are therefore auxiliary derived indicators rather than re-evaluated experimental measurements. Bold = best per dataset.
Table 11: Derived Unconditional Black-Box Transfer Robustness. Derived from the clean accuracy and conditional robustness values reported in Table [2](https://arxiv.org/html/2605.06902#S7.T2) of the main paper using $\mathrm{Uncond.Adv@0.30} = \mathrm{Clean} \times \mathrm{Adv@0.30}/100$ and $\mathrm{Uncond.AURAC} = \mathrm{Clean} \times \mathrm{AURAC}/100$. Values are computed from reported means and are therefore auxiliary derived indicators rather than re-evaluated experimental measurements. Bold = best per dataset.
Tables [10](https://arxiv.org/html/2605.06902#A2.T10)–[11](https://arxiv.org/html/2605.06902#A2.T11) indicate that the main qualitative conclusions of the paper are preserved under this derived unconditional view. Under white-box evaluation, two-stage selective training is the strongest overall replay-free defense: it achieves the best mean unconditional AURAC on all four datasets and the best mean unconditional robustness at $\epsilon = 0.30$ on MNIST, Fashion-MNIST, and EMNIST-Letters. On USPS, online adversarial training, AdvTrain (on), achieves the best mean unconditional robustness at $\epsilon = 0.30$, statistically tied with separation-aware training. Under black-box transfer evaluation, the same protocol-dependent ranking behavior remains visible: offline adversarial training is strongest on USPS and Fashion-MNIST, two-stage selective training is strongest on MNIST, and vanilla is strongest on EMNIST-Letters. These auxiliary tables therefore support the claim that the main ranking conclusions are not merely artifacts of conditioning on clean-correct samples.