Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning

arXiv cs.CL Papers

Summary

This paper proposes SDBN, a framework combining adversarial training with parameter-efficient fine-tuning to improve robustness of foundation models under noise and limited data, demonstrating substantial improvements in low-resource settings.

arXiv:2606.10610v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited training data. We propose SDBN (Small Data Big Noise), a unified framework that brings adversarial training to PEFT - a combination that remains less studied in the PEFT setting despite its complementary strengths - to enhance model robustness and generalization, outperforming alternative approaches. We also introduce two variants of the method that use discrete uncertainty sets: SDBN-h, which enumerates character-level edits and selects worst-case variants using gradients, and SDBN-p, which uses LLM-generated variants for robust optimization in generative tasks. Experiments across multiple benchmarks reveal substantial improvements, particularly in low-resource settings and under both word-level and character-level corruptions. This framework addresses the less explored intersection of adversarial training and parameter-efficient adaptation, without introducing additional parameters and with only modest computational overhead, making PEFT deployments more reliable in real-world scenarios where data scarcity and linguistic variability often coexist.

Cached at: 06/10/26, 06:11 AM

# Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning
Source: [https://arxiv.org/html/2606.10610](https://arxiv.org/html/2606.10610)
Eitan Cohen (Bar-Ilan University, Ramat Gan, Israel; ceitan001@gmail.com), Idan Simai (Bar-Ilan University, Ramat Gan, Israel; idansi98@gmail.com), Uri Shaham (Bar-Ilan University, Ramat Gan, Israel; uri.shaham@biu.ac.il)

###### Abstract

Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited training data. We propose SDBN (Small Data Big Noise), a unified framework that brings adversarial training to PEFT - a combination that remains less studied in the PEFT setting despite its complementary strengths - to enhance model robustness and generalization, outperforming alternative approaches. We also introduce two variants of the method that use discrete uncertainty sets: SDBN-h, which enumerates character-level edits and selects worst-case variants using gradients, and SDBN-p, which uses LLM-generated variants for robust optimization in generative tasks. Experiments across multiple benchmarks reveal substantial improvements, particularly in low-resource settings and under both word-level and character-level corruptions. This framework addresses the less explored intersection of adversarial training and parameter-efficient adaptation, without introducing additional parameters and with only modest computational overhead, making PEFT deployments more reliable in real-world scenarios where data scarcity and linguistic variability often coexist.


## 1 Introduction

Parameter-Efficient Fine-Tuning (PEFT) has recently emerged as a promising strategy for adapting Large Language Models (LLMs) to downstream tasks while substantially reducing both computational and storage requirements. Notable PEFT approaches include Adapter Houlsby et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib10)), BitFit Ben-Zaken et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib9)), and Low-Rank Adaptation (LoRA) Hu et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib1)). In particular, LoRA has garnered significant attention, spurring the development of several variants Ren et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib2)), Zhang et al. ([2023](https://arxiv.org/html/2606.10610#bib.bib3)), Dettmers et al. ([2023](https://arxiv.org/html/2606.10610#bib.bib4)) that further extend its applicability.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/Swap_Robust_Small_Data.png)

Figure 1: Performance degradation of PEFT methods on limited training data. Accuracy of three common PEFT methods (Adapter, BitFit, and LoRA) on a test set with *word-swapping* noise as the size of the clean Banking77 training corpus is reduced (x-axis, number of examples). All methods exhibit a marked drop in noisy-test accuracy as the available training data dwindles, in some cases losing more than 50% below 1,000 training samples. The trend underscores the current vulnerability of PEFT models to realistic textual perturbations in low-resource settings.

While these techniques substantially reduce the computational overhead required for fine-tuning foundation models, they frequently underperform when confronted with domain shifts and noisy data (see [Figure 1](https://arxiv.org/html/2606.10610#S1.F1)), conditions that commonly appear in practical NLP deployments. Real-world text often contains imperfections such as typos, inconsistent formatting, and dialectal variations that can significantly degrade model performance. Additionally, while PEFT approaches have demonstrated impressive results on large-scale training corpora, their effectiveness may deteriorate significantly when training samples are limited, a common scenario in domains such as medical data, aerospace, extinct languages, and similar fields. Notably, Ben-Zaken et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib9)) show that PEFT methods can be more efficient than full fine-tuning on small datasets, motivating a focused investigation of PEFT approaches tailored specifically for settings with limited data. These challenges (robustness to noise and domain shifts, as well as degraded performance in low-resource settings) remain largely underexplored in current PEFT methodologies.

Motivated by these challenges, we propose SDBN (Small Data Big Noise; code: [https://github.com/shaham-lab/SDBN](https://github.com/shaham-lab/SDBN)), a unified framework that integrates adversarial training principles into PEFT to enhance robustness and generalization, including robustness to tokenization-breaking *character-level* corruptions in addition to word-level noise. While adversarial training has been widely adopted in various contexts Zhu et al. ([2020](https://arxiv.org/html/2606.10610#bib.bib7)); Jiang et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib23)); Ji et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib5)), primarily for defense against attacks, its application to improving PEFT methods in low-resource, noisy settings remains largely unexplored. The proposed SDBN framework improves model performance under both noisy conditions (word- and character-level) and limited training data. Beyond continuous embedding-space perturbations, we introduce two complementary strategies for constructing discrete uncertainty sets: SDBN-h, which employs gradient-guided character-level edits to address tokenization-breaking noise, and SDBN-p, which leverages LLM-generated adversarial variants for richer semantic perturbations particularly suited to generative tasks. These discrete uncertainty-set instantiations replace norm-ball uncertainty sets in regimes the norm ball cannot cover: tokenization-breaking character edits (SDBN-h) and semantic, generation-oriented variants (SDBN-p). Unlike most PEFT approaches, which do not explicitly target scenarios with restricted data or address domain shifts, SDBN achieves substantial improvements across various benchmark datasets. Notably, this framework provides robustness not only to known perturbations but also to unanticipated domain shifts that may emerge during deployment, without requiring domain-specific adaptation data. Importantly, SDBN maintains the parameter efficiency of existing PEFT methods without adding trainable parameters or extra GPU memory overhead, offering a practical solution for deploying robust language models in resource-constrained and noisy real-world environments.

Our contributions are as follows. First, we propose SDBN, a framework that brings adversarial training to PEFT, a combination less studied in the PEFT setting, to improve robustness in low-resource settings. Second, we introduce two variants of the method utilizing discrete uncertainty sets: SDBN-h, based on character-level perturbations, and SDBN-p, based on LLM-generated perturbations. Third, we demonstrate substantial robustness gains across multiple benchmarks under word-level and character-level corruptions, without additional parameters.

## 2 Related work

#### PEFT methods\.

Recent advances in PEFT methods, such as LoRA, Adapter, BitFit, Prompt Liu et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib21)) and Prefix Li and Liang ([2021](https://arxiv.org/html/2606.10610#bib.bib22)), have facilitated the efficient adaptation of pre-trained models by fine-tuning only a limited set of parameters. In particular, LoRA employs low-rank updates, Adapter incorporates trainable bottleneck modules within each layer, and BitFit restricts modifications to bias parameters. Notably, recent evaluations demonstrate that PEFT methods like LoRA exhibit superior robustness to textual noise and adversarial perturbations compared to full fine-tuning More ([2025](https://arxiv.org/html/2606.10610#bib.bib45)). However, even with this inherent advantage, these models remain vulnerable to realistic corruptions in data-scarce regimes, which motivates the targeted robust-optimization mechanisms introduced in SDBN. While these approaches reduce parameter counts, most standard PEFT implementations were not originally designed with robustness to noisy inputs Kim et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib11)) or data-limited scenarios as primary optimization targets. Our work explores how integrating adversarial training techniques with existing PEFT methods can address these limitations, particularly in low-resource settings.

#### Robust optimization\.

Robust optimization is widely recognized as an effective strategy for enhancing model stability in the presence of noisy data and domain shifts. Shaham et al. ([2018](https://arxiv.org/html/2606.10610#bib.bib8)) and Madry et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib12)) demonstrated that robust optimization can significantly improve a neural network's generalization on noisy data and downstream tasks; however, the robust optimization approaches discussed in their works, which directly inject noise into raw input data, require specific adaptations for the NLP domain and were not evaluated within PEFT frameworks. Kim et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib11)) applied robust optimization to address noisy labels but did not consider noisy input data. Moreover, none of these works focused on optimization under limited data. In contrast, the integration of adversarial training with PEFT frameworks explored in this work addresses both noisy inputs and small datasets simultaneously.

#### Adversarial Training\.

Adversarial training is one way to achieve robust optimization. FreeLB Zhu et al. ([2020](https://arxiv.org/html/2606.10610#bib.bib7)), SMART Jiang et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib23)) and VAT Miyato et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib24)) applied adversarial training to LLMs, primarily focusing on improving generalization; however, these methods do not incorporate PEFT and instead rely on full fine-tuning, which can be impractical in certain scenarios. Recent methods such as LoFT Fu et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib6)) and AdvLoRA Ji et al. ([2024](https://arxiv.org/html/2606.10610#bib.bib5)) apply adversarial training with LoRA on vision tasks (in the case of AdvLoRA, mainly on the visual component of vision-language tasks), leaving their applicability in NLP domains unexplored. Notably, the effectiveness of adversarial training with PEFT methods in scenarios with small datasets and noisy textual inputs remains largely uninvestigated. This work examines how adversarial training can be applied to PEFT paradigms in NLP and demonstrates enhanced performance over conventional PEFT baselines, particularly in low-resource settings.

#### PEFT in NLP under low resources\.

PEFT is now the de facto strategy for adapting large language models to downstream tasks, yet its behaviour under noisy or limited data remains understudied. NEFTune Jain et al. ([2023](https://arxiv.org/html/2606.10610#bib.bib19)) shows that injecting *uniform random* noise into token embeddings during instruction tuning markedly improves *average-case* performance, with a primary focus on model generalization, but it does not analyse worst-case perturbations or data-scarce regimes. In contrast, this work applies *adversarial* training to PEFT with a different objective: adding adversarial perturbations in the embedding space and optimising adapters to minimise the resulting worst-case loss. This targeted approach yields robustness to input noise and domain shift specifically in scenarios with small, noisy datasets, precisely the conditions where PEFT methods are most practically attractive but often struggle without additional robustness mechanisms.

#### Robustness through data noising\.

Several works have shown that even small character edits (e.g., missing letters) can cause degradation in NLP models, since such changes often break tokenization and push inputs far in embedding space. EDA Wei and Zou ([2019](https://arxiv.org/html/2606.10610#bib.bib36)) proposed simple data augmentation editing actions, showing improvements on small datasets; however, these perturbations are applied randomly rather than guided by gradients. HotFlip Ebrahimi et al. ([2018](https://arxiv.org/html/2606.10610#bib.bib34)) introduced gradient-guided character-level edits, but as an attack tool. WildNLP Rychalska et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib32)) and Aepli and Sennrich ([2022](https://arxiv.org/html/2606.10610#bib.bib35)) explored noise-augmented training to improve robustness to character corruptions, but without leveraging adversarial signals to identify worst-case perturbations. In contrast, our work adapts character-level perturbations within an adversarial training framework, using gradient guidance to maximize their effectiveness for improving robustness in low-resource PEFT settings.

#### Discrete uncertainty sets\.

Several prior works use discrete perturbation sets in NLP, but in settings that differ substantially from ours. Some focus on certified robustness via verification-style pipelines such as randomized masking or interval bound propagation, e.g., (Zeng et al., [2023](https://arxiv.org/html/2606.10610#bib.bib40)), (Huang et al., [2019](https://arxiv.org/html/2606.10610#bib.bib41)), and (Jia et al., [2019](https://arxiv.org/html/2606.10610#bib.bib42)). These methods target formal guarantees against bounded substitution attacks rather than training-time robust optimization for PEFT, and are not naturally scalable to large generative models or practical PEFT training loops. Other work, such as (Zhou et al., [2021](https://arxiv.org/html/2606.10610#bib.bib43)), proposes an inference-time defense against synonym substitution attacks rather than adversarial training. Closest in spirit is (Ivgi and Berant, [2021](https://arxiv.org/html/2606.10610#bib.bib44)), which studies discrete adversarial training for classification; however, it assumes full fine-tuning, focuses on attack-style substitutions rather than practical small-data noise, and does not consider PEFT or generative tasks. In contrast, our work studies PEFT as the adaptation regime, uses discrete uncertainty sets within a unified training-time robust optimization framework, and evaluates them across multiple PEFT methods in low-resource noisy settings, including generative tasks.

## 3 Background on robust optimization

Robust optimization (Ben-Tal et al., [2009](https://arxiv.org/html/2606.10610#bib.bib14)) is a field in optimization theory that aims to improve model stability under input uncertainty. Consider a model with parameters $\theta$ and a dataset $\mathcal{D}$ of input-label pairs $(x,y)$, where $x\in\mathbb{R}^{d}$ and $y\in\{1,\ldots,K\}$. For each input $x$ (in our context, a sentence), we define an *uncertainty set* $\mathcal{S}_{x}\subseteq\mathbb{R}^{d}$ that reflects the level of uncertainty in the input: the set of sentences that arise from small, semantics-preserving edits to $x$ (e.g., token deletions, swaps, or character flips). These variants capture the inputs on which the model's classification is uncertain. Given a loss function $\mathcal{L}$, the robust optimization objective is:

$$\min_{\theta}\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\Bigl[\max_{x+\delta\in\mathcal{S}_{x}}\mathcal{L}(x+\delta;\theta,y)\Bigr].$$

When holding $\theta$ and $y$ fixed and viewing $\mathcal{L}(x;\theta,y)$ as a function of $x$, we occasionally write $\mathcal{L}_{\theta,y}(x)$. The inner maximization problem aims to find a worst-case example of a given $x$ in the uncertainty set, i.e., one that achieves the highest loss. Using a first-order Taylor approximation around the input $x$, we can express the loss on the perturbed example as:

$$\mathcal{L}_{\theta,y}(x+\delta)\approx\mathcal{L}_{\theta,y}(x)+\langle\nabla\mathcal{L}_{\theta,y}(x),\delta\rangle. \tag{1}$$

The optimal perturbation $\delta^{*}$ for each training example is then:

$$\delta^{*}=\arg\max_{\delta:\,x+\delta\in\mathcal{S}_{x}}\langle\nabla\mathcal{L}_{\theta,y}(x),\delta\rangle. \tag{2}$$

One way to define $\mathcal{S}_{x}$ is by a norm ball centered at $x$ with a small radius $\epsilon$:

$$\mathcal{S}_{x}=\{x+\delta\in\mathbb{R}^{d}:\|\delta\|_{p}\leq\epsilon\}. \tag{3}$$

[Equation 1](https://arxiv.org/html/2606.10610#S3.E1) explains why the perturbation $\tilde{\delta}$ increases the loss compared to the original input. (Details on how the choice of $p$ shapes $\delta^{*}$, e.g., $\ell_{\infty}$, $\ell_{2}$, $\ell_{1}$, and its connections to standard procedures like FGSM are deferred to Appendix [B](https://arxiv.org/html/2606.10610#A2).) For a small perturbation (small $\epsilon$), $\tilde{\delta}$ is positively correlated with the direction of the gradient $g=\nabla\mathcal{L}_{\theta,y}(x)$ (i.e., their angle is less than $90^{\circ}$), making the inner product $\langle g,\tilde{\delta}\rangle$ positive and thus increasing the loss. This strategy yields more challenging training examples and can improve robustness to domain shifts and noisy inputs.
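As a brief worked illustration, the maximizer of Equation 2 over the norm ball of Equation 3 has a standard closed form for two common choices of $p$, writing $g=\nabla\mathcal{L}_{\theta,y}(x)$. This follows from the dual-norm identity and is a textbook fact rather than a result specific to this paper; the paper's own treatment is the one deferred to Appendix B.

```latex
\begin{aligned}
p=\infty:\quad & \delta^{*}=\epsilon\,\operatorname{sign}(g), & \langle g,\delta^{*}\rangle &= \epsilon\,\|g\|_{1},\\
p=2:\quad & \delta^{*}=\epsilon\,\frac{g}{\|g\|_{2}}, & \langle g,\delta^{*}\rangle &= \epsilon\,\|g\|_{2}.
\end{aligned}
```

In both cases the inner product is non-negative, so by Equation 1 the perturbed loss is, to first order, at least the clean loss. The $p=\infty$ case is the FGSM-style sign perturbation.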

Adversarial training Goodfellow et al. ([2015](https://arxiv.org/html/2606.10610#bib.bib13)) is a prominent defense strategy for enhancing model robustness against attacks. It uses adversarial examples, which are perturbed versions of the original points. In the context of robust optimization, this approach trains on approximate worst-case examples drawn from the uncertainty set $\mathcal{S}_{x}$ around each input $x$, as detailed in Section [3](https://arxiv.org/html/2606.10610#S3).

## 4 Methodology

In this section, we describe how the proposed SDBN framework integrates adversarial training techniques with PEFT methods to address the challenges of noise robustness and domain shifts under limited data resources. While these techniques have been previously explored for full model fine-tuning, our focus is on demonstrating their particular value when applied to PEFT methods in low-resource, noisy settings. Conceptually, SDBN is a single robust-optimization framework instantiated with three uncertainty sets: (i) the standard $\ell_{\infty}$ ball, (ii) discrete tokenization-breaking character edits (SDBN-h), and (iii) discrete LLM-generated adversarial variants (SDBN-p). We detail each variant in the following subsections.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/TREC_Noise_Variants_un.png)
(a) Character/lexical edits (TREC)

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/tsne_domain_shift_un.png)
(b) Domain shift (Yelp → Amazon)

Figure 2: Perturbation regimes addressed by adversarial training. (a) PCA visualization of the embedding space for TREC sentences, showing how character/word perturbations (triangles/squares) fall within uncertainty regions (ellipses) around clean sentences (circles). This illustrates how our adversarial training approach can improve the model's performance on diverse linguistic variations that occur within these uncertainty regions. Importantly, even when specific noisy examples are not encountered during training, the model develops robustness to these perturbations because they reside in the adversarially expanded uncertainty regions. (b) t-SNE plot of embeddings from Amazon and Yelp sentences, demonstrating a domain shift scenario where the uncertainty regions (ellipses) created around samples from the source domain (Yelp) encompass examples from the unseen target domain (Amazon). This visualization supports a key finding about adversarial training: by optimizing for worst-case examples within the uncertainty set during training, adversarial training generalizes effectively to novel domains without explicitly training on them.

### 4.1 Motivation

We address two practical robustness challenges encountered in real-world NLP deployments, particularly in the prevalent low-resource scenarios where limited training data makes these issues even more severe. The first concerns fine-grained textual perturbations that occur naturally in user-generated content but are often absent from clean training data. These include various character- and word-level modifications that preserve semantic meaning while changing the visible text structure. The second involves domain shifts that occur when deployment environments differ subtly from training conditions in style or topic distribution. Unlike traditional domain adaptation scenarios where target domain data is available, we aim to build resilience against unanticipated shifts without prior exposure to the target domain. Data scarcity compounds these challenges, as models trained on small datasets tend to overfit and lack the broad exposure needed to generalize well to variations. Our objective is to enhance PEFT methods to withstand these everyday linguistic variations even when trained on limited data, maintaining high performance on clean inputs while creating more reliable models for practical applications.

### 4.2 Rationale

Motivated by Shaham et al. ([2018](https://arxiv.org/html/2606.10610#bib.bib8)) and Madry et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib12)), who demonstrate that adversarial training functions as a robust optimization procedure, the integration of PEFT with robust optimization through adversarial training addresses challenges specific to low-resource settings. As illustrated in Figure [2](https://arxiv.org/html/2606.10610#S4.F2), we can conceptualize both character/lexical variations and domain shifts as points within uncertainty regions surrounding clean examples. The PCA visualization in Figure [2](https://arxiv.org/html/2606.10610#S4.F2)(a) demonstrates how character- and word-level perturbations fall within uncertainty regions surrounding their clean sentence counterparts. Similarly, Figure [2](https://arxiv.org/html/2606.10610#S4.F2)(b) shows how uncertainty regions created around samples from the source domain (Yelp) extend to cover examples from the unseen target domain (Amazon).

Adversarial training optimizes robustness by training on*worst\-case*perturbations within the uncertainty region\. This forces stability and yields generalization to both noisy inputs and unseen domains without explicit exposure, allowing the model to handle linguistic variations beyond the training set\.

Limited data scenarios. PEFT can outperform full fine-tuning on small datasets Ben-Zaken et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib9)); Pu et al. ([2023](https://arxiv.org/html/2606.10610#bib.bib26)), but limited corpora lack diversity in corruptions, dialects, and domain-specific terminology, making models *sensitive* to noise and domain shift. Adversarial training mitigates this by generating gradient-aligned *worst-case* neighbors around each example and optimizing performance *within the existing uncertainty set*. Training on these hard variants builds robustness to unseen linguistic variations and domain differences without additional labeled data. This intuition is also consistent with our empirical findings in Section [5](https://arxiv.org/html/2606.10610#S5.SS0.SSS0.Px2), where adversarial training yields substantially larger gains for PEFT methods than for full fine-tuning in the same low-resource setting under noisy conditions.

Theoretical advantages of gradient-based perturbations over random noise. The effectiveness of adversarial training compared to random noise injection can be formally understood through the lens of high-dimensional geometry and its effect on the optimization landscape. While both approaches introduce perturbations during training, their impact on model robustness differs fundamentally.

As established in [Equation 1](https://arxiv.org/html/2606.10610#S3.E1), the change in loss when applying a perturbation $\delta$ to the input is approximated by $\langle\nabla\mathcal{L}_{\theta,y}(x),\delta\rangle$. In high-dimensional embedding spaces, random noise vectors (as used in methods like NEFTune and DAE Vincent et al. ([2008](https://arxiv.org/html/2606.10610#bib.bib25))) exhibit a critical limitation: they are approximately orthogonal to any fixed direction, including the gradient. This is a well-established property of high-dimensional spaces, as explained by Miyato et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib24)), where random vectors become nearly perpendicular with high probability as dimensionality increases. Consequently, for random perturbations $\delta_{\text{random}}$ with constraint $\|\delta_{\text{random}}\|_{p}\leq\epsilon$, the expected inner product satisfies $\mathbb{E}[\langle\nabla\mathcal{L}_{\theta,y}(x),\delta_{\text{random}}\rangle]\approx 0$. This results in minimal consistent effect on the loss, creating only weak, undirected regularization that fails to target the model's specific vulnerabilities.

In contrast, adversarial perturbations constrained to the norm ball of [Equation 3](https://arxiv.org/html/2606.10610#S3.E3) explicitly maximize the inner product with the gradient. The optimal perturbation $\delta^{*}$ from [Equation 2](https://arxiv.org/html/2606.10610#S3.E2) ensures the largest possible first-order increase in loss within the constraint. This targeted approach creates worst-case examples that probe precisely the directions where the model is most sensitive, compelling optimization to flatten the loss landscape exactly where it is steepest. While random noise merely induces general smoothing across all directions, adversarial training systematically expands decision margins in the regions that most require robustness, producing models that generalize effectively to both natural perturbations and domain shifts, as visualized in Figure [2](https://arxiv.org/html/2606.10610#S4.F2).
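The contrast between random and gradient-aligned perturbations can be checked numerically with a small sketch. Everything below is illustrative rather than taken from the paper: the gradient is a Gaussian stand-in instead of a backpropagated one, the random noise is sign noise on the surface of the $\ell_{\infty}$ ball, and the function names (`inner`, `random_sign_noise`, `gradient_aligned`, `compare`) are hypothetical.

```python
import math
import random


def inner(g, delta):
    """First-order loss change <g, delta> from Equation 1."""
    return sum(gi * di for gi, di in zip(g, delta))


def random_sign_noise(dim, eps, rng):
    """Random perturbation with ||delta||_inf = eps; its direction ignores g."""
    return [eps * rng.choice((-1.0, 1.0)) for _ in range(dim)]


def gradient_aligned(g, eps):
    """l_inf worst case under the linear approximation: eps * sign(g)."""
    return [eps * math.copysign(1.0, gi) for gi in g]


def compare(dim=4096, eps=1e-4, trials=200, seed=0):
    rng = random.Random(seed)
    # Stand-in gradient (assumption); a real one would come from backpropagation.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    adv_gain = inner(g, gradient_aligned(g, eps))
    rand_gains = [inner(g, random_sign_noise(dim, eps, rng)) for _ in range(trials)]
    mean_rand = sum(rand_gains) / trials
    mean_abs_rand = sum(abs(v) for v in rand_gains) / trials
    return adv_gain, mean_rand, mean_abs_rand
```

Calling `compare()` returns the aligned first-order gain, the mean random gain, and the mean absolute random gain. For this Gaussian stand-in, the aligned gain equals $\epsilon\|g\|_{1}$ while a random draw is on the order of $\epsilon\|g\|_{2}$, so the gap widens roughly like $\sqrt{d}$ as the dimension grows.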

### 4.3 SDBN: PEFT with Adversarial Training

When attempting to achieve robustness to noise using adversarial training, one cannot create adversarial examples by adding a numeric perturbation to text symbols as in vision tasks: examining [Equation 3](https://arxiv.org/html/2606.10610#S3.E3), the expression $x+\delta$ is ill-defined when $x$ represents a sequence of discrete symbols and $\delta$ denotes a continuous numeric perturbation. Following established approaches in NLP adversarial training Zhu et al. ([2020](https://arxiv.org/html/2606.10610#bib.bib7)), Miyato et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib24)), Jiang et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib23)), we inject noise at the embedding layer rather than perturbing the raw input data. Let $E(\cdot)$ be the embedding extractor, let $f_{\theta}(\cdot)$ be the subsequent layers of the model integrated with any PEFT method (e.g., LoRA), and let $\theta$ denote the model's parameters. For an input batch $(X,Y)$, we compute the embeddings $\mathbf{e}=E(X)$ and the clean loss $\mathcal{L}_{\text{clean}}=\mathcal{L}\bigl(f_{\theta}(\mathbf{e}),Y\bigr)$. We then compute its gradient $\mathbf{g}=\nabla_{\mathbf{e}}\mathcal{L}_{\text{clean}}$ and form the perturbation $\delta=\epsilon\cdot\operatorname{sign}(\mathbf{g})$, yielding adversarial embeddings $\mathbf{e}_{\text{adv}}=\mathbf{e}+\delta$ and the corresponding loss $\mathcal{L}_{\text{adv}}=\mathcal{L}\bigl(f_{\theta}(\mathbf{e}_{\text{adv}}),Y\bigr)$. Finally, we update the trainable parameters in $\theta$, as determined by the specific PEFT method in use, by minimizing $\mathcal{L}_{\text{adv}}$ (for implementation details, see [Section D.7](https://arxiv.org/html/2606.10610#A4.SS7)).
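The update described above can be sketched on a deliberately tiny model. This is a toy under stated assumptions, not the authors' implementation: the "backbone" is a frozen linear map `w_frozen`, the only trainable parameter is a bias `b` (a BitFit-like stand-in chosen for brevity), the loss is binary cross-entropy, and gradients are written out by hand instead of using an autograd library. The names `sdbn_step`, `w_frozen`, and `lr` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y):
    """Binary cross-entropy for a single example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(v):
    return (v > 0) - (v < 0)


def sdbn_step(e, y, w_frozen, b, eps=1e-4, lr=0.1):
    """One adversarial update on a toy model: logit = <w_frozen, e> + b."""
    # Clean forward pass and gradient of the clean loss w.r.t. the embedding e.
    p_clean = sigmoid(sum(wi * ei for wi, ei in zip(w_frozen, e)) + b)
    loss_clean = bce(p_clean, y)
    g = [(p_clean - y) * wi for wi in w_frozen]

    # FGSM-style l_inf perturbation in embedding space: delta = eps * sign(g).
    e_adv = [ei + eps * sign(gi) for ei, gi in zip(e, g)]

    # Adversarial loss, then a gradient step on the trainable parameter only.
    p_adv = sigmoid(sum(wi * ei for wi, ei in zip(w_frozen, e_adv)) + b)
    loss_adv = bce(p_adv, y)
    grad_b = p_adv - y  # d(loss_adv)/db, treating delta as a constant
    b_new = b - lr * grad_b
    return loss_clean, loss_adv, b_new
```

For example, `sdbn_step([0.5, -1.0, 2.0], 1, [0.3, -0.2, 0.1], 0.0, eps=0.1)` returns an adversarial loss above the clean loss and moves only `b`. In a real setting the frozen map would be the pre-trained backbone and the trainable set would be the LoRA, Adapter, or BitFit parameters.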

Within the SDBN framework, this perturbation can be selected from any norm ball. Empirically, choosing from the $\ell_{\infty}$ norm ball yields the best performance and is used for the experiments in [Section 5](https://arxiv.org/html/2606.10610#S5). For a detailed comparison of perturbations drawn from the $\ell_{1}$, $\ell_{2}$, and $\ell_{\infty}$ norm balls, see [Section C.6](https://arxiv.org/html/2606.10610#A3.SS6). For further empirical motivation and visual intuition showing why $\ell_{\infty}$ with $\epsilon=10^{-4}$ best captures realistic noise patterns in embedding space, see the detailed analysis in [Section C.7](https://arxiv.org/html/2606.10610#A3.SS7). For the description of epsilon selection and the pseudo-code, see [Algorithm 1](https://arxiv.org/html/2606.10610#alg1) and [Section C.1](https://arxiv.org/html/2606.10610#A3.SS1).

### 4.4 Character-level noise as a challenge

As [Figure 2(a)](https://arxiv.org/html/2606.10610#S4.F2.sf1) indicates, most *word*-level edits (squares) stay within the uncertainty region around clean sentences (circles), whereas many *character*-level edits (triangles) fall outside it. Edits that *break a token*, e.g., deleting a letter, alter tokenization (splits/unk) and push embeddings far from regions seen in training. By contrast, case changes typically preserve both tokenization (the word remains a single token) and semantics, so the resulting embedding stays close to the original. Appendix [D.1](https://arxiv.org/html/2606.10610#A4.SS1) provides concrete examples of this distance gap. Hence, robustness to character-level noise is more challenging and may require complementary tools alongside embedding-space adversarial training.

Hybrid Strategy: SDBN-h. To address character-level perturbations that break tokenization and produce embeddings far outside the $\ell_p$-ball used in SDBN, we define a discrete uncertainty set $\mathcal{S}_x$ as all single-character variants of a sentence $x$. Since $\mathcal{S}_x$ is a finite, discrete set, projected gradient descent cannot be applied directly. Instead, given the gradient $g=\nabla_{e}\mathcal{L}(f_\theta(e),y)$ at the clean embedding $e=E(x)$, we select $z^{\star}=\arg\max_{z\in\mathcal{S}_x}\langle g,E(z)-E(x)\rangle$ and use it as the adversarial example. That is, we use the gradient to select the perturbation but not to create it. Each mini-batch is split: one subset is perturbed via standard $\ell_\infty$ FGSM in embedding space, and the other via $z^{\star}$, reusing the same gradient. This yields robustness to both continuous embedding perturbations and discrete character distortions with negligible overhead. For more details and full pseudo-code, see [section C.2](https://arxiv.org/html/2606.10610#A3.SS2). SDBN-h thus extends embedding-space adversarial training to tokenization-breaking character corruptions by performing a discrete worst-case selection using the same gradient signal.
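The gradient-guided selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of section C.2: the edit operations (delete, double, adjacent swap) are taken from the noise types named in Table 1, `toy_embed` is a hypothetical fixed-size stand-in for $E(\cdot)$, and the gradient `g` is a made-up vector rather than one computed from a model.

```python
def single_char_variants(x):
    """Enumerate S_x: delete, double, or adjacent-swap one character of x."""
    variants = set()
    for i in range(len(x)):
        variants.add(x[:i] + x[i + 1:])                      # delete char i
        variants.add(x[:i] + x[i] * 2 + x[i + 1:])           # double char i
        if i + 1 < len(x):
            variants.add(x[:i] + x[i + 1] + x[i] + x[i + 2:])  # swap i and i+1
    variants.discard(x)                                      # drop no-op edits
    return sorted(variants)

def toy_embed(text, dim=8):
    """Hypothetical stand-in for E(.): bucketed character counts."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec

def select_worst_case(x, g, embed=toy_embed):
    """z* = argmax over z in S_x of <g, E(z) - E(x)> (first-order loss increase)."""
    e_x = embed(x)

    def score(z):
        return sum(gi * (zi - xi) for gi, zi, xi in zip(g, embed(z), e_x))

    return max(single_char_variants(x), key=score)

# Made-up gradient of the clean loss with respect to the toy embedding.
g = [0.5, -1.0, 0.2, 0.0, 0.3, -0.4, 0.1, 0.9]
z_star = select_worst_case("card", g)
print(z_star)
```

Only embedding lookups and inner products are needed per candidate, which is consistent with the claim that the selection reuses the clean-input gradient at little extra cost.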

### 4.5 Prompt-Based Uncertainty Sets: SDBN-p

We have empirically found that the continuous embedding-space uncertainty set in SDBN is less effective for generative tasks. To tackle this, we construct an alternative discrete uncertainty set by leveraging an LLM to generate semantics-preserving adversarial variants. Given a training example $x$, we prompt an LLM to generate $k$ semantically equivalent variants that include realistic perturbations such as paraphrases, typos, and style variations. Formally, we define $S_x^{\text{prompt}}=\{z_1,\ldots,z_k\}$, where each $z_i$ is generated by prompting an LLM with $x$ and instructions to produce adversarial variants.

Unlike SDBN and SDBN-h, where perturbations are small enough to be guided by clean-input gradients via Taylor approximation, the variants in $S_x^{\text{prompt}}$ may involve significant structural changes that fall outside the local linear region. Consequently, for SDBN-p, we do not use the gradient-guided selection rule. Instead, we explicitly compute the loss for each of the $k$ pre-computed variants and select the one that maximizes the training objective:

$$z^{\star}=\arg\max_{z\in S_x^{\text{prompt}}}\mathcal{L}\bigl(f(E(z);\theta),y\bigr)\tag{4}$$

While this requires $k$ forward passes, it ensures that we identify the true worst-case variant within $S_x^{\text{prompt}}$ for robust optimization in generative tasks. This approach naturally captures linguistic variations that may be difficult to enumerate algorithmically, such as paraphrases and contextual rewrites, making it particularly suitable for generative tasks where output diversity is important. In short, SDBN-p uses an LLM-generated discrete uncertainty set instead of a norm-ball uncertainty set. For more details, examples, and pseudo-code, see [section C.3](https://arxiv.org/html/2606.10610#A3.SS3).
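The exhaustive selection in eq. 4 amounts to evaluating the loss on each pre-computed variant and keeping the maximizer. The sketch below assumes a hypothetical `toy_loss` in place of the model loss $\mathcal{L}(f(E(z);\theta),y)$ (it simply measures the fraction of out-of-vocabulary words and ignores `y`), and the variants are hand-written examples rather than LLM outputs.

```python
def select_worst_variant(variants, y, loss_fn):
    """z* = argmax over z in S_x^prompt of loss_fn(z, y), using k loss evaluations."""
    losses = [loss_fn(z, y) for z in variants]          # one forward pass per variant
    worst = max(range(len(variants)), key=losses.__getitem__)
    return variants[worst], losses[worst]

# Hypothetical stand-in for the model loss: fraction of words the "model" has not seen.
KNOWN_WORDS = {"what", "is", "the", "card", "fee"}

def toy_loss(z, y):
    words = z.split()
    return sum(word not in KNOWN_WORDS for word in words) / len(words)

# Hand-written stand-ins for k = 3 LLM-generated variants of one training question.
variants = [
    "what is the card fee",
    "whats teh card fee",
    "how much does the card cost",
]
z_star, worst_loss = select_worst_variant(variants, y="5 dollars", loss_fn=toy_loss)
print(z_star, round(worst_loss, 3))
```

Because the variants are pre-computed, the extra cost at training time is the $k$ loss evaluations per example; no gradient with respect to the input is required for the selection.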

## 5 Results

Experimental Setup. We evaluate the proposed SDBN framework using two pre-trained models, BERT-base Devlin et al. ([2019](https://arxiv.org/html/2606.10610#bib.bib28)) and DeBERTa-v3 He et al. ([2023](https://arxiv.org/html/2606.10610#bib.bib29)), across multiple classification benchmarks: sentence datasets including 20Newsgroups (Lang, [1995](https://arxiv.org/html/2606.10610#bib.bib17)), Banking77 (Casanueva et al., [2020](https://arxiv.org/html/2606.10610#bib.bib15)), TREC (Singhal et al., [1999](https://arxiv.org/html/2606.10610#bib.bib16)), and IMDB, as well as the word-pair semantic relation classification dataset BLESS (Baroni and Lenci, [2011](https://arxiv.org/html/2606.10610#bib.bib31)). We additionally evaluate on generative tasks using the SQuAD (Rajpurkar et al., [2016](https://arxiv.org/html/2606.10610#bib.bib30)) and TweetQA (Xiong et al., [2019](https://arxiv.org/html/2606.10610#bib.bib46)) datasets, with LLaMA-3.2-1B (Llama Team, [2024](https://arxiv.org/html/2606.10610#bib.bib38)), LLaMA-2-7B (Touvron et al., [2023](https://arxiv.org/html/2606.10610#bib.bib48)), and Qwen-2.5-7B (Qwen Team, [2025](https://arxiv.org/html/2606.10610#bib.bib47)), to assess robustness in generation-oriented tasks beyond classification. For SDBN-p, we use GPT-5.2 (OpenAI, [2025](https://arxiv.org/html/2606.10610#bib.bib37)) to generate adversarial variants. For PEFT methods, we use Adapter, BitFit, LoRA, and QLoRA. We use SDBN ($\ell_\infty$ uncertainty set) as the default, SDBN-h for tokenization-breaking character noise, and SDBN-p for generative tasks via precomputed LLM variant sets.

Our training protocol consists of 3 warm-up epochs with standard training followed by 10–20 epochs of either SDBN or baseline training without perturbations. We also compare to NEFTune (which was originally presented with QLoRA) and EDA (see implementation details in [section D.7](https://arxiv.org/html/2606.10610#A4.SS7.SSS0.Px1)), both integrated with these PEFT approaches.

To simulate real-world scenarios, we evaluate under two challenging conditions. Data scarcity: using 5% to 100% of the original training data. Input noise: applying word-level and character-level noise (for details of the noise types, see [fig. 6](https://arxiv.org/html/2606.10610#A3.F6)) at test time.
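As a rough illustration of test-time corruption, the sketch below implements a few operations whose names appear elsewhere in this section (delete-char, swap-char, double-char, and word deletion) and applies a chosen number of them per sentence. The exact noise definitions are those of fig. 6, which is not reproduced here, so the sampling choices, function names, and the `n_ops` and `seed` parameters are assumptions.

```python
import random

def delete_char(word, rng):
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def swap_char(word, rng):
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def double_char(word, rng):
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i] + word[i:]

def delete_word(words, rng):
    if len(words) < 2:
        return words
    i = rng.randrange(len(words))
    return words[:i] + words[i + 1:]

def corrupt(sentence, op, n_ops=1, seed=0):
    """Apply n_ops corruption operations of one type to a sentence."""
    rng = random.Random(seed)            # seeded for reproducible evaluation
    words = sentence.split()
    for _ in range(n_ops):
        if not words:
            break
        if op is delete_word:            # word-level operation
            words = delete_word(words, rng)
        else:                            # character-level operation on one word
            i = rng.randrange(len(words))
            words[i] = op(words[i], rng)
    return " ".join(words)

clean = "transfer money to my savings account today"
for op in (delete_char, swap_char, double_char, delete_word):
    print(op.__name__, "->", corrupt(clean, op, n_ops=2))
```

Setting `n_ops=1` corresponds to one operation per sentence, while larger values mimic increasing corruption intensity.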

The remainder of our experimental evaluation follows a systematic progression. We first assess performance on clean test data across varying training set sizes to establish baseline effectiveness, and then robustness under word-level and character-level corruptions to demonstrate the advantages of the SDBN framework in noisy environments. We then examine cross-domain generalization using the ArSarcasm-v2 Abu Farha et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib27)) dataset and a cross-genre NLI setting to test performance under domain shifts. Due to space limits, all domain-shift results are reported in the Appendix (see [fig. 14](https://arxiv.org/html/2606.10610#A4.F14)). Across these domain-shift evaluations, our adversarially trained PEFT approach (SDBN) maintains consistent gains over baselines. Finally, we conduct targeted analyses of SDBN's components, comparing different perturbation strategies, examining the impact of perturbation location within the model architecture, and reporting resource costs such as runtime and memory footprint (see [section D.5](https://arxiv.org/html/2606.10610#A4.SS5)).

Results on clean data. We evaluate models trained with standard PEFT methods alongside their SDBN variants on clean test sets. Our experiments show that, with limited training data, SDBN consistently improves accuracy over vanilla PEFT methods across all datasets. This supports our theoretical analysis from Section [4.2](https://arxiv.org/html/2606.10610#S4.SS2) that SDBN enhances generalization on clean data, not just under noisy conditions. Figure [3](https://arxiv.org/html/2606.10610#S5.F3) illustrates the relative improvements on Banking77 and TREC, revealing a critical insight: the benefits of SDBN become increasingly significant as training data size decreases. This supports our hypothesis that robust optimization techniques are particularly valuable in low-resource scenarios, where models typically struggle with generalization. Full results across all data scales and datasets are in [table 7](https://arxiv.org/html/2606.10610#A4.T7).

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/bank_trec_scales.png)

Figure 3: Adversarial Training Effect on Limited Data Scenarios. Relative performance improvement of adversarial training over baseline PEFT methods (e.g., 10% means SDBN-LoRA/LoRA = 1.10) across training sizes on Banking77 and TREC. Adversarial training not only maintains but often improves clean-data performance, with gains increasing as the training set shrinks.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/lora_noise_VNS.png)

Figure 4: Adversarial training for variable-intensity noise. Performance comparison of LoRA PEFT implementations under variable-intensity noise with DeBERTa-v3 on 1,000 Banking77 samples. SDBN shows superior robustness in low-resource settings.

Results on Noisy Data. We evaluate model robustness to various linguistic perturbations using a 1,000-sample subset of Banking77 with DeBERTa-v3, focusing on low-resource scenarios. As illustrated in Figure [4](https://arxiv.org/html/2606.10610#S5.F4) for variable-intensity noise, where we vary the perturbation amplitude from 1 to 5 operations per sentence (e.g., deleting one word vs. five words), SDBN maintains a consistent advantage over both vanilla PEFT and NEFTune. The performance gap remains stable as corruption severity increases, showing SDBN's superior ability to handle progressively more distorted inputs. Results for constant-intensity noise, where each sentence is corrupted by exactly one operation, show similar improvements and are provided in Appendix [D.2](https://arxiv.org/html/2606.10610#A4.SS2).

These results are consistent with our theoretical framework from [section 4.2](https://arxiv.org/html/2606.10610#S4.SS2), where we represented linguistic variations as points within uncertainty regions surrounding clean examples. By optimizing for worst-case examples within these regions through gradient-based perturbations, SDBN effectively improves the model's robustness to real-world text variations, as visualized in [fig. 2(a)](https://arxiv.org/html/2606.10610#S4.F2.sf1). In contrast, NEFTune's random noise approach, while helpful, lacks the targeted nature of adversarial training. SDBN's systematic exploration of uncertainty regions enables more effective generalization to diverse perturbations without explicit exposure during training, making it particularly valuable for low-resource scenarios where the limited training data lacks natural linguistic diversity.

Table 1: BLESS: accuracy (%) under character-level noise. DeBERTa-v3 + LoRA trained on 1,000 clean samples. SDBN-h yields the best robustness on tokenization-breaking types. (a) Clean, Delete-char, Swap-char (b) Double-char

Character Noise (SDBN-h). While SDBN improves robustness to a wide range of perturbations, its embedding-space $\ell_p$-ball constraint cannot capture *tokenization-breaking* character edits, where a single change sends the resulting embedding far outside the continuous uncertainty region. To address this, SDBN-h augments standard FGSM perturbations with gradient-guided adversarial examples drawn from a discrete uncertainty set of single-character variants.

On BLESS with DeBERTa-v3 + LoRA trained on 1,000 clean samples, SDBN-h yields the best robustness on tokenization-breaking types (see [table 1](https://arxiv.org/html/2606.10610#S5.T1)), improving by about 4–7% while matching clean accuracy. We include a qualitative robustness example illustrating tokenization-breaking character noise and SDBN-h behavior in Appendix [D.6](https://arxiv.org/html/2606.10610#A4.SS6).

Table 2: Generative task results under clean and noisy evaluation. Top: SQuAD exact match (EM) using LLaMA-3.2-1B with LoRA, trained on 200 samples. Bottom: TweetQA F1 using LLaMA-2-7B with LoRA, trained on 200 samples. SDBN-p uses generated adversarial variants during training. Best results in each column are shown in bold; second-best results are underlined.

#### Generative Tasks.

Our initial continuous-perturbation variant was less effective on generative tasks (Appendix [D.4](https://arxiv.org/html/2606.10610#A4.SS4)), motivating SDBN-p. We therefore evaluate on two generative benchmarks: SQuAD with LLaMA-3.2-1B and TweetQA with LLaMA-2-7B. Table [2](https://arxiv.org/html/2606.10610#S5.T2) shows that SDBN-p improves performance over all baselines on both tasks, under both clean and noisy evaluation. Additional generative results and details are provided in Appendix [D.4](https://arxiv.org/html/2606.10610#A4.SS4).

Table 3: SDBN is substantially more effective with PEFT than with full fine-tuning. The table reports the absolute accuracy gain (percentage points) achieved by adding SDBN to each training method on a low-resource subset of Banking77 (using DeBERTa-v3). For example, while SDBN improves LoRA by 23.6 percentage points on clean data, it improves full fine-tuning by only 1.3. This suggests that the adversarial signal is significantly more impactful within the constrained parameter subspaces of PEFT methods.

#### PEFT vs. full fine-tuning under adversarial training.

We find that adversarial training is more effective when combined with PEFT than with full fine-tuning in low-resource settings. Table [3](https://arxiv.org/html/2606.10610#S5.T3) compares the gain of SDBN over the corresponding vanilla regime on a small subset of Banking77 with DeBERTa-v3, evaluated on clean data and under three word-level corruptions. The gains are consistently substantial for PEFT methods, but negligible for full fine-tuning. This asymmetry suggests that PEFT's constrained parameter space may make the adversarial signal more focused and effective when data is scarce and noisy, whereas full fine-tuning, with its much larger number of trainable parameters, is more vulnerable to overfitting the adversarial examples.

## 6 Conclusion

We present SDBN \(Small Data Big Noise\), integrating adversarial training with PEFT to improve robustness in low\-resource settings\. Experiments show significant gains in robustness and accuracy across datasets and PEFT variants\. We further extend the framework with two complementary strategies for discrete uncertainty set construction: SDBN\-h, which incorporates gradient\-guided character\-level perturbations to address token\-breaking typos, and SDBN\-p, which leverages LLM\-generated adversarial variants for generative tasks\. The framework preserves PEFT’s parameter efficiency while improving robustness to word\-level and character\-level noise and to domain shift, without requiring additional training data\.

## 7 Limitations

While our method substantially improves robustness to both word-level and character-level perturbations, several limitations remain. Generating adversarial embeddings requires an additional forward-backward pass per mini-batch. Although this overhead does not increase GPU memory usage, it does raise per-batch runtime and may become a bottleneck when scaling to very large models or training corpora. Moreover, performance remains sensitive to the choice of the perturbation radius $\epsilon$, and the best value varies across datasets. While our analysis provides a principled default, more adaptive or automated tuning strategies could further enhance robustness and ease adoption.

## References

- Abu Farha et al. (2021) Overview of the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p4.1).
- N. Aepli and R. Sennrich (2022) Improving zero-shot cross-lingual transfer between closely related languages by injecting character-level noise. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 4074–4083. External Links: [Link](https://aclanthology.org/2022.findings-acl.321/). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px5.p1.1).
- M. Baroni and A. Lenci (2011) How we BLESSed distributional semantic evaluation. In Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, Edinburgh, Scotland, UK, pp. 1–10. External Links: [Link](http://clic.cimec.unitn.it/distsem). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- A. Ben-Tal, A. Nemirovski, and L. El Ghaoui (2009) Robust optimization. Vol. 2, Princeton University Press, Princeton, NJ. Cited by: [§3](https://arxiv.org/html/2606.10610#S3.p1.9).
- E. Ben-Zaken, S. Ravfogel, and Y. Goldberg (2021) BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p1.1), [§1](https://arxiv.org/html/2606.10610#S1.p2.1), [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p3.1).
- S. Boyd and L. Vandenberghe (2004) Convex optimization. Cambridge University Press. Cited by: [§C.1](https://arxiv.org/html/2606.10610#A3.SS1.p1.11).
- I. Casanueva, T. Temčinas, D. Gerz, M. Henderson, and I. Vulić (2020) Efficient intent detection with dual sentence encoders. arXiv preprint arXiv:2003.04807. External Links: [Link](http://arxiv.org/abs/2003.04807). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer (2023) QLORA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314. External Links: [Link](https://arxiv.org/abs/2305.14314). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p1.1).
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- J. Ebrahimi, A. Rao, D. Lowd, and D. Dou (2018) HotFlip: white-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL). External Links: [Link](https://aclanthology.org/P18-2006). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px5.p1.1).
- J. Fu, J. Fang, J. Sun, S. Zhuang, L. Geng, and Y. Liu (2024) LoFT: LoRA-Based Efficient and Robust Fine-Tuning Framework for Adversarial Training. In 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. External Links: [Document](https://dx.doi.org/10.1109/IJCNN60899.2024.10651480). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px3.p1.1).
- I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), San Diego, CA, USA. External Links: [Link](https://arxiv.org/abs/1412.6572). Cited by: [Appendix B](https://arxiv.org/html/2606.10610#A2.p1.18), [§3](https://arxiv.org/html/2606.10610#S3.p4.2).
- P. He, J. Gao, and W. Chen (2023) DeBERTaV3: improving DeBERTa using electra-style pre-training with gradient-disentangled embedding sharing. In Proceedings of the International Conference on Learning Representations (ICLR). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly (2019) Parameter-Efficient Transfer Learning for NLP. In Proceedings of the 36th International Conference on Machine Learning (ICML), Vol. 97, pp. 2790–2799. Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p1.1).
- E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen (2021) LoRA: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685. External Links: [Link](https://arxiv.org/abs/2106.09685). Cited by: [§A.1](https://arxiv.org/html/2606.10610#A1.SS1.p1.2), [§1](https://arxiv.org/html/2606.10610#S1.p1.1).
- P. Huang, R. Stanforth, J. Welbl, C. Dyer, D. Yogatama, S. Gowal, K. Dvijotham, and P. Kohli (2019) Achieving verified robustness to symbol substitutions via interval bound propagation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4083–4093. External Links: [Link](https://aclanthology.org/D19-1419.pdf). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px6.p1.1).
- M. Ivgi and J. Berant (2021) Achieving model robustness through discrete adversarial training. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 1529–1544. External Links: [Link](https://aclanthology.org/2021.emnlp-main.115.pdf). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px6.p1.1).
- N. Jain, P. Chiang, Y. Wen, J. Kirchenbauer, H. Chu, G. Somepalli, B. R. Bartoldson, B. Kailkhura, A. Schwarzschild, A. Saha, M. Goldblum, J. Geiping, and T. Goldstein (2023) NEFTune: noisy embeddings improve instruction finetuning. Preprint. External Links: [Link](https://arxiv.org/abs/2310.05914). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px4.p1.1).
- Y. Ji, Y. Liu, Z. Zhang, Z. Zhang, Y. Zhao, G. Zhou, X. Zhang, X. Liu, and X. Zheng (2024) AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models. arXiv preprint arXiv:2404.13425. External Links: [Link](https://arxiv.org/abs/2404.13425). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p3.1), [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px3.p1.1).
- R. Jia, A. Raghunathan, K. Göksel, and P. Liang (2019) Certified robustness to adversarial word substitutions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4129–4142. External Links: [Link](https://aclanthology.org/D19-1423.pdf). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px6.p1.1).
- H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and T. Zhao (2019) SMART: robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. arXiv preprint arXiv:1911.03437v5. External Links: [Link](https://arxiv.org/abs/1911.03437v5). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p3.1), [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px3.p1.1), [§4.3](https://arxiv.org/html/2606.10610#S4.SS3.p1.15).
- Y. Kim, J. Kim, and S. Lee (2024) Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 5922–5936. Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px1.p1.1), [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px2.p1.1).
- K. Lang (1995) Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 331–339. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- X. L. Li and P. Liang (2021) Prefix-tuning: optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190. External Links: [Link](https://arxiv.org/abs/2101.00190). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px1.p1.1).
- X. Liu, K. Ji, Y. Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang (2021) P-tuning v2: prompt tuning can be comparable to fine-tuning universally across scales and tasks. arXiv preprint arXiv:2110.07602. Note: Version 3. External Links: [Link](https://arxiv.org/abs/2110.07602). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px1.p1.1).
- Llama Team (2024) The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. External Links: [Link](https://arxiv.org/abs/2407.21783). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2019) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. External Links: [Link](https://arxiv.org/abs/1706.06083). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px2.p1.1), [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p1.1).
- T. Miyato, A. M. Dai, and I. Goodfellow (2021) Adversarial Training Methods for Semi-Supervised Text Classification. arXiv preprint arXiv:1605.07725v4. External Links: [Link](https://arxiv.org/abs/1605.07725v4). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px3.p1.1), [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p5.5), [§4.3](https://arxiv.org/html/2606.10610#S4.SS3.p1.15).
- A. More (2025) Investigating the robustness of parameter efficient fine tuning methods against adversarial attacks in natural language processing. Master's Thesis, Purdue University, Fort Wayne, Indiana. Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px1.p1.1).
- OpenAI (2025) Update to GPT-5 system card: GPT-5.2. Note: Technical report (PDF). External Links: [Link](https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- G. Pu, A. Jain, J. Yin, and R. Kaplan (2023) Empirical analysis of the strengths and weaknesses of PEFT techniques for LLMs. In Proceedings of the Workshop on Understanding Foundation Models at ICLR. External Links: [Link](https://arxiv.org/abs/2304.14999). Cited by: [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p3.1).
- Qwen Team (2025) Qwen2.5 technical report. arXiv preprint arXiv:2412.15115. External Links: 2412.15115. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang (2016) SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250. External Links: [Link](https://arxiv.org/abs/1606.05250). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- P. Ren, C. Shi, S. Wu, M. Zhang, Z. Ren, M. de Rijke, Z. Chen, and J. Pei (2024) MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, pp. 3052–3064. External Links: [Link](https://aclanthology.org/2024.acl-long.168/), [Document](https://dx.doi.org/10.18653/v1/2024.acl-long.168). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p1.1).
- B. Rychalska, D. Basaj, A. Gosiewska, and P. Biecek (2019) Models in the wild: on corruption robustness of neural NLP systems. In Neural Information Processing (ICONIP 2019), Lecture Notes in Computer Science, Vol. 11955, pp. 235–247. External Links: [Link](https://link.springer.com/chapter/10.1007/978-3-030-36718-3_20), [Document](https://dx.doi.org/10.1007/978-3-030-36718-3%5F20). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px5.p1.1).
- U. Shaham, Y. Yamada, and S. Negahban (2018) Understanding adversarial training: Increasing local stability of supervised models through robust optimization. Neurocomputing 307, pp. 195–204. External Links: ISSN 0925-2312, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.neucom.2018.04.027), [Link](https://www.sciencedirect.com/science/article/pii/S0925231218304557). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px2.p1.1), [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p1.1).
- A. Singhal, S. Abney, M. Bacchiani, M. Collins, D. Hindle, and F. Pereira (1999) AT&T at TREC-8. In Proceedings of the Eighth Text REtrieval Conference (TREC-8). External Links: [Link](https://trec.nist.gov/pubs/trec8/papers/att8.qa.pdf). Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. Canton Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. Xiang, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom (2023) Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288. External Links: 2307.09288. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol (2008) Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 1096–1103. External Links: [Link](https://dl.acm.org/doi/10.1145/1390156.1390294). Cited by: [§4.2](https://arxiv.org/html/2606.10610#S4.SS2.p5.5).
- J. Wei and K. Zou (2019) EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. arXiv preprint arXiv:1901.11196. External Links: [Link](https://arxiv.org/abs/1901.11196). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px5.p1.1).
- W. Xiong, J. Wu, H. Wang, V. Kulkarni, M. Yu, S. Chang, X. Guo, and W. Y. Wang (2019) TWEETQA: a social media focused question answering dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 5020–5031. Cited by: [§5](https://arxiv.org/html/2606.10610#S5.p1.1).
- J. Zeng, J. Xu, X. Zheng, and X. Huang (2023) Certified robustness to text adversarial attacks by randomized [mask]. Computational Linguistics 49(2), pp. 395–429. External Links: [Link](https://aclanthology.org/2023.cl-2.5.pdf). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px6.p1.1).
- Q. Zhang, M. Chen, A. Bukharin, N. Karampatziakis, P. He, Y. Cheng, W. Chen, and T. Zhao (2023) AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning. In International Conference on Learning Representations (ICLR). External Links: [Link](https://arxiv.org/abs/2303.10512). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p1.1).
- Y. Zhou, X. Zheng, C. Hsieh, K. Chang, and X. Huang (2021) Defense against synonym substitution-based adversarial attacks via dirichlet neighborhood ensemble. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 5482–5492. External Links: [Link](https://aclanthology.org/2021.acl-long.426.pdf). Cited by: [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px6.p1.1).
- C. Zhu, Y. Cheng, Z. Gan, S. Sun, T. Goldstein, and J. Liu (2020) FREELB: Enhanced Adversarial Training for Natural Language Understanding. In International Conference on Learning Representations (ICLR). External Links: [Link](https://arxiv.org/abs/1909.11764v5). Cited by: [§1](https://arxiv.org/html/2606.10610#S1.p3.1), [§2](https://arxiv.org/html/2606.10610#S2.SS0.SSS0.Px3.p1.1), [§4.3](https://arxiv.org/html/2606.10610#S4.SS3.p1.15).

## Appendix A Additional Preliminaries

### A.1 Low-Rank Adaptation (LoRA)

LoRA Hu et al. ([2021](https://arxiv.org/html/2606.10610#bib.bib1)) is a parameter-efficient fine-tuning method that adapts large-scale pre-trained models by learning low-rank updates. Consider a weight matrix $W_0\in\mathbb{R}^{d\times k}$ in a pre-trained model. Instead of directly fine-tuning $W_0$, LoRA modifies it as:

$$W=W_0+\Delta W,\quad\text{where }\Delta W=AB,$$

with $A\in\mathbb{R}^{d\times r}$ and $B\in\mathbb{R}^{r\times k}$, and $r\ll\min(d,k)$. This low-rank factorization significantly reduces the number of trainable parameters while retaining performance.
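The parameter saving can be checked with a few lines of arithmetic. The sketch below follows the factorization above ($\Delta W=AB$ with $A$ of shape $d\times r$ and $B$ of shape $r\times k$); the helper names and the example sizes ($d=k=768$, $r=8$) are illustrative assumptions, not values reported in the paper.

```python
def lora_param_counts(d, k, r):
    """Trainable parameters: full update of W0 (d*k) vs. LoRA factors A and B (r*(d+k))."""
    return d * k, r * (d + k)

def matmul(A, B):
    """Plain-Python matrix product, enough for a tiny demonstration."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_weight(W0, A, B):
    """W = W0 + A B, with A of shape d x r and B of shape r x k."""
    delta = matmul(A, B)
    return [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]

# Tiny example with d = 3, k = 2, r = 1.
W0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
A = [[1.0], [2.0], [3.0]]
B = [[0.5, -0.5]]
W = lora_weight(W0, A, B)

# Illustrative sizes for one square projection matrix.
full, lora = lora_param_counts(768, 768, 8)
print(W)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")
```

For these sizes the LoRA factors hold 12,288 parameters against 589,824 for the full matrix, roughly 2% of the original count.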

## Appendix B Geometry of $p$-Norm Uncertainty Sets and Perturbation Behavior

Let $g = \nabla \mathcal{L}_{\theta,y}(x)$. Different choices of the norm $p$ determine distinct perturbation characteristics. The optimal perturbation $\delta^{*}$ can be approximated by a single steepest-ascent step, yielding $\tilde{\delta}$ that maximizes the inner product in [eq. 2](https://arxiv.org/html/2606.10610#S3.E2). Steepest ascent determines the direction of maximal increase by optimizing the inner product with the function's gradient, subject to the given step size and the specific choice of norm, thereby defining $\tilde{\delta}$. Choosing $\tilde{\delta}$ from the $\ell_\infty$ ball generates a perturbation in which each entry of $x$ is modified by the same amount in the direction determined by the sign of the gradient $g$ ($\tilde{\delta} = \epsilon \cdot \operatorname{sign}(g)$), making it particularly suitable for attacks on vision models, where imperceptible changes are crucial: every pixel is modified by a small $\epsilon$. A practical adversarial training approach leveraging the $\ell_\infty$ uncertainty set is the Fast Gradient Sign Method (FGSM) (Goodfellow et al., [2015](https://arxiv.org/html/2606.10610#bib.bib13)), which defines $\tilde{\delta}$ in a similar way. For the $\ell_2$ ball, steepest ascent produces perturbations aligned with the direction of $g$, and choosing $\tilde{\delta}$ from the $\ell_1$ ball yields sparse perturbations in which only a few entries, those corresponding to the components of $g$ with the largest absolute values, are modified.
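The three cases above can be sketched in a few lines of plain Python. This is a minimal illustration operating on a gradient given as a list of floats; the function name and the string labels for the norms are assumptions, and the $\ell_1$ branch perturbs only the single largest-magnitude entry, matching Algorithm 5.

```python
import math


def steepest_ascent(g: list[float], eps: float, norm: str) -> list[float]:
    """Perturbation of size eps that maximizes <g, delta> over the chosen l_p ball."""
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    if norm == "linf":  # every coordinate moves by eps in the sign direction
        return [eps * sign(v) for v in g]
    if norm == "l2":    # rescaled gradient direction
        n = math.sqrt(sum(v * v for v in g))
        return [eps * v / n for v in g] if n > 0 else [0.0] * len(g)
    if norm == "l1":    # all of the budget goes to the largest-magnitude coordinate
        i = max(range(len(g)), key=lambda j: abs(g[j]))
        delta = [0.0] * len(g)
        delta[i] = eps * sign(g[i])
        return delta
    raise ValueError(f"unknown norm: {norm}")


g = [3.0, -4.0, 0.0]
for name in ("linf", "l2", "l1"):
    print(name, steepest_ascent(g, eps=0.5, norm=name))
```

The $\ell_\infty$ result is dense, the $\ell_2$ result is proportional to $g$, and the $\ell_1$ result has a single non-zero entry.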

## Appendix C Additional Method Details

Algorithm 1 SDBN (Small Data Big Noise): Adversarial Training with $\ell_\infty$

1. Input batch $X$, labels $Y$, embedding extractor $E$, model $f(\cdot;\theta)$, initial $\epsilon$
2. $\mathbf{e} \leftarrow E(X)$
3. $\mathcal{L}_{\text{clean}} \leftarrow \mathcal{L}(f_{\theta}(\mathbf{e}),\, Y)$
4. $\mathbf{g} \leftarrow \nabla_{\mathbf{e}}\, \mathcal{L}_{\text{clean}}$
5. $\delta \leftarrow \epsilon\, \operatorname{sign}(\mathbf{g})$
6. $\mathbf{e}_{\text{adv}} \leftarrow \mathbf{e} + \delta$
7. $\mathcal{L}_{\text{adv}} \leftarrow \mathcal{L}(f_{\theta}(\mathbf{e}_{\text{adv}}),\, Y)$
8. Update $\theta$ using $\mathcal{L}_{\text{adv}}$
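To make the control flow of Algorithm 1 concrete, here is a minimal sketch on a toy model. It assumes a single-example logistic classifier acting directly on an embedding vector, with hand-derived gradients and plain gradient descent; the paper's setting (a Transformer with PEFT parameters and AdamW) is replaced by this toy purely for illustration, and all names are hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for a predicted probability p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sdbn_step(w, b, e, y, eps=1e-4, lr=0.1):
    """One l_inf adversarial update for a toy logistic model f(e) = sigmoid(w.e + b)."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)) + b)
    loss_clean = bce(p, y)
    g = [(p - y) * wi for wi in w]                      # d(loss_clean)/d(e)
    delta = [eps * ((gi > 0) - (gi < 0)) for gi in g]   # eps * sign(g)
    e_adv = [ei + di for ei, di in zip(e, delta)]
    p_adv = sigmoid(sum(wi * ei for wi, ei in zip(w, e_adv)) + b)
    loss_adv = bce(p_adv, y)
    # Parameters are updated with the gradient of the adversarial loss only.
    w_new = [wi - lr * (p_adv - y) * ei for wi, ei in zip(w, e_adv)]
    b_new = b - lr * (p_adv - y)
    return w_new, b_new, loss_clean, loss_adv


w, b, loss_clean, loss_adv = sdbn_step([1.0, -2.0], 0.0, [0.5, 0.25], y=1, eps=0.01)
print(loss_clean, loss_adv)
```

For this linear toy model the sign step always increases the loss; for a deep network that is only guaranteed locally, which is what motivates the choice of a small $\epsilon$.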

### C.1 Epsilon Selection

To keep $\mathcal{L}_{\text{adv}}$ consistently higher than $\mathcal{L}_{\text{clean}}$, we empirically determine an appropriately small $\epsilon$ value (see [section C.6](https://arxiv.org/html/2606.10610#A3.SS6)). Our implementation includes an adaptive mechanism with a parameter $K$ that adjusts $\epsilon$: when $\mathcal{L}_{\text{adv}}$ falls below $\mathcal{L}_{\text{clean}}$, we run a line-search-style procedure (Boyd and Vandenberghe, [2004](https://arxiv.org/html/2606.10610#bib.bib18)) that iteratively reduces $\epsilon$ by a factor of 10, for up to $K$ iterations, until the constraint $\mathcal{L}_{\text{adv}} > \mathcal{L}_{\text{clean}}$ is satisfied. In practice, with an appropriately chosen $\epsilon$, the constraint is satisfied after a single iteration (see [section C.6](https://arxiv.org/html/2606.10610#A3.SS6)).
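The adaptive mechanism can be sketched as a short backtracking loop. This is one reading of the description above, not the authors' implementation: `adv_loss_fn` is a hypothetical callable that returns the adversarial loss for a given $\epsilon$, and the toy loss curve in the usage example is invented so that large steps overshoot.

```python
def select_epsilon(loss_clean, adv_loss_fn, eps_init, K):
    """Shrink eps by 10x, at most K times, until adv_loss_fn(eps) > loss_clean.

    Returns (eps, reductions); reductions is None if the constraint was never met.
    """
    eps = eps_init
    for reductions in range(K + 1):
        if adv_loss_fn(eps) > loss_clean:
            return eps, reductions
        if reductions < K:
            eps /= 10.0
    return eps, None


# Hypothetical non-monotone loss: small steps raise the loss, large steps overshoot.
def toy_adv_loss(eps):
    return 1.0 + eps - 50.0 * eps ** 2


eps, reductions = select_epsilon(loss_clean=1.0, adv_loss_fn=toy_adv_loss, eps_init=1.0, K=5)
print(eps, reductions)
```

Returning the number of reductions makes it easy to log how often the fallback is triggered during training.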

### C.2 SDBN-h

During training, each mini-batch is split into two parts: we apply SDBN (embedding-space $\ell_\infty$ perturbations) to the first part and SDBN-h (character-variant selection from $\mathcal{C}(x)$) to the remaining part (see [Algorithm 2](https://arxiv.org/html/2606.10610#alg2)).

Algorithm 2 SDBN-h: Training Step

1. Clean input $x$, label $y$, model $f(\cdot;\theta)$, loss $\mathcal{L}$, embedding $E(\cdot)$
2. Noise-function set $\mathcal{H}$, where each $h \in \mathcal{H}$ maps $(x, i) \mapsto z$ by applying a character-level edit to $x$ at index $i$
3. $g \leftarrow \nabla_{E(x)}\, \mathcal{L}\left(f(E(x);\theta),\, y\right)$ $\triangleright$ gradient of the clean example
4. Sample $h \sim \mathrm{Uniform}(\mathcal{H})$
5. $\mathcal{C}(x) \leftarrow \{\, z_i = h(x, i) \,:\, i = 1, \ldots, |x| \,\}$ $\triangleright$ apply $h$ at all character indices
6. $z^{\star} \leftarrow \arg\max_{z \in \mathcal{C}(x)} \left\langle g,\ E(z) - E(x) \right\rangle$
7. Update $\theta$ using $\mathcal{L}\left(f(E(z^{\star});\theta),\, y\right)$
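The selection in step 6 scores each candidate by the first-order loss increase $\langle g, E(z) - E(x)\rangle$ without running the model on every variant. The sketch below illustrates this with a deliberately simple stand-in for $E$: a hypothetical fixed-size letter-count embedding, chosen only so that $E(z)$ and $E(x)$ have the same shape. How the paper aligns embeddings of different token lengths is not specified here, so that part is an assumption.

```python
def embed(text: str) -> list[float]:
    """Hypothetical fixed-size embedding: letter counts over a-z (stand-in for E)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def select_worst_case(x: str, g: list[float], candidates: list[str]) -> str:
    """Return the candidate z maximizing <g, E(z) - E(x)>."""
    ex = embed(x)

    def score(z: str) -> float:
        return sum(gi * (zi - xi) for gi, zi, xi in zip(g, embed(z), ex))

    return max(candidates, key=score)


# Toy gradient: the loss goes up when the count of the letter "c" goes down.
g = [0.0] * 26
g[ord("c") - ord("a")] = -1.0
print(select_worst_case("card", g, ["ard", "crd", "cad", "car"]))
```

Because the score is linear in $E(z)$, the whole candidate set can be ranked with a single gradient computation on the clean example.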

#### Character perturbation types\.

The discrete uncertainty set $\mathcal{C}(x)$ consists of single-character variants generated by the following operations:

- Delete character: Remove one character (e.g., "card" → "crd")
- Swap characters: Swap two adjacent characters (e.g., "card" → "acrd")
- Double character: Duplicate one character (e.g., "card" → "carrd")
- Phonetic replacement: Replace with a phonetically similar character (e.g., "phone" → "fone")
- Insert character: Insert a random character (e.g., "card" → "ca1rd")
- Cyrillic substitution: Replace with a visually similar Cyrillic character (e.g., "card" → "caяd")
- Random capitalization: Change the case of one character (e.g., "card" → "caRd")

For each input $x$, one perturbation type is randomly selected, and all single-edit variants under that type form $\mathcal{C}(x)$.
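A minimal sketch of how such a set could be enumerated is shown below. It covers only four of the seven operations (the deterministic ones); the phonetic, Cyrillic, and random-insert edits need lookup tables or a character sampler that the text does not specify, so they are left out. All function names are hypothetical.

```python
import random


def delete_char(x: str, i: int) -> str:
    return x[:i] + x[i + 1:]


def swap_adjacent(x: str, i: int) -> str:
    if i + 1 >= len(x):
        return x  # no right neighbour to swap with
    return x[:i] + x[i + 1] + x[i] + x[i + 2:]


def double_char(x: str, i: int) -> str:
    return x[:i + 1] + x[i] + x[i + 1:]


def flip_case(x: str, i: int) -> str:
    return x[:i] + x[i].swapcase() + x[i + 1:]


EDITS = {"delete": delete_char, "swap": swap_adjacent,
         "double": double_char, "case": flip_case}


def uncertainty_set(x: str, edit_name: str) -> list[str]:
    """Apply one edit type at every index; drop no-ops and duplicates."""
    edit = EDITS[edit_name]
    variants: list[str] = []
    for i in range(len(x)):
        z = edit(x, i)
        if z != x and z not in variants:
            variants.append(z)
    return variants


def sample_uncertainty_set(x: str, rng: random.Random) -> tuple[str, list[str]]:
    """Pick one edit type uniformly at random and enumerate its variants."""
    name = rng.choice(sorted(EDITS))
    return name, uncertainty_set(x, name)


print(sample_uncertainty_set("card", random.Random(0)))
```

Each set has at most $|x|$ elements, so enumerating it stays cheap even for long inputs.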

### C.3 SDBN-p

Algorithm 3 SDBN-p: Training Step

1. Input $x$, label $y$, model $f(\cdot;\theta)$, loss $\mathcal{L}$, embedding $E(\cdot)$
2. Pre-computed variants $\mathcal{P}(x) = \{z_1, \dots, z_k\}$
3. $z^{\star} \leftarrow \arg\max_{z_i \in \mathcal{P}(x)} \mathcal{L}\left(f(E(z_i);\theta),\, y\right)$ $\triangleright$ select variant with maximum explicit loss
4. $\mathcal{L}_{p} \leftarrow \mathcal{L}\left(f(E(z^{\star});\theta),\, y\right)$
5. Update $\theta$ using $\mathcal{L}_{p}$

#### LLM\-generated adversarial variants\.

The discrete uncertainty set $\mathcal{P}(x)$ is pre-computed offline by prompting an LLM (e.g., GPT-5.2) to generate $k$ adversarial variants for each training example. The variants introduce noise while preserving the semantic meaning and expected output, aiming to improve general robustness. This generation is performed once before training begins. During training, at each epoch, we explicitly compute the loss for each variant and select the one that maximizes the training objective. Because these variants can involve structural changes that fall outside the local linear region of the clean input, this explicit selection ensures we identify the true worst-case semantic neighbor. Importantly, since model parameters evolve during training, the worst-case variant may change from epoch to epoch, allowing the model to be exposed to different challenging examples throughout optimization.
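The explicit selection amounts to one forward pass per variant followed by an argmax. The sketch below assumes a hypothetical `loss_fn` that maps a variant string to a scalar loss under the current parameters; the toy loss in the usage example is invented purely to show that the selected variant can change as the "parameters" change.

```python
def select_max_loss_variant(variants, loss_fn):
    """Return (z_star, loss) for the pre-computed variant with the largest loss."""
    losses = [loss_fn(z) for z in variants]
    best = max(range(len(variants)), key=lambda i: losses[i])
    return variants[best], losses[best]


# Hypothetical pre-computed variants of one question.
variants = ["what yea was", "what year waIs", "wht year was"]

# A made-up loss that depends on a scalar "parameter" theta, standing in for the model.
for epoch, theta in enumerate([0, 20]):
    z_star, loss = select_max_loss_variant(variants, lambda z, t=theta: abs(len(z) - t))
    print(epoch, z_star, loss)
```

Unlike the gradient-based score used for character edits, this costs $k$ forward passes per example, which is consistent with the larger runtime overhead reported for SDBN-p.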

#### Example prompt and output (GPT-5.2)

> Prompt to LLM: Generate 5 adversarial variants of the following input. Each variant should preserve the meaning and expected output, but challenge the model by increasing the loss size (cross-entropy), so that it will achieve robustness to noise.
>
> Input: "Context: Because of its Catholic identity, a number of religious buildings stand on campus. […] The Grotto of Our Lady of Lourdes, which was built in 1896, is a replica of the original in Lourdes, France. […] Question: In what year was the Grotto of Our Lady of Lourdes at Notre Dame constructed? Answer:"
>
> Expected output: "1896"
>
> Generated variant (1 of 5): "Context: Because of its Catholic identity, a number of religious buildings stand on camtpus. The Old College building has become oneofMtwo seminaries […] The Grotto of Our Lady of Lourdes, which was built in 1896, is a replica of the original in Lourdes, France. […] Question: In what yea waIs the Grotto of Our Lady of Lourdes at Notre Dame constructed? Answer:"

The LLM introduces realistic character-level corruptions (e.g., "campus" → "camtpus", "year was" → "yea waIs") that break tokenization while preserving answerability.

### C.4 Algorithm $\ell_2$

Algorithm 4 Adversarial Training with $\ell_2$

1. Input batch $X$, labels $Y$, embedding extractor $E$, model $f(\cdot;\theta)$, initial $\epsilon$
2. $e \leftarrow E(X)$
3. $\mathcal{L}_{\text{clean}} \leftarrow \mathcal{L}(f_{\theta}(e), Y)$
4. $g \leftarrow \nabla_{e}\, \mathcal{L}_{\text{clean}}$
5. $\delta \leftarrow \epsilon \cdot \dfrac{g}{\|g\|_{2}}$
6. $e_{\text{adv}} \leftarrow e + \delta$
7. $\mathcal{L}_{\text{adv}} \leftarrow \mathcal{L}(f_{\theta}(e_{\text{adv}}), Y)$
8. Update $\theta$ using $\mathcal{L}_{\text{adv}}$

### C.5 Algorithm $\ell_1$

Algorithm 5 Adversarial Training with $\ell_1$

1. Input batch $X$, labels $Y$, embedding extractor $E$, model $f(\cdot;\theta)$, initial $\epsilon$
2. $e \leftarrow E(X)$
3. $\mathcal{L}_{\text{clean}} \leftarrow \mathcal{L}(f_{\theta}(e), Y)$
4. $g \leftarrow \nabla_{e}\, \mathcal{L}_{\text{clean}}$
5. $i^{\star} \leftarrow \arg\max_{i} |g_{i}|$
6. $\delta \leftarrow 0$ $\triangleright$ same shape as $e$
7. $\delta_{i^{\star}} \leftarrow \epsilon \cdot \operatorname{sign}(g_{i^{\star}})$ $\triangleright$ add perturbation only at the max-magnitude entry
8. $e_{\text{adv}} \leftarrow e + \delta$
9. $\mathcal{L}_{\text{adv}} \leftarrow \mathcal{L}(f_{\theta}(e_{\text{adv}}), Y)$
10. Update $\theta$ using $\mathcal{L}_{\text{adv}}$

### C.6 Evaluation of Epsilon and Norms

Table 4: Evaluation of different $\ell_p$ norms for varying $\epsilon$ values on Banking77 (1,000 samples). Based on these results, we chose $\ell_\infty$ with $\epsilon = 10^{-4}$ as the best option. (a) Larger $\epsilon$ values. (b) Smaller $\epsilon$ values.

Table 5: Perturbation Layer Impact on Model Performance. Classification accuracy comparison when applying adversarial noise at different layers in BERT-base with LoRA, trained on 1,000 samples from Banking77. Perturbations at the embedding layer dramatically outperform those at any encoder layer, highlighting that input-level modifications more effectively capture realistic linguistic variations.

### C.7 Justification for Using the $\ell_\infty$ Norm with $\epsilon = 10^{-4}$

To justify our design choice, we analyzed how realistic textual noise affects sentence embeddings compared to clean data\. Specifically, we took clean sentences and applied multiple noise types \(e\.g\., word deletion, swap, replacement, case changes, character edits\)\. For each noisy sentence, we computed the difference between its embedding vector and that of the corresponding clean sentence\. We then visualized these embedding\-level perturbations in two complementary ways: a histogram of coordinate\-wise differences \(Fig\.[5](https://arxiv.org/html/2606.10610#A3.F5)\) and a heatmap of difference magnitudes across embedding indices and noise types \(Fig\.[6](https://arxiv.org/html/2606.10610#A3.F6)\)\.

The histogram in Fig. [5](https://arxiv.org/html/2606.10610#A3.F5) shows that the perturbation values are distributed in a narrow range, approximately symmetric around zero, with no heavy tails. This indicates that noise does not create sparse, extreme deviations, but rather small, bounded shifts across many embedding dimensions. Similarly, the heatmap in Fig. [6](https://arxiv.org/html/2606.10610#A3.F6) demonstrates that all noise types induce perturbations of comparable magnitude, uniformly spread across embedding coordinates. Together, these observations suggest that realistic textual noise behaves like a low-magnitude, approximately uniform distribution across embedding dimensions.

These empirical findings motivate the use of the $\ell_\infty$ norm for adversarial perturbations, since the $\ell_\infty$ ball bounds the shift of every coordinate by the same amount, reflecting the bounded, coordinate-wise nature of realistic noise. From our norm comparison study (Table [4](https://arxiv.org/html/2606.10610#A3.T4)), we found that $\ell_\infty$ with $\epsilon = 10^{-4}$ yielded the best trade-off: large enough to consistently increase the adversarial loss $\mathcal{L}_{\text{adv}}$ over the clean loss $\mathcal{L}_{\text{clean}}$, yet small enough to preserve semantic proximity to the clean embeddings. Thus, this choice is both theoretically justified by the geometry of observed perturbations and empirically validated by robustness improvements.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/histt.png)

Figure 5: Histogram of embedding differences between noisy and clean sentences across multiple noise types. The values cluster in a narrow, symmetric range around zero, indicating that realistic noise introduces small but bounded shifts across embedding dimensions rather than sparse or extreme deviations.

![Refer to caption](https://arxiv.org/html/2606.10610v1/x1.png)

Figure 6: Heatmap of coordinate-wise embedding differences for multiple noise types. Each row is a noise type; each column is an embedding dimension. Similar magnitudes across rows indicate roughly uniform low-magnitude perturbations consistent with an $\ell_\infty$ ball.

Table 6: Types of noise perturbations used for robustness evaluation

## Appendix D Additional Results

### D.1 Token-Breaking Examples

A *token-breaking* edit is any character-level change that forces the tokenizer to split what was a single token into several fragments. The table below shows three such edits.

- A single, well-trained token is replaced by several rare ones; their embeddings lie far from the original word in representation space.
- The longer sequence alters positional patterns, further distancing the whole sentence from its clean counterpart.
- Because these new embeddings sit *outside* the uncertainty region sampled during adversarial training, the defence becomes much less effective against such edits.

These observations explain the large gap between character-noise points (triangles) and word-level/clean points (circles, squares) in [fig. 2(a)](https://arxiv.org/html/2606.10610#S4.F2.sf1).
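The fragmentation effect can be reproduced with a toy tokenizer. The sketch below uses greedy longest-match segmentation over a tiny hypothetical vocabulary; it is a simplification of subword tokenizers such as WordPiece (no continuation prefixes, no real vocabulary) and is meant only to show how one inserted character turns a single token into several.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown characters become [UNK]."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:                               # no piece starting at i is in the vocab
            tokens.append("[UNK]")
            i += 1
    return tokens


# Hypothetical vocabulary: one frequent whole-word token plus a few rare fragments.
VOCAB = {"campus", "cam", "pus", "t"}

print(greedy_tokenize("campus", VOCAB))
print(greedy_tokenize("camtpus", VOCAB))
```

The clean word maps to one token, while the corrupted word maps to three fragments, each with its own (typically rarer) embedding and a shifted position index.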

Table 7: Adversarial Training Improves PEFT Across Resource Settings. Performance comparison of standard PEFT methods vs. their adversarial training counterparts (SDBN) on text classification tasks using BERT-base. Results show accuracy (%) when training on varying percentages (100%, 50%, 20%, 10%, 5%) of each dataset. Bold numbers indicate the better-performing variant for each method pair. Adversarial training improves classification performance across all PEFT methods, datasets, and resource settings, with particularly significant gains in low-resource scenarios.

### D.2 Performance on Noisy Test Sets

Constant-Intensity Noise Results. Figure [7](https://arxiv.org/html/2606.10610#A4.F7) presents results for constant-intensity perturbations, where exactly one operation (word deletion, swap, replacement, etc.) is applied per sentence. SDBN consistently outperforms both vanilla PEFT and NEFTune across all noise types, with improvements of 3–8 percentage points in absolute accuracy. These single-operation perturbations represent the minimal corruptions commonly found in real-world text, demonstrating SDBN's effectiveness even for subtle linguistic variations.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/bitfit_noise_VNS.png)

Figure 7: Adversarial training for constant-intensity noise. Performance comparison of BitFit PEFT implementations under constant-intensity noise conditions with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_powers/adapter.png)

Figure 8: Adversarial training for variable-intensity noise, Adapter PEFT. Performance comparison of Adapter PEFT implementations under variable-intensity noise conditions at different amplitudes with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_powers/bitfit.png)

Figure 9: Adversarial training for variable-intensity noise, BitFit PEFT. Performance comparison of BitFit PEFT implementations under variable-intensity noise conditions at different amplitudes with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_powers/qlora.png)

Figure 10: Adversarial training for variable-intensity noise, QLoRA PEFT. Performance comparison of QLoRA PEFT implementations under variable-intensity noise conditions at different amplitudes with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_constant/lora.png)

Figure 11: Adversarial training for constant-intensity noise, LoRA PEFT. Performance comparison of LoRA PEFT implementations under constant-intensity noise conditions with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_constant/qlora.png)

Figure 12: Adversarial training for constant-intensity noise, QLoRA PEFT. Performance comparison of QLoRA PEFT implementations under constant-intensity noise conditions with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/noise_constant/adapter.png)

Figure 13: Adversarial training for constant-intensity noise, Adapter PEFT. Performance comparison of Adapter PEFT implementations under constant-intensity noise conditions with DeBERTa-v3 on 1,000 samples from the Banking77 dataset. These results demonstrate the superior robustness of gradient-based adversarial training to noise in low-resource settings.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/ArSarcasm_Domains_Shifts_with_NEFTune.png)

Figure 14: Domain Shifts. Comparison of LoRA PEFT methods on Arabic dialect sentiment classification using multilingual BERT-base trained on 270 samples of Egyptian Arabic without exposure to the other dialects during training. SDBN consistently outperforms alternatives across all dialects, with notable advantages on both the source and target domains.

![Refer to caption](https://arxiv.org/html/2606.10610v1/figures/domain_shifts_nli.png)

Figure 15: NLI Classification Performance: Source vs. Target Domains. Performance comparison of LoRA PEFT methods on NLI classification tasks using DeBERTa-v3-large trained on only 77 samples from the source domain (NLI Fiction) without exposure to the target domain during training. Results show mean classification accuracy across 20 random seeds. SDBN consistently outperforms both vanilla LoRA and NEFTune across both domains, demonstrating superior robustness and generalization in this text classification scenario. While NEFTune improves over vanilla LoRA through random noise injection, SDBN's gradient-based adversarial perturbations provide more robustness, which is particularly valuable in this extreme low-resource classification setting.

### D.3 Domain Shifts

In a *low-resource* setup (270 training sentences from a single Arabic dialect and *no exposure to the target dialect*), we evaluate cross-domain robustness on the ArSarcasm-v2 benchmark. Despite this tiny-data constraint, SDBN generalizes better to the unseen dialect than both vanilla PEFT and NEFTune ([fig. 14](https://arxiv.org/html/2606.10610#A4.F14)).

This improvement accords with the rationale in [section 4.2](https://arxiv.org/html/2606.10610#S4.SS2): gradient-based perturbations explore uncertainty regions around each source example that extend beyond the source dialect. The t-SNE visualization in [fig. 2(b)](https://arxiv.org/html/2606.10610#S4.F2.sf2) shows these regions overlapping semantically related sentences from the unseen dialect, effectively "covering" the target domain without additional supervision.

Optimizing on these worst-case neighbors equips SDBN to handle domain shifts that would otherwise demand costly labeled data, an advantage that becomes particularly decisive when the available corpus is as small as a few hundred examples.

See [fig. 14](https://arxiv.org/html/2606.10610#A4.F14) for further experiments and results on domain shifts.

### D.4 Generative Tasks

Table 8: SQuAD generative QA results (EM / F1). Models trained on 500 clean SQuAD examples with DeBERTa-v3 and tested on clean data and noisy variants. Two noise types at test time: Delete-Char and Delete-Word. Results are averaged over 5 runs. SDBN outperforms both vanilla PEFT and NEFTune.

Table 9: Additional generative results. TweetQA F1 using LLaMA-3.2-1B with LoRA, trained on 200 samples. Reported values are mean $\pm$ std over random seeds. Best results in each column are shown in bold; second-best results are underlined.

Table 10: Additional generative results. SQuAD exact match (EM) using Qwen-2.5-7B with LoRA, trained on 200 samples. Reported values are mean $\pm$ std over random seeds. Best results in each column are shown in bold; second-best results are underlined.

For SQuAD, we adopt a compact adversarial-training schedule consisting of 1 warm-up epoch followed by 10 adversarial/method-specific epochs. For each training example, we construct 5 variants, and the same fixed set of variants is reused throughout training.

For TweetQA, we use a longer and more dynamic training schedule with 5 warm-up epochs followed by 16 adversarial/method-specific epochs. For each training example, we generate 10 variants; unlike SQuAD, these variants are regenerated at every adversarial epoch, so each epoch exposes the model to a new set of perturbations. This dynamic setup provides more diverse adversarial supervision and broader coverage of linguistic variation across training.

### D.5 Method Analysis

We analyze the primary implementation choices that underlie the effectiveness of our approach. First, we compared perturbations constrained by the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms across a range of magnitudes. The $\ell_\infty$ norm with $\epsilon = 10^{-4}$ outperformed the other norms. Full details of the norm comparison and $\epsilon$ tuning appear in [section C.6](https://arxiv.org/html/2606.10610#A3.SS6).

Next, we explored *where* to inject the perturbations along the Transformer stack. When the noise was added to hidden activations in intermediate encoder layers, accuracy deteriorated sharply. Restricting the perturbation to the embedding layer, however, preserved the model's semantic processing and yielded the strongest robustness gains. Applying perturbations solely at the embedding layer keeps the noise interpretable as realistic textual edits (such as swapping two words, inserting an extra token, or deleting a character), thereby exposing the model to linguistic variation. Injecting noise deeper in the encoder, by contrast, corrupts high-level semantic representations in ways that no valid sentence could produce, which explains the strong performance drop we observe. Detailed results for every injection point are reported in [table 4](https://arxiv.org/html/2606.10610#A3.T4).

Table 11: Runtime per batch (ms). Mean wall-clock time (milliseconds) on an NVIDIA H200 for a full training step (forward–backward) on a batch of 32 Banking77 sentences across PEFT methods. Lower is faster.

#### Runtime.

Table [11](https://arxiv.org/html/2606.10610#A4.T11) indicates that SDBN variants introduce a *bounded* and *predictable* compute overhead relative to vanilla PEFT. SDBN increases per-batch latency by about $1.40\times$, $1.27\times$, $1.33\times$, and $1.37\times$ for LoRA, QLoRA, BitFit, and Adapter, respectively (e.g., LoRA: $116.21 \to 163.05$ ms). The hybrid SDBN-h adds only a modest margin over SDBN, approximately 5–15% across methods (LoRA: $163.05 \to 170.71$ ms; QLoRA: $189.70 \to 206.64$ ms; BitFit: $123.40 \to 140.48$ ms; Adapter: $131.22 \to 150.25$ ms). SDBN-p incurs a larger but still bounded overhead, approximately 30–39% over SDBN (LoRA: $163.05 \to 211.91$ ms; QLoRA: $189.70 \to 247.98$ ms; BitFit: $123.40 \to 171.87$ ms; Adapter: $131.22 \to 178.85$ ms). Overall, the added cost remains practical for deployment: the methods retain parameter efficiency and deliver the robustness gains reported earlier while keeping per-batch latency within practical bounds.
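The quoted percentage ranges follow directly from the per-batch timings in the paragraph above. The short script below recomputes them; the dictionary simply transcribes the millisecond values from the text, and the helper name is an illustrative choice.

```python
# Per-batch times in ms as quoted in the text: (SDBN, SDBN-h, SDBN-p).
TIMES_MS = {
    "LoRA": (163.05, 170.71, 211.91),
    "QLoRA": (189.70, 206.64, 247.98),
    "BitFit": (123.40, 140.48, 171.87),
    "Adapter": (131.22, 150.25, 178.85),
}


def overhead_pct(base: float, new: float) -> float:
    """Relative increase of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


for method, (sdbn, sdbn_h, sdbn_p) in TIMES_MS.items():
    print(f"{method:8s} SDBN-h: +{overhead_pct(sdbn, sdbn_h):.1f}%  "
          f"SDBN-p: +{overhead_pct(sdbn, sdbn_p):.1f}%")

# The only vanilla timing quoted in the text is LoRA (116.21 ms).
print(f"LoRA SDBN vs. vanilla: {163.05 / 116.21:.2f}x")
```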

Table 12: Peak GPU memory usage during training on H200.

Memory Efficiency. Table [12](https://arxiv.org/html/2606.10610#A4.T12) reports peak GPU memory during training. All SDBN variants introduce negligible memory overhead ($\leq 0.15\%$) compared to vanilla fine-tuning, confirming that our framework enhances robustness with almost no additional memory cost. Notably, SDBN-p requires no extra memory since adversarial variants are pre-computed offline.

Table 13: A qualitative example. A DeBERTa-v3 encoder was fine-tuned with LoRA on 1,000 *clean* Banking77 training examples. We show one clean test example ("Original") and several noisy variants (word- and character-level). Cells report correctness (✓ = correct, $\times$ = incorrect). Adversarial training improves robustness overall (SDBN), and its hybrid variant (SDBN-h) is especially effective on character-level noise that disrupts tokenization.

### D.6 A Qualitative Example

[Table 13](https://arxiv.org/html/2606.10610#A4.T13) demonstrates that gradient-based adversarial training (SDBN) reliably improves robustness on word-level edits over Vanilla and NEFTune while preserving clean-data accuracy. Consistent with our rationale in [section 4.2](https://arxiv.org/html/2606.10610#S4.SS2), optimizing against worst-case neighbors teaches the model to correctly handle semantically preserving rewrites that lie within the local uncertainty set. Importantly, the hybrid variant SDBN-h, which augments SDBN with discrete character perturbations, substantially mitigates tokenization-breaking edits (character changes), closing the gap on character-level noise that defeats both baselines. Overall, these examples indicate that SDBN strengthens robustness around clean inputs, and SDBN-h extends this robustness to fine-grained character corruptions, yielding the most consistent behavior across the noisy variants in [Table 13](https://arxiv.org/html/2606.10610#A4.T13).

### D.7 Setup Details

For all experiments, we utilized the Hugging Face bert-base model as our pre-trained backbone. The implementation of Parameter-Efficient Fine-Tuning (PEFT) methods was conducted using the PEFT library with PyTorch. We employed the AdamW optimizer with a learning rate of $1 \times 10^{-4}$, applied only to trainable parameters. The LoRA rank was set to 12 for BERT models and 4 for DeBERTa models across both the baseline and our adversarial training method. The adapter bottleneck dimension was set to 16 for both methods. Each experimental setting (dataset, method, dataset size) was run five times with five different random seeds to ensure robustness. During fine-tuning, we used a batch size of 32 and a dropout rate of 0.1. Evaluation was performed using the TextAttack library to assess model noise robustness. All experiments were conducted using an NVIDIA GPU. In [table 7](https://arxiv.org/html/2606.10610#A4.T7), results show mean accuracy across 5 random seeds. In [fig. 7](https://arxiv.org/html/2606.10610#A4.F7), results show mean accuracy across 10 random seeds without outliers (excluding the best and worst results for each method to reduce bias when working with small datasets).
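The outlier-excluding average described for fig. 7 can be sketched as a trimmed mean that drops the single best and single worst run. The function name and the accuracy values in the usage example are hypothetical; only the rule (exclude the best and worst result per method) comes from the text.

```python
def trimmed_mean(scores: list[float]) -> float:
    """Mean of the scores after dropping the single lowest and single highest value."""
    if len(scores) < 3:
        raise ValueError("need at least 3 runs to drop the best and the worst")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical accuracies from 10 seeds (not results from the paper).
runs = [71.2, 73.5, 72.8, 60.1, 74.0, 72.1, 73.3, 80.4, 72.9, 73.0]
print(f"plain mean:   {sum(runs) / len(runs):.2f}")
print(f"trimmed mean: {trimmed_mean(runs):.2f}")
```

With few training samples, a single unlucky seed can move the plain mean noticeably, which is the bias the trimming is meant to reduce.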

#### EDA baseline\.

We use EDA as a lightweight data-noising baseline. To keep training compute comparable, we use conservative augmentation rates ($\alpha_{\text{sr}} = 0.05$, $\alpha_{\text{ri}} = 0.05$, $\alpha_{\text{rs}} = 0.05$, $p_{\text{rd}} = 0.05$) and generate a single augmented example per input ($\texttt{num\_aug} = 1$). To match the training compute of other methods (i.e., the same number of optimizer steps), we do not expand the dataset; instead, in each mini-batch we replace the clean input with its EDA-augmented variant with probability $0.5$, following the original EDA procedure. Note that EDA was originally proposed as a general data augmentation method and was not presented in combination with PEFT in its original work. The augmented dataset is generated once before training and reused across all epochs, following the original EDA offline augmentation procedure.
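The replacement scheme can be sketched as follows. Only random deletion is implemented here, because the other EDA operations (synonym replacement, random insertion, random swap) need a synonym resource that is outside the scope of this sketch; the function names and example sentences are hypothetical.

```python
import random


def random_deletion(words: list[str], p: float, rng: random.Random) -> list[str]:
    """EDA-style random deletion: drop each word independently with probability p."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]  # never return an empty sentence


def augment_once(sentences: list[str], p_rd: float, rng: random.Random) -> list[str]:
    """Offline pass: one augmented copy per input (num_aug = 1), reused every epoch."""
    return [" ".join(random_deletion(s.split(), p_rd, rng)) for s in sentences]


def build_batch(clean, augmented, rng, p_replace=0.5):
    """Swap each clean input for its augmented copy with probability p_replace."""
    return [a if rng.random() < p_replace else c for c, a in zip(clean, augmented)]


rng = random.Random(0)
clean = ["i lost my card yesterday", "my top up has failed"]
augmented = augment_once(clean, p_rd=0.05, rng=rng)
print(build_batch(clean, augmented, rng))
```

Because inputs are replaced rather than appended, the number of optimizer steps per epoch is unchanged.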
