Stochasticity in Tokenisation Improves Robustness
Summary
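This paper shows that pre-training and fine-tuning large language models with uniformly sampled stochastic tokenizations, rather than a single deterministic canonical tokenization, improves robustness to adversarial attacks and random perturbations. The effect is studied across pre-training, supervised fine-tuning, and in-context learning, and training with stochastic tokenization preserves accuracy without increasing inference cost. For reference, evaluating a canonically trained Llama-1b model on uniformly sampled non-canonical tokenizations reduces its accuracy by 29.8%.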
# Stochasticity in Tokenisation Improves Robustness

Source: https://arxiv.org/abs/2604.16037

PDF: https://arxiv.org/pdf/2604.16037

> Abstract: The widespread adoption of large language models (LLMs) has increased concerns about their robustness. Vulnerabilities in perturbations of tokenisation of the input indicate that models trained with a deterministic canonical tokenisation can be brittle to adversarial attacks. Recent studies suggest that stochastic tokenisation can deliver internal representations that are less sensitive to perturbations. In this paper, we analyse how stochastic tokenisations affect robustness to adversarial attacks and random perturbations. We systematically study this over a range of learning regimes (pre-training, supervised fine-tuning, and in-context learning), datasets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenisations reduces the accuracy of a canonically trained Llama-1b model by 29.8%. We find that training with stochastic tokenisation preserves accuracy without increasing inference cost.

## Submission history

From: Sophie Steger

**[v1]** Fri, 17 Apr 2026 13:05:46 UTC (88 KB)
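To make "uniformly sampled stochastic tokenisations" from the abstract concrete, the following is a minimal sketch of one way to draw a segmentation uniformly at random from all valid segmentations of a string. It is an illustration under stated assumptions, not the authors' implementation: the toy vocabulary `VOCAB` and the function names `count_segmentations` and `sample_uniform_tokenisation` are hypothetical, and a real tokenizer vocabulary would be far larger. The idea is to count, by dynamic programming, how many segmentations each suffix admits, and then pick each next token with probability proportional to the number of ways the remainder can be completed.

```python
import random

# Hypothetical toy vocabulary, chosen only for illustration.
VOCAB = {"t", "o", "k", "e", "n", "to", "tok", "en", "ken", "token"}


def count_segmentations(text, vocab):
    """Return ways, where ways[i] is the number of segmentations of text[i:]."""
    n = len(text)
    max_len = max(len(tok) for tok in vocab)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one (empty) segmentation
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in vocab:
                ways[i] += ways[j]
    return ways


def sample_uniform_tokenisation(text, vocab, rng):
    """Draw one segmentation of text uniformly from all valid segmentations."""
    ways = count_segmentations(text, vocab)
    if ways[0] == 0:
        raise ValueError("text cannot be segmented with this vocabulary")
    max_len = max(len(tok) for tok in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Candidate end positions for the next token.
        ends = [j for j in range(i + 1, min(len(text), i + max_len) + 1)
                if text[i:j] in vocab]
        # Weighting by the number of completions makes the overall draw uniform.
        j = rng.choices(ends, weights=[ways[k] for k in ends], k=1)[0]
        tokens.append(text[i:j])
        i = j
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(count_segmentations("token", VOCAB)[0])  # 9 valid segmentations
    print(sample_uniform_tokenisation("token", VOCAB, rng))
```

Under this toy vocabulary the canonical choice would be the single token `token`, while the sampler returns any of the nine valid segmentations with equal probability. In a training setup, each example would presumably be re-segmented this way before being mapped to token IDs; how the paper integrates sampling into training is not specified in the abstract.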
Similar Articles
Probabilistic Attribution For Large Language Models
This paper proposes a model-agnostic probabilistic token attribution measure for LLMs using Bayes' rule to invert next-token log probabilities, capturing the model's internal representation of token sequences and improving interpretability through entropy analysis.
Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining
This paper investigates training-time data augmentation techniques to mitigate overfitting in autoregressive language model pretraining under data-constrained, compute-abundant regimes, finding that combining token-level noise, sequence permutations, and target offset prediction improves validation loss.
Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models
Proposes EKSFT, a selective fine-tuning method for large language models that masks tokens with high entropy or high KL divergence from a reference model, preserving pre-trained distribution while injecting task knowledge. Experiments on mathematical reasoning benchmarks show it outperforms standard SFT and improves subsequent RL fine-tuning.
Emergent retokenization symmetry in large language models: phenomenology and applications
This paper discovers that large language models partially exhibit emergent symmetry under retokenization—replacing a prompt's canonical tokenization with an alternative valid segmentation while preserving bytes exactly. The authors use this phenomenon to probe compositional understanding and propose retokenization as a novel inference-time sampling strategy that can recover solutions not found by conventional temperature sampling.
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
This paper investigates the impact of subword tokenization on LLM training efficiency and performance by conducting controlled byte-level pretraining experiments. It reveals key factors such as training throughput and the integration of subword boundaries as linguistic priors.