Stochasticity in Tokenization Improves Robustness

arXiv cs.CL Papers

Summary

This paper shows that training large language models with uniformly sampled stochastic tokenization, rather than a deterministic canonical tokenization, improves robustness to adversarial attacks and random perturbations. The study covers pre-training, supervised fine-tuning, and in-context learning, and finds that stochastic tokenization preserves accuracy without increasing inference cost.

arXiv:2604.16037v1

Abstract: The widespread adoption of large language models (LLMs) has increased concerns about their robustness. Vulnerabilities in perturbations of tokenization of the input indicate that models trained with a deterministic canonical tokenization can be brittle to adversarial attacks. Recent studies suggest that stochastic tokenization can deliver internal representations that are less sensitive to perturbations. In this paper, we analyze how stochastic tokenization affects robustness to adversarial attacks and random perturbations. We systematically study this over a range of learning regimes (pre-training, supervised fine-tuning, and in-context learning), datasets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenization improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenizations reduces the accuracy of a canonically trained Llama-1b model by 29.8%. We find that training with stochastic tokenization preserves accuracy without increasing inference cost.

Cached at: 04/20/26, 08:29 AM

# Stochasticity in Tokenisation Improves Robustness
Source: https://arxiv.org/abs/2604.16037
View PDF (https://arxiv.org/pdf/2604.16037)

> Abstract: The widespread adoption of large language models (LLMs) has increased concerns about their robustness. Vulnerabilities in perturbations of tokenisation of the input indicate that models trained with a deterministic canonical tokenisation can be brittle to adversarial attacks. Recent studies suggest that stochastic tokenisation can deliver internal representations that are less sensitive to perturbations. In this paper, we analyse how stochastic tokenisations affect robustness to adversarial attacks and random perturbations. We systematically study this over a range of learning regimes (pre-training, supervised fine-tuning, and in-context learning), datasets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenisations reduces the accuracy of a canonically trained Llama-1b model by 29.8%. We find that training with stochastic tokenisation preserves accuracy without increasing inference cost.
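
The abstract refers to "uniformly sampled" tokenisations but does not spell out the sampling procedure. As a rough illustration only, the sketch below shows one way to draw a segmentation uniformly at random from all valid segmentations of a string: count the segmentations of every suffix with dynamic programming, then pick each token boundary with probability proportional to the number of completions it allows. The toy vocabulary and the function names `count_segmentations` and `sample_tokenization` are hypothetical and are not taken from the paper.

```python
import random


def count_segmentations(text, vocab):
    """Return counts where counts[i] is the number of ways to split text[i:] into vocab tokens."""
    n = len(text)
    counts = [0] * (n + 1)
    counts[n] = 1  # the empty suffix has exactly one (empty) segmentation
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                counts[i] += counts[j]
    return counts


def sample_tokenization(text, vocab, rng=random):
    """Draw one segmentation of text uniformly at random over all valid segmentations."""
    counts = count_segmentations(text, vocab)
    if counts[0] == 0:
        raise ValueError("text cannot be segmented with this vocabulary")
    tokens, i = [], 0
    while i < len(text):
        # Candidate end positions whose token is in the vocabulary and
        # from which the rest of the string can still be segmented.
        ends = [j for j in range(i + 1, len(text) + 1)
                if text[i:j] in vocab and counts[j] > 0]
        # Weighting by the number of completions makes every full
        # segmentation equally likely.
        weights = [counts[j] for j in ends]
        j = rng.choices(ends, weights=weights, k=1)[0]
        tokens.append(text[i:j])
        i = j
    return tokens


# Toy vocabulary (an assumption for illustration, not a real tokenizer vocabulary).
TOY_VOCAB = {"t", "o", "k", "e", "n", "to", "tok", "en", "ken", "token"}

if __name__ == "__main__":
    rng = random.Random(0)
    print(count_segmentations("token", TOY_VOCAB)[0])  # number of valid segmentations
    print(sample_tokenization("token", TOY_VOCAB, rng))
```

In this toy setting the canonical tokenisation would be the single token `token`, while the sampler returns any of the valid splits (for example `tok` + `en`) with equal probability. A real subword tokenizer would need its own vocabulary and byte-level handling, which this sketch does not attempt.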

## Submission history

From: Sophie Steger

**[v1]** Fri, 17 Apr 2026 13:05:46 UTC (88 KB)

Similar Articles

Probabilistic Attribution For Large Language Models

arXiv cs.CL

This paper proposes a model-agnostic probabilistic token attribution measure for LLMs using Bayes' rule to invert next-token log probabilities, capturing the model's internal representation of token sequences and improving interpretability through entropy analysis.

Emergent retokenization symmetry in large language models: phenomenology and applications

arXiv cs.CL

This paper discovers that large language models partially exhibit emergent symmetry under retokenization—replacing a prompt's canonical tokenization with an alternative valid segmentation while preserving bytes exactly. The authors use this phenomenon to probe compositional understanding and propose retokenization as a novel inference-time sampling strategy that can recover solutions not found by conventional temperature sampling.