Tag
This paper demonstrates that training large language models with stochastic tokenization instead of deterministic canonical tokenization significantly improves robustness to adversarial attacks and random perturbations, with improvements shown across pre-training, fine-tuning, and in-context learning without increasing inference costs.