Tag
This paper investigates how tonal variations in prompts affect LLM accuracy on multiple-choice questions, finding systematic but model-dependent effects. The study uses multiple models and datasets to demonstrate that tone can significantly alter performance, cautioning against assuming tone-robust reliability.