Tag
This paper identifies a 'positional copying' shortcut where small language models answer arithmetic questions by copying the last number before the answer delimiter, bypassing actual reasoning. This effect explains why shuffling CoT steps retains performance; it accounts for 89-92% of teacher-forcing accuracy in 1-3B models on GSM8K.
This research paper investigates how shortcut solutions learned by Transformer models, specifically BERT, impair their ability to perform continual compositional reasoning. It contrasts BERT with ALBERT, finding that ALBERT's recurrent nature offers better inductive bias for continual learning tasks.
Research shows Chain-of-Thought prompting harms visual-spatial reasoning in multimodal LLMs due to shortcut learning and hallucinating visual details from text alone.