Tag
This paper introduces a novel task, transitive inference with exceptions, and analytically characterizes how neural network models (kernel ridge regression) balance relational generalization and memorization. The theory is validated in pretrained language models, showing systematic mistakes predicted by the theory.