This paper investigates why instruction-tuned language models answer causal reasoning questions differently when variable names are replaced with placeholders, and finds that the discrepancy stems from representational misalignment rather than information loss. The authors introduce Vernier, a method that combines paired-view weight updates with mechanism inspection, and use it to show that answer-relevant content remains present in the placeholder view but is misaligned.