counterfactual-augmentation

Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning

arXiv cs.CL · 4d ago

This paper investigates why instruction-tuned language models answer causal reasoning questions differently when variable names are replaced with placeholders. It finds that the gap stems from representational misalignment rather than information loss. The authors introduce Vernier, a method that combines paired-view weight updates with mechanism inspection, and use it to show that answer-relevant content is still present in the placeholder view but misaligned.
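To make the two "views" concrete, here is a minimal sketch of how a named causal question might be turned into its placeholder counterpart. It illustrates only the input setup described in the summary, not Vernier itself. The function name `to_placeholder_view`, the `X1, X2, ...` placeholder scheme, and the example question are all assumptions for illustration and are not taken from the paper.

```python
import re


def to_placeholder_view(question: str, variables: list[str]) -> tuple[str, dict[str, str]]:
    """Return the placeholder view of a question plus the name-to-placeholder map.

    Hypothetical helper: each variable name is swapped for a neutral
    placeholder (X1, X2, ...) in the order the names are given.
    """
    if not variables:
        return question, {}

    mapping = {name: f"X{i}" for i, name in enumerate(variables, start=1)}

    # Try longer names first so a name that contains another is matched whole.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    placeholder = pattern.sub(lambda m: mapping[m.group(1)], question)
    return placeholder, mapping


# Illustrative pair of views (the question is made up for this sketch).
named_view = "Does smoking cause cancer if smoking raises tar?"
placeholder_view, name_map = to_placeholder_view(
    named_view, ["smoking", "cancer", "tar"]
)
print(named_view)
print(placeholder_view)
```

Both views carry the same causal structure, so a difference in the model's answers between them is the lexical gap the paper sets out to explain.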
