#confounds

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

arXiv cs.CL · 2d ago

This paper shows that linear probes on LLM hidden states pick up task-format confounds (e.g., source identity, response length) rather than distinct reasoning modes. Using residualization and causal steering, it argues that high probe accuracy reflects superficial features, not computational structure.
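To make the residualization idea concrete, here is a minimal sketch on synthetic data. It is not the paper's code or setup: the data generator, the least-squares probe (a stand-in for a logistic-regression probe), and the names `residualize` and `probe_accuracy` are all assumptions chosen for illustration. The synthetic "hidden states" encode only response length, which happens to correlate with the label, so a probe looks accurate until the length confound is regressed out.

```python
import numpy as np

def residualize(H, C):
    """Remove the part of H that is linearly predictable from confounds C.

    Hypothetical helper: fits H ~ [1, C] by least squares and returns the residuals.
    """
    C1 = np.column_stack([np.ones(len(C)), C])
    beta, *_ = np.linalg.lstsq(C1, H, rcond=None)
    return H - C1 @ beta

def probe_accuracy(H_train, y_train, H_test, y_test):
    """Fit a least-squares linear probe on +/-1 targets and score it on held-out data."""
    X_train = np.column_stack([np.ones(len(H_train)), H_train])
    X_test = np.column_stack([np.ones(len(H_test)), H_test])
    w, *_ = np.linalg.lstsq(X_train, 2.0 * y_train - 1.0, rcond=None)
    pred = (X_test @ w > 0).astype(int)
    return float((pred == y_test).mean())

# Synthetic setup (assumed, not from the paper): the label shifts response
# length, and the hidden states carry length along one direction plus noise.
rng = np.random.default_rng(0)
n, d = 4000, 16
y = rng.integers(0, 2, size=n)                       # "reasoning mode" label
length = 100 + 200 * y + 30 * rng.standard_normal(n)  # confound: response length
z = (length - length.mean()) / length.std()
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
H = rng.standard_normal((n, d)) + 3.0 * np.outer(z, direction)

train, test = slice(0, n // 2), slice(n // 2, n)

# Probe on raw hidden states: accuracy is high, but only because of length.
acc_raw = probe_accuracy(H[train], y[train], H[test], y[test])

# Probe after regressing length out of the hidden states.
H_res = residualize(H, length)
acc_resid = probe_accuracy(H_res[train], y[train], H_res[test], y[test])

print(f"probe accuracy, raw hidden states:   {acc_raw:.3f}")
print(f"probe accuracy, length residualized: {acc_resid:.3f}")
```

In this toy setting the residualized probe should fall to roughly chance level, since nothing but the confound separated the classes. For simplicity the sketch fits the residualization on all examples at once; fitting it on the training split only would be the stricter choice.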
