linear-probing

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

arXiv cs.CL · yesterday

This paper finds that linear probes trained on LLM hidden states detect task-format confounds (e.g., source identity and response length) rather than distinct reasoning modes. Using residualization and causal steering, it shows that high probe accuracy stems from these superficial features, not from underlying computational structure.
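The residualization check mentioned in the summary can be illustrated with a minimal sketch on synthetic data. This is not the paper's code. The least-squares probe, the hypothetical helpers `linear_probe_accuracy` and `residualize`, and the synthetic setup, in which a binary "mode" label influences hidden states only through response length, are all assumptions for illustration. The probe is first trained on raw states and then on states from which the linearly predictable part of the confound has been regressed out.

```python
import numpy as np

def linear_probe_accuracy(h_train, y_train, h_test, y_test):
    """Hypothetical helper: fit a least-squares linear probe (with bias) on +/-1
    targets, then return accuracy on the held-out split."""
    a_train = np.hstack([h_train, np.ones((len(h_train), 1))])
    a_test = np.hstack([h_test, np.ones((len(h_test), 1))])
    w, *_ = np.linalg.lstsq(a_train, 2.0 * y_train - 1.0, rcond=None)
    preds = (a_test @ w > 0).astype(int)
    return float(np.mean(preds == y_test))

def residualize(h, confounds):
    """Hypothetical helper: subtract from each hidden dimension the part that is
    linearly predictable from the confound features (plus an intercept)."""
    x = np.hstack([confounds, np.ones((len(confounds), 1))])
    beta, *_ = np.linalg.lstsq(x, h, rcond=None)
    return h - x @ beta

# Synthetic data (assumption): the label reaches the hidden states only via length.
rng = np.random.default_rng(0)
n, d = 2000, 32
y = rng.integers(0, 2, size=n)                    # binary "reasoning mode" label
length = 50 + 30 * y + rng.normal(0, 5, size=n)   # confound: label 1 -> longer responses
length_z = (length - length.mean()) / length.std()
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
h = 3.0 * np.outer(length_z, direction) + rng.normal(size=(n, d))

train, test = slice(0, 1500), slice(1500, n)
acc_raw = linear_probe_accuracy(h[train], y[train], h[test], y[test])

# Regress response length out of every hidden dimension, then probe again.
h_res = residualize(h, length[:, None])
acc_resid = linear_probe_accuracy(h_res[train], y[train], h_res[test], y[test])

print(f"probe accuracy on raw states:         {acc_raw:.3f}")
print(f"probe accuracy after residualization: {acc_resid:.3f}")
```

On this synthetic data the probe scores well above chance on raw states but drops to roughly chance after residualization, which is the pattern the summary attributes to format confounds. Note that this residualization is linear, so it only removes confound information that is linearly predictable from the chosen confound features.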
