adversarial-pressure

#adversarial-pressure

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

arXiv cs.AI ↗ · 2026-05-29 Cached

This paper identifies a novel failure mode in reasoning models called unfaithful capitulation, where the chain-of-thought remains factually correct across adversarial multi-turn dialogues but the final answer flips wrong, highlighting limitations of current evaluation methods.

0 favorites 0 likes

adversarial-pressure

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Submit Feedback