Tag
This paper identifies a novel failure mode in reasoning models called unfaithful capitulation, where the chain-of-thought remains factually correct across adversarial multi-turn dialogues but the final answer flips wrong, highlighting limitations of current evaluation methods.