adversarial-pressure

Tag

Cards List
#adversarial-pressure

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

arXiv cs.AI · 2026-05-29 Cached

This paper identifies a novel failure mode in reasoning models called unfaithful capitulation, where the chain-of-thought remains factually correct across adversarial multi-turn dialogues but the final answer flips wrong, highlighting limitations of current evaluation methods.

0 favorites 0 likes
← Back to home

Submit Feedback