Tag
This paper investigates a failure mode in long chain-of-thought (CoT) training traces, in which continuation past the conclusion reduces training utility, and proposes HarmfulContinuationCut (HCC), a diagnostic method for detecting such harmful continuations.