Tag
Study shows GPT and Claude exhibit distinct, unreliable repair behaviors in multi-turn math dialogues, with some models resisting correction and others over-correcting.