SCICONVBENCH is a benchmark that evaluates LLMs on multi-turn clarification of ill-posed scientific queries across computational science domains. Its results show that even frontier models struggle with disambiguation and frequently make silent assumptions.