Tag
This paper introduces a benchmark of 1,200 clinical documents with 9,184 uncertainty annotations to evaluate whether LLMs preserve diagnostic uncertainty in clinical text, finding that LLMs often fail to preserve original uncertainty cues and struggle with nuanced distinctions.