This study evaluates how the prompting language (English vs. French) affects diagnostic reasoning and accuracy across five LLMs on 180 clinical vignettes. Most models perform significantly better in English; o3 is the only exception.
UMBC researchers show that LLMs judge the feasibility of scientific claims better when given outcome data than when given experiment descriptions, and that incomplete experimental context can hurt accuracy.