A study of 25,000 AI scientist trials finds that the agents ignore evidence 68% of the time and rarely revise their hypotheses, showing that popular scaffolding fixes don’t instill true scientific reasoning.