A study evaluates frontier models' ability to forecast scientific progress across 4,760 events. It finds that the models can identify plausible directions but cannot reliably predict outcomes or timelines, and that they are systematically overconfident.
At Google I/O, DeepMind CEO Demis Hassabis said that scientific progress is becoming computable and launched Gemini for Science, a system that helps researchers read papers, write code, and generate hypotheses so that science can scale and iterate like software.
This paper introduces CUSP, a benchmark for evaluating how well AI systems forecast scientific progress. It finds that current models are systematically overconfident, show domain-dependent limitations, and fail to reliably predict scientific advances.