Tag
The First Proof test evaluated four AI systems on novel research-level math problems, with the top model scoring only 6 out of 10, demonstrating that current AI still lags behind top mathematicians in rigorous reasoning.