Humanity's Last Exam current benchmarks: thoughts?

Reddit r/singularity 06/15/26, 12:37 AM News

Summary

Discussion of recent AI model scores on the Humanity's Last Exam benchmark, noting the improvement from GPT-4o's 2.7% in May 2024 to around 45% for recent models by June 2026, and asking whether the exam is really that hard.

Some of the recent models have scored around 45 percent on that exam as of June 2026, whereas in May 2024 GPT-4o scored 2.7 percent. To me, this seems like good progress. But I wanted to ask: is the exam really that hard?


Similar Articles

Fable passes the "When A.I. Passes This Test, Look Out" test

Reddit r/ArtificialInteligence

Claude Fable achieves 53% on the 'Humanity's Last Exam' benchmark, surpassing the expected end-of-2025 milestone earlier than projected, indicating rapid AI progress.

AI can finally pass the Turing Test better than a human, study warns

Reddit r/ArtificialInteligence

A new study published in PNAS shows that advanced LLMs like GPT-4.5 can pass the Turing Test, with participants finding them more human than actual humans, prompting a reevaluation of what the test measures.

Does anyone else feel like AI benchmarks are becoming less useful for predicting real-world performance?

Reddit r/ArtificialInteligence

The article discusses the growing disconnect between high AI benchmark scores and actual real-world performance, highlighting issues like consistency, latency, and context handling.

@omarsar0: The efficiency frontier! Where do you think GPT-5.6 will land?

X AI KOLs Following

Discussion of recent benchmark results for Claude Opus 4.8 and GPT-5.5 on DeepSWE Bench, with speculation about future GPT-5.6 performance and efficiency trends.

@rohanpaul_ai: Today’s frontier agents are far less ready for real-world automation than their benchmark scores suggest. This paper pr…

X AI KOLs Following

This paper introduces Agents' Last Exam, a benchmark that tests AI agents on real expert work across 55 digital work areas. The current best agents fail most tasks, averaging only a 2.6% pass rate on the hardest tier, which reveals a large gap between benchmark scores and real-world automation readiness.

