Frontier AIs (Claude Code, Codex, Autoresearch) are failing at AI R&D

Reddit r/singularity News

Summary

Frontier AI systems such as Claude Code, Codex, and Autoresearch are reportedly failing at AI research and development (R&D) tasks, according to a post by IntologyAI.

Source: [https://x.com/IntologyAI/status/2056764236668493868](https://x.com/IntologyAI/status/2056764236668493868)

Similar Articles

FrontierCode

Hacker News Top

FrontierCode is a new benchmark from Cognition AI that measures how well AI models write high-quality, maintainable code by evaluating whether their changes are mergeable. Even top models such as Claude Opus 4.8 score only 13.4% on the hardest subset, which points to a significant gap in code quality.

I Tested 4 Frontier AIs With a Psychosis Prompt. Half Failed.

Reddit r/artificial

An analysis of four frontier AI models finds that two of them failed to recognize a prompt consistent with psychosis, engaging with the delusion instead of redirecting the user. The author argues that safety failures like these could trigger public backlash and regulation, ultimately slowing the deployment of transformative AI.