@scaling01: Opus 4.8 is the best coding model out there FrontierCode by Cognition is probably the highest quality coding benchmark …


Summary

Cognition introduces FrontierCode, a high-quality coding benchmark that goes beyond unit tests to also measure code maintainability, regression safety, and overall code quality. It consists of 150 tasks handcrafted by open-source developers.


Cached at: 06/09/26, 10:44 AM

Opus 4.8 is the best coding model out there

FrontierCode by Cognition is probably the highest-quality coding benchmark we have seen so far

it moves beyond scoring with unit tests alone: it also tests for regression safety, mechanical cleanliness, test correctness, scope, and code quality

20+ open-source developers handcrafted 150 tasks, each of which took over 40 hours to construct

it also tests a more diverse set of programming languages

Cognition (@cognition): Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers.

Models write sloppy code that works but isn’t maintainable. Our eval is the first to measure: would you actually merge this code?
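To make "scoring beyond unit tests" concrete, here is a minimal hypothetical sketch of a mergeability check. The criterion names mirror the dimensions listed in the post; the all-criteria-must-pass rule, the `Review` type, and the `merge_rate` helper are assumptions for illustration only, not Cognition's published scoring method.

```python
from dataclasses import dataclass

# Hypothetical criteria named after the dimensions mentioned in the post.
# How FrontierCode actually weights or combines them is not stated here.
CRITERIA = (
    "unit_tests",
    "regression_safety",
    "mechanical_cleanliness",
    "test_correctness",
    "scope",
    "code_quality",
)


@dataclass
class Review:
    task_id: str
    checks: dict[str, bool]  # criterion name -> passed?

    def mergeable(self) -> bool:
        # Assumption: a patch is "mergeable" only if every criterion passes;
        # a criterion missing from `checks` counts as a failure.
        return all(self.checks.get(name, False) for name in CRITERIA)


def merge_rate(reviews: list[Review]) -> float:
    """Fraction of tasks whose patch would be merged under the all-pass rule."""
    if not reviews:
        return 0.0
    return sum(r.mergeable() for r in reviews) / len(reviews)


# Usage: a patch that passes its tests but fails the code-quality check
# does not count toward the merge rate.
all_pass = {name: True for name in CRITERIA}
sloppy = dict(all_pass, code_quality=False)
reviews = [Review("t1", all_pass), Review("t2", sloppy)]
print(merge_rate(reviews))  # 0.5
```

Under this rule a test-passing but unmaintainable patch scores the same as a failing one, which is the gap a unit-test-only benchmark cannot see.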

Similar Articles

FrontierCode

Hacker News Top

FrontierCode is a new benchmark from Cognition AI that measures AI models' ability to write high-quality, maintainable code by evaluating mergeability. Results show even top models like Claude Opus 4.8 score only 13.4% on the hardest subset, highlighting a significant gap in code quality.