code-ai

@swyx: Finally! the first eval ship from cog!!!!!!!!!! To contextualize: @METR_Evals cap out at ~16 hours. Cog has private ent…

X AI KOLs Following · 5d ago

Cognition released the first evaluation suite for Devin, offering enterprise evals of up to 100 hours with a financial guarantee. The dataset comprises real-world Java, TypeScript, Python, and C# tasks from 126 enterprise users and aims to measure engineering productivity more accurately than existing benchmarks.
