Tag
Cognition released the first evaluation suite for Devin, offering up to 100-hour enterprise evals with a financial guarantee. The dataset includes real-world Java/TypeScript/Python/C# tasks from 126 enterprise users, aiming to measure engineering productivity more accurately than existing benchmarks.