Tag
CODA-BENCH is a new benchmark for evaluating code agents on data-intensive tasks, bridging the gap between code-centric and data-centric evaluations. It includes over 1,000 tasks from 31 communities, with realistic data scale and noise, and shows that even top agents achieve only a 61.1% success rate.
Martin Kleppmann discusses how the fundamentals of building large, distributed systems have evolved over the past decade in light of the updated second edition of his book "Designing Data-Intensive Applications."