OpenAI introduces MLE-bench, a benchmark of 75 Kaggle ML competitions for evaluating AI agents on real-world ML engineering tasks. The best-performing setup, o1-preview with AIDE scaffolding, earns at least a Kaggle bronze medal in 16.9% of competitions.