machine-learning-engineering

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

OpenAI Blog · 2024-10-10

OpenAI introduces MLE-bench, a benchmark built from 75 Kaggle ML competitions that evaluates AI agents on real-world ML engineering tasks. The best-performing setup, o1-preview with AIDE scaffolding, reaches at least the level of a Kaggle bronze medal in 16.9% of competitions.
