@sherryyangML: Machine learning engineering (MLE) is the new agentic frontier. I'll be sharing our work on scaling RL for MLE agents a…
Summary
Two ICLR 2026 papers show how small RL-trained agents outperform frontier models on machine learning engineering (MLE) tasks and how MLE-Smith automatically scales MLE workloads.
Similar Articles
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
OpenAI introduces MLE-bench, a benchmark of 75 Kaggle ML competitions to evaluate AI agents on real-world ML engineering tasks. The best setup, o1-preview with AIDE scaffolding, achieves at least a Kaggle bronze medal in 16.9% of competitions.
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
This paper studies when end-to-end reinforcement learning training improves multi-agent LLM workflows, comparing shared-policy and isolated-policy training across different workflows, tasks, and model scales, revealing conditional tradeoffs.
@charles_irl: Proper post-training RL, deployed broadly, is a key step towards a future where software systems quietly improve themse…
Modal announces an open-source library for reinforcement learning on its platform, addressing infrastructure challenges in post-training RL with scalable deployment.
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
This paper proposes the LLM-as-Environment-Engineer framework, in which a policy model analyzes failures to automatically redesign its reinforcement learning training environment, and introduces MAPF-FrozenLake as a controllable testbed. Using Qwen3-4B, the framework outperforms larger models such as GPT and Gemini, and the results indicate that policy learning improves the model's ability to diagnose weaknesses.