Tag
OpenAI discusses the importance of evals (evaluations) for measuring and forecasting model progress, especially as benchmarks become saturated or gamed, featuring insights from Tejal Patwardhan and Andrew Mayne.