Tag
This paper introduces PACE, a framework that predicts expensive LLM agent benchmark scores using a small subset of cheaper non-agentic evaluation instances, achieving high accuracy at less than 1% of the cost.