proxy-benchmarks


PACE: A Proxy for Agentic Capability Evaluation

Hugging Face Daily Papers · yesterday

This paper introduces PACE, a framework that predicts scores on expensive LLM agent benchmarks from a small subset of cheaper, non-agentic evaluation instances, achieving high accuracy at less than 1% of the cost.
