proxy-benchmarks

PACE: A Proxy for Agentic Capability Evaluation

Hugging Face Daily Papers · yesterday

This paper introduces PACE, a framework that predicts expensive LLM agent benchmark scores using a small subset of cheaper non-agentic evaluation instances, achieving high accuracy at less than 1% of the cost.
