PACE: Predicting Agent Benchmark Performance from Low-Cost Individual Tests

PACE is a framework that predicts agent benchmark scores from low-cost individual tests, achieving 85% ranking accuracy at less than 1% of the evaluation cost.
