This paper introduces a method for predicting the gains from best-of-N inference scaling in language models, using cheap statistics computed from a single sampling pass over a labeled validation set. A compact predictor built on three core features achieves a Spearman rank correlation of ρ = 0.90 with the actual gains, so configurations can be screened before any expensive reward-model scoring is run.
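To make the reported metric concrete, the following is a minimal sketch of how a Spearman rank correlation between predicted and actual best-of-N gains could be computed. The function names and the numbers in the usage lines are hypothetical and purely illustrative; they are not the paper's features, data, or code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gains for four configurations (illustrative values only).
predicted_gain = [0.02, 0.10, 0.05, 0.20]
actual_gain = [0.01, 0.08, 0.09, 0.25]
rho = spearman_rho(predicted_gain, actual_gain)
```

Because the metric depends only on ranks, a high ρ says the predictor orders configurations correctly, which is what screening requires, even if the predicted magnitudes are off.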
BetaPRM is a process reward model that predicts both a step-level success probability and the reliability of that prediction, representing each step as a Beta belief derived from Monte Carlo continuations. This enables adaptive allocation of computation, which reduces token usage by up to 33.57% while improving accuracy.
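As a rough illustration of what a Beta belief over step success looks like, the sketch below forms one directly from Monte Carlo continuation outcomes. It assumes a uniform Beta(1, 1) prior and a hypothetical variance threshold for deciding when to spend more compute; neither is taken from the paper, and BetaPRM itself is a learned model rather than this closed-form update.

```python
def beta_belief(successes, rollouts, prior_alpha=1.0, prior_beta=1.0):
    """Return (alpha, beta) of the Beta posterior after observing rollouts.

    Assumption: a Beta(prior_alpha, prior_beta) prior, uniform by default.
    """
    if not 0 <= successes <= rollouts:
        raise ValueError("successes must lie between 0 and rollouts")
    return prior_alpha + successes, prior_beta + (rollouts - successes)


def beta_mean(alpha, beta):
    """Expected step success probability under the belief."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Spread of the belief; larger values mean a less reliable estimate."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


def needs_more_compute(alpha, beta, max_variance=0.01):
    """Hypothetical rule: spend more tokens only while the belief is uncertain."""
    return beta_variance(alpha, beta) > max_variance


# Illustrative case: 6 of 8 continuations from a step reach a correct answer.
alpha, beta = beta_belief(successes=6, rollouts=8)
success_prob = beta_mean(alpha, beta)
uncertainty = beta_variance(alpha, beta)
```

The mean plays the role of the step-level success probability and the variance the role of its reliability: two steps with the same mean can warrant different amounts of further sampling depending on how many continuations support the estimate.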