Tag
BetaPRM is a process reward model that predicts both a step-level success probability and the reliability of that prediction using a Beta belief from Monte Carlo continuations, enabling adaptive computation allocation that reduces token usage by up to 33.57% while improving accuracy.