step-level-feedback

Tag

Cards List
#step-level-feedback

Process Rewards with Learned Reliability

arXiv cs.CL · 2026-05-18 Cached

BetaPRM is a process reward model that predicts both a step-level success probability and the reliability of that prediction using a Beta belief from Monte Carlo continuations, enabling adaptive computation allocation that reduces token usage by up to 33.57% while improving accuracy.

0 favorites 0 likes
← Back to home

Submit Feedback