best-of-n


Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv cs.CL · yesterday

This paper introduces a method for predicting best-of-N inference-time scaling gains for language models from cheap statistics gathered in a single sampling pass over a labeled validation set. A compact predictor with three core features reaches a Spearman rank correlation of ρ = 0.90 with the actual gains, so configurations can be screened before expensive reward-model scoring.
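For readers unfamiliar with the metric, here is a minimal sketch of how a Spearman rank correlation between predicted and measured gains can be computed. It illustrates the metric only, not the paper's predictor or its three features, which the summary does not specify. The function names and the sample numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either input is constant.
    return cov / (var_x * var_y) ** 0.5


# Hypothetical numbers: one predicted and one measured gain per configuration.
predicted_gain = [0.03, 0.11, 0.07, 0.21]
measured_gain = [0.02, 0.09, 0.10, 0.18]
rho = spearman_rho(predicted_gain, measured_gain)
```

Because the metric depends only on ranks, a high ρ means the predictor orders configurations correctly, which is what screening needs, even if the predicted magnitudes are off.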


Process Rewards with Learned Reliability

arXiv cs.CL · 2026-05-18

BetaPRM is a process reward model that predicts both a step-level success probability and the reliability of that prediction, represented as a Beta belief derived from Monte Carlo continuations. This enables adaptive allocation of computation, reducing token usage by up to 33.57% while improving accuracy.
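To make the idea of a Beta belief concrete, the sketch below shows the standard conjugate Beta-Binomial update applied to rollout counts: if a reasoning step is continued several times and some continuations reach a correct answer, the counts define a Beta distribution whose mean estimates success and whose spread indicates how reliable that estimate is. This is not BetaPRM itself, which is a learned model. The class, the function names, the uniform prior, and the `max_std` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over a step's success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total ** 2 * (total + 1))


def belief_from_rollouts(successes: int, rollouts: int,
                         prior_alpha: float = 1.0,
                         prior_beta: float = 1.0) -> BetaBelief:
    """Conjugate update: add successes and failures to the prior counts.

    The default Beta(1, 1) prior is uniform; it is an assumption of this sketch.
    """
    if not 0 <= successes <= rollouts:
        raise ValueError("successes must lie between 0 and rollouts")
    return BetaBelief(prior_alpha + successes,
                      prior_beta + (rollouts - successes))


def needs_more_compute(belief: BetaBelief, max_std: float = 0.15) -> bool:
    """Hypothetical allocation rule: spend more only when the belief is wide."""
    return belief.variance ** 0.5 > max_std


confident = belief_from_rollouts(successes=6, rollouts=8)  # Beta(7, 3)
uncertain = belief_from_rollouts(successes=1, rollouts=2)  # Beta(2, 2)
```

Under this toy rule, the step backed by eight continuations would be accepted as scored, while the step backed by only two would receive further sampling. That is the general shape of adaptive allocation, though the rule the paper actually uses may differ.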
