This paper proposes Outcome-Supervised Process Reward Modeling via Learnable Credit Assignment (LCA), a framework for training process reward models (PRMs) from outcome labels alone. LCA jointly learns credit assignment and reward modeling under a weakest-link principle, and formulates the task as a Multiple Instance Learning problem with Softmax-Weighted-Sum pooling. The reported experiments show that it outperforms existing outcome-supervised PRMs across multiple tasks.
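
To make the pooling idea concrete, here is a minimal sketch of softmax-weighted-sum pooling over per-step scores in a Multiple Instance Learning setting, where a solution is the "bag" and its reasoning steps are the "instances". The summary above does not give the paper's exact formula, so everything here is an assumption for illustration: the function name `sws_pool`, the `temperature` parameter, and the choice to derive the weights from the negated step scores (a soft minimum, which is one way to express a weakest-link principle). In LCA the credit assignment is described as learnable, so the actual weights are presumably produced by a trained component rather than by this fixed rule.

```python
import math


def sws_pool(step_scores, temperature=1.0):
    """Pool per-step scores into one solution-level score.

    Hypothetical sketch: weights are a softmax over the NEGATED scores,
    so the lowest-scoring ("weakest") step receives the most credit.
    This is an illustrative assumption, not the paper's stated method.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Negate so that low scores get large logits (soft-min weighting).
    logits = [-s / temperature for s in step_scores]

    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Softmax-weighted sum: a convex combination of the step scores.
    pooled = sum(w * s for w, s in zip(weights, step_scores))
    return pooled, weights


if __name__ == "__main__":
    pooled, weights = sws_pool([0.9, 0.8, 0.1], temperature=0.1)
    print("pooled score:", round(pooled, 4))
    print("credit weights:", [round(w, 4) for w in weights])
```

With a small `temperature` the pooled score approaches the minimum step score, and with a large one it approaches the plain mean. Because the pooled score is differentiable in the step scores, an outcome-level loss on it can propagate gradients back to individual steps, which is the general mechanism that lets outcome labels supervise a step-level model.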