Tag: #fine-grained-feedback

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

arXiv cs.AI · 2026-06-18

This paper proposes Rubric-Conditioned Self-Distillation (RCSD), a framework that uses fine-grained rubric criteria to provide token-level guidance during self-distillation. The authors report improved reasoning performance over scalar-reward methods such as GRPO and OPSD.
