Tag
Proposes Cross-Model Entropy (CME) as a label-free reward signal for reinforcement learning post-training of large language models, enabling open-ended instruction following without ground-truth verifiers or human preference labels.