Label-Free Reinforcement Learning via Cross-Model Entropy

arXiv cs.LG · 2026-05-29

Proposes Cross-Model Entropy (CME) as a label-free reward signal for reinforcement learning post-training of large language models, enabling open-ended instruction following without ground-truth verifiers or human preference labels.
