preference-data


RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains

Hugging Face Daily Papers · 2026-05-27

RUBRIC-ARROW is an alternating framework for reward modeling that improves on rubric-based methods by reducing ties and leveraging pairwise preference data. It achieves competitive accuracy and yields gains in LLM post-training in non-verifiable domains.
