groupwise-ranking

Prioritizing the Best: Incentivizing Reliable Multimodal Reasoning by Rewarding Beyond Answer Correctness

arXiv cs.CL · 2026-04-22

Researchers introduce Groupwise Ranking Reward to fix reasoning-answer inconsistency in multimodal RL, boosting reliability-conditioned accuracy from 47.4% to 54.7% over standard RLVR.
