Tag
Introduces ClaimDiff-RL, a reinforcement learning framework for long-form image captioning. It uses typed, verifiable claim differences as reward units to measure hallucinations and missing facts separately and balance the two, improving both faithfulness and coverage.
The paper introduces BalCapRL, a balanced reinforcement learning framework for multimodal large language models that jointly optimizes correctness, coverage, and linguistic quality in image captioning. It demonstrates improved performance over existing methods by addressing trade-offs between utility and fluency through reward decoupling and length-conditional masking.