image-captioning

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

arXiv cs.LG · 2026-05-21

Introduces ClaimDiff-RL, a reinforcement learning framework for long-form image captioning that uses typed, verifiable claim differences as reward units to separately measure and balance hallucination and missing facts, improving faithfulness and coverage.

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

Hugging Face Daily Papers · 2026-05-08

Introduces BalCapRL, a balanced reinforcement learning framework for multimodal large language models that jointly optimizes correctness, coverage, and linguistic quality in image captioning. It addresses the trade-off between utility and fluency through reward decoupling and length-conditional masking, and reports improved performance over existing methods.
