visual-attention

Tag

Cards List
#visual-attention

Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

arXiv cs.CL · 2026-05-22 Cached

Faithful-MR1 is a training framework that improves faithful multimodal reasoning in MLLMs by anchoring visual attention via a <Focus> token and reinforcing faithful use through counterfactual image intervention. It outperforms baselines on Qwen2.5-VL backbones with less training data.

0 favorites 0 likes
← Back to home

Submit Feedback