faithful-decoding

Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding

arXiv cs.AI · 4d ago

This paper argues that hallucination in large vision-language models (LVLMs) stems from a dynamic structural misalignment: certain attention heads act as risky mediators, decoupling from visual evidence and locking onto language priors. The authors propose Fox, a training-free causal intervention framework that diagnoses these pathological shortcuts and severs them, and they report state-of-the-art performance in faithful decoding.
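
The summary does not describe how Fox scores or cuts heads, so the following is only a generic illustration of head-level intervention, not the paper's algorithm. It assumes a "risky" head is one that places little attention on visual tokens, and that "severing" means zeroing that head's output before it is merged with the others. The function names, the visual-mass score, and the 0.2 threshold are all hypothetical.

```python
from typing import List


def visual_attention_mass(attn_row: List[float], visual_idx: List[int]) -> float:
    """Fraction of one head's attention (for the current query) on visual tokens."""
    total = sum(attn_row)
    if total == 0:
        return 0.0
    return sum(attn_row[i] for i in visual_idx) / total


def flag_risky_heads(
    attn: List[List[float]], visual_idx: List[int], threshold: float = 0.2
) -> List[int]:
    """Return indices of heads whose visual attention mass is below the threshold.

    Assumption: low visual mass is used here as a stand-in diagnostic;
    the paper's actual causal criterion is not given in the summary.
    """
    return [
        h
        for h, row in enumerate(attn)
        if visual_attention_mass(row, visual_idx) < threshold
    ]


def ablate_heads(
    head_outputs: List[List[float]], risky: List[int]
) -> List[List[float]]:
    """Zero the output vectors of flagged heads; leave the other heads unchanged."""
    risky_set = set(risky)
    return [
        [0.0] * len(out) if h in risky_set else list(out)
        for h, out in enumerate(head_outputs)
    ]


# Toy example: 3 heads attending over 4 tokens; tokens 0 and 1 are visual.
attn = [
    [0.5, 0.3, 0.1, 0.1],      # mostly visual
    [0.05, 0.05, 0.4, 0.5],    # mostly text: flagged
    [0.25, 0.25, 0.25, 0.25],  # balanced
]
head_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

risky = flag_risky_heads(attn, visual_idx=[0, 1], threshold=0.2)
cleaned = ablate_heads(head_outputs, risky)
print(risky)    # [1]
print(cleaned)  # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]
```

In a real model the same idea would be applied per layer and per decoding step, since the summary describes the misalignment as dynamic rather than fixed to a static set of heads.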
