Tag
This paper argues that hallucination in large vision-language models stems from a dynamic structural misalignment: certain attention heads act as risky mediators, decoupling from visual evidence and locking onto language priors. The authors propose Fox, a training-free causal intervention framework that diagnoses these pathological shortcuts and severs them, and they report state-of-the-art performance in faithful decoding.
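
To make the idea of head-level diagnosis and intervention concrete, here is a minimal toy sketch in plain Python. It is not the Fox algorithm, whose details this summary does not give. It assumes, purely for illustration, that a head counts as "risky" when the share of its attention placed on visual tokens falls below a fixed threshold, and that "severing" means zeroing that head's output. The function names, the threshold value, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def visual_attention_mass(attn_row: Sequence[float], visual_idx: Sequence[int]) -> float:
    """Share of one head's attention (for the current query token) on visual tokens."""
    total = sum(attn_row)
    if total == 0:
        return 0.0
    return sum(attn_row[i] for i in visual_idx) / total


def flag_risky_heads(
    attn: Sequence[Sequence[float]],
    visual_idx: Sequence[int],
    threshold: float = 0.2,  # hypothetical cutoff, not taken from the paper
) -> List[int]:
    """Return indices of heads whose visual attention mass is below the threshold."""
    return [
        h for h, row in enumerate(attn)
        if visual_attention_mass(row, visual_idx) < threshold
    ]


def ablate_heads(
    head_outputs: Sequence[Sequence[float]], risky: Sequence[int]
) -> List[List[float]]:
    """Zero the output vector of each flagged head; copy the others unchanged."""
    risky_set = set(risky)
    return [
        [0.0] * len(out) if h in risky_set else list(out)
        for h, out in enumerate(head_outputs)
    ]


# Toy data (assumed): 3 heads, 5 key tokens; tokens 0-2 are visual, 3-4 are text.
attn = [
    [0.30, 0.30, 0.20, 0.10, 0.10],  # mostly attends to the image
    [0.02, 0.03, 0.05, 0.40, 0.50],  # mostly attends to text
    [0.20, 0.20, 0.10, 0.25, 0.25],  # mixed
]
head_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

risky = flag_risky_heads(attn, visual_idx=[0, 1, 2], threshold=0.2)
patched = ablate_heads(head_outputs, risky)
```

In this toy setup only the second head is flagged and zeroed. A real training-free intervention would operate on the model's attention tensors during decoding, and its selection criterion would presumably be causal rather than a simple attention-mass threshold.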