Mitigating Multimodal Hallucination via Phase-wise Self-reward

Hugging Face Daily Papers

Summary

The PSRD framework halves multimodal hallucination in LVLMs via phase-wise self-reward decoding and a distilled lightweight reward model, without extra supervision.

Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the dynamic nature of hallucination emergence. To address these issues, we introduce a new self-rewarding framework, enabling dynamic hallucination mitigation at inference time without external supervision. On the empirical side, we reveal that visual hallucination exhibits phase-wise dynamic patterns, peaking at the onset of each semantic phase. Drawing on these insights, we propose PSRD (Phase-wise Self-Reward Decoding) for online hallucination correction guided by phase-wise self-reward signals. To reduce the cost of repeated self-evaluation during decoding, we distill the hallucination guidance signal from LVLMs into a lightweight reward model. The reward model subsequently provides on-the-fly guidance for targeted intervention during the decoding process, enabling precise hallucination suppression. The proposed PSRD significantly reduces the hallucination rate of LLaVA-1.5-7B by 50.0% and consistently outperforms existing post-hoc methods across five hallucination evaluation benchmarks for four LVLMs. Further analysis confirms that PSRD effectively mitigates hallucination propagation and achieves a highly controllable trade-off between strong performance and inference efficiency.
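The decoding loop the abstract describes — detect the onset of each semantic phase, then apply the reward model's score only there — can be sketched as follows. This is a minimal illustration under assumed interfaces: `step_logits`, `reward_fn`, and the punctuation-based onset heuristic are hypothetical stand-ins, not the paper's actual components.

```python
# Hedged sketch of phase-wise reward-guided decoding. All names
# (is_phase_onset, reward_fn, step_logits) are illustrative assumptions,
# not the paper's actual interface.

def is_phase_onset(tokens):
    """Assume a new semantic phase starts at the beginning of the response
    and after sentence/clause boundaries (where hallucination peaks)."""
    return not tokens or tokens[-1] in {".", ",", ";"}

def decode_with_phase_reward(step_logits, reward_fn, max_steps=10):
    """Greedy decoding that reranks candidate tokens with a lightweight
    reward signal, but only at phase onsets (targeted intervention)."""
    tokens = []
    for _ in range(max_steps):
        logits = step_logits(tokens)  # dict: candidate token -> LVLM logit
        if not logits:
            break
        if is_phase_onset(tokens):
            # Add the reward model's score where hallucination risk peaks.
            scored = {t: l + reward_fn(tokens, t) for t, l in logits.items()}
        else:
            scored = logits
        tokens.append(max(scored, key=scored.get))
    return tokens

# Toy demo: raw logits favor the visually unsupported token "red",
# but the reward penalizes it at the phase onset, so "blue" wins there.
VOCAB_LOGITS = {"red": 2.0, "blue": 1.5, ".": 0.5}

def step_logits(tokens):
    return {} if len(tokens) >= 3 else dict(VOCAB_LOGITS)

def reward_fn(tokens, candidate):
    return -1.0 if candidate == "red" else 0.0
```

In the toy run, `decode_with_phase_reward(step_logits, reward_fn, max_steps=1)` picks `"blue"` at the phase onset, whereas a zero reward would pick `"red"` — illustrating how onset-only intervention steers the first token of a phase without touching the rest of decoding.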

Cached at: 04/22/26, 10:35 AM


Source: https://huggingface.co/papers/2604.17982

Abstract

A new self-rewarding framework called PSRD is introduced for dynamic hallucination mitigation in large vision-language models during inference, using phase-wise self-reward signals and a distilled lightweight reward model for efficient hallucination correction.



Get this paper in your agent:

hf papers read 2604.17982

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders

arXiv cs.CL

Researchers from Beihang University and other institutions propose HalluSAE, a framework using sparse autoencoders and phase transition theory to detect hallucinations in LLMs by modeling generation as trajectories through a potential energy landscape and identifying critical transition zones where factual errors occur.

Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

arXiv cs.CL

This paper presents PCNet, a probabilistic circuit trained as a tractable density estimator on LLM residual streams to detect hallucinations as geometric anomalies. It also introduces PC-LDCD, a dynamic correction method that only intervenes on hallucinated tokens, achieving near-perfect detection and reduced corruption rates.

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

arXiv cs.CL

This paper investigates prompt-induced hallucinations in vision-language models through mechanistic analysis, identifying specific attention heads responsible for the models' tendency to favor textual prompts over visual evidence. The authors demonstrate that ablating these prompt-induced-hallucination (PIH) heads reduces hallucinations by at least 40% without additional training, revealing model-specific mechanisms underlying this failure mode.