ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

Hugging Face Daily Papers Papers

Summary

ReMMD introduces a realistic multilingual multi-image agentic verification framework for multimodal misinformation detection, including a benchmark (ReMMDBench) with 500 samples and 2,756 images, and an agent (ReMMD-Agent) that achieves superior veracity performance with reduced costs.

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text--image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, binary labels, or one manipulation source, while agentic verification remains costly under realistic evidence search. We present ReMMD, a realistic multilingual multi-image agentic verification framework for multimodal misinformation detection. ReMMD includes ReMMDBench, a real-world multimodal misinformation detection benchmark with 500 samples, 2,756 images, five monolingual languages, two cross-lingual settings, three text-length tiers, multi-image posts, five-way veracity labels, eight distortion labels, evidence provenance, and rationales. It also includes ReMMD-Agent, a persistent-memory verifier that decomposes posts into atomic points, builds a reusable evidence set, and predicts structured L1/L2/L3 outputs. Across proprietary systems, open LVLMs, MMD-Agent, and T2-Agent, ReMMD-Agent obtains the best five-way veracity performance, with 41.80% accuracy and 39.12% macro-F1 using GPT-5.2, while reducing cost by 17.5% relative to MMD-Agent and 79.9% relative to T2-Agent. The project is available at https://dang-ai.github.io/ReMMD.
Original Article
View Cached Full Text

Cached at: 06/24/26, 05:46 AM

Paper page - ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

Source: https://huggingface.co/papers/2606.24112

Abstract

A comprehensive multimodal misinformation detection framework is introduced that handles complex, multilingual content with multiple images and diverse verification approaches, achieving superior performance while reducing computational costs.

Multimodal misinformation detectionis increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text--image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, binary labels, or one manipulation source, whileagentic verificationremains costly under realistic evidence search. We present ReMMD, a realistic multilingual multi-imageagentic verificationframework formultimodal misinformation detection. ReMMD includesReMMDBench, a real-worldmultimodal misinformation detectionbenchmark with 500 samples, 2,756 images, five monolingual languages, two cross-lingual settings, three text-length tiers, multi-image posts, five-wayveracity labels, eightdistortion labels,evidence provenance, and rationales. It also includesReMMD-Agent, a persistent-memory verifier that decomposes posts into atomic points, builds a reusable evidence set, and predictsstructured L1/L2/L3 outputs. Across proprietary systems, openLVLMs, MMD-Agent, and T2-Agent,ReMMD-Agentobtains the best five-way veracity performance, with 41.80% accuracy and 39.12% macro-F1 usingGPT-5.2, while reducing cost by 17.5% relative to MMD-Agent and 79.9% relative to T2-Agent. The project is available at https://dang-ai.github.io/ReMMD.

View arXiv pageView PDFProject pageGitHub0Add to collection

Get this paper in your agent:

hf papers read 2606\.24112

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.24112 in a model README.md to link it from this page.

Datasets citing this paper1

#### DDAI-D/ReMMDBench Updatedabout 3 hours ago • 5 • 1

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.24112 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

Hugging Face Daily Papers

MemEye is a visual-centric evaluation framework that assesses multimodal agent memory by measuring visual evidence granularity and retrieval complexity across 8 life-scenario tasks, revealing that current architectures struggle to preserve fine-grained visual details and reason about state changes over time.

Reinforcing Multimodal Reasoning Against Visual Degradation

Hugging Face Daily Papers

This paper introduces ROMA, an RL fine-tuning framework that enhances the robustness of multimodal large language models against visual degradations like blur and compression artifacts. It achieves this through a dual-forward-pass strategy and specialized regularization techniques, improving performance on reasoning benchmarks without sacrificing accuracy on clean inputs.