mllms

#mllms

ShutterMuse: Capture-Time Photography Guidance with MLLMs

Hugging Face Daily Papers · 2026-06-24

Researchers introduce CaptureGuide-Bench, a benchmark for capture-time photography guidance, and ShutterMuse, a unified multimodal LLM trained to provide composition and pose recommendations, demonstrating improved performance over general-purpose models.

Correct When Paired, Wrong When Split: Decoupling and Editing Modality-Specific Neurons in MLLMs

arXiv cs.LG · 2026-06-17

This paper identifies and addresses the 'editing decoupling failure' in Multimodal LLMs, where knowledge updates via multimodal inputs fail to generalize to unimodal queries. The authors propose DECODE, a method to disentangle and localize modality-specific neurons for more effective knowledge editing.

When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding

arXiv cs.AI · 2026-06-09

This paper studies whether multimodal large language models (MLLMs) can detect that the correct answer is absent in video understanding tasks. It finds that models systematically fail, selecting plausible distractors instead of recognizing that no valid option exists. The failure worsens on temporal reasoning and under dense frame sampling, and chain-of-thought prompting only partially mitigates it.

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation

Hugging Face Daily Papers · 2026-06-02

This paper introduces SynCred-Bench, a benchmark of 600 AI-generated misinformation images across six credible-form categories. Existing detectors (MLLMs, open-source AIGC detectors, and commercial APIs) perform poorly on it, and human annotators also struggle.

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Hugging Face Daily Papers · 2026-05-18

This paper introduces ESI-Bench, a comprehensive benchmark for embodied spatial intelligence built on OmniGibson, covering 10 task categories and 29 subcategories. Experiments show that active exploration substantially outperforms passive approaches, and that failures stem mainly from action blindness rather than perception, revealing a metacognitive gap between models and humans.
