When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
Summary
MedFocusLeak introduces the first transferable black-box adversarial attack on medical vision-language models, using imperceptible background perturbations to mislead clinical diagnoses across six imaging modalities.
Source: https://huggingface.co/papers/2604.17318
Abstract
MedFocusLeak enables transferable black-box attacks on vision-language models for medical imaging by injecting imperceptible perturbations that redirect model attention, demonstrating significant vulnerabilities in clinical diagnostic reasoning.
Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, posing serious risks. Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks from natural images introduce visible distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism to shift the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance, generating misleading yet realistic diagnostic outputs across diverse VLMs. We further introduce a unified evaluation framework with novel metrics that jointly capture attack success and image fidelity, revealing a critical weakness in the reasoning capabilities of modern clinical VLMs.
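The abstract names two ingredients: perturbations confined to non-diagnostic background regions, and an attention-distraction objective that pulls the model's focus away from pathological areas. Below is a minimal, hedged sketch of how such an attack could look against a white-box surrogate using a generic PGD-style loop. The surrogate interface (`surrogate_vlm`), the `background_mask`, the returned attention map, and the loss weights are assumptions made for illustration; MedFocusLeak's actual objective, surrogate setup, and transfer strategy are specified in the paper.

```python
# Conceptual sketch only (not the paper's released code). The surrogate API,
# mask, and loss weights are hypothetical placeholders for illustration.
import torch
import torch.nn.functional as F

def background_attention_attack(image, background_mask, surrogate_vlm, text_tokens,
                                eps=4 / 255, alpha=1 / 255, steps=40, lam=1.0):
    """image: (1,3,H,W) in [0,1]; background_mask: (1,1,H,W), 1 = non-diagnostic region."""
    with torch.no_grad():
        clean_logits, _ = surrogate_vlm(image, text_tokens)
        true_class = clean_logits.argmax(dim=-1)          # proxy label from the clean image

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta * background_mask).clamp(0, 1)
        # Assumed surrogate output: diagnostic logits plus a spatial attention map
        # over the image (taken here to be upsampled to (1,1,H,W)).
        logits, attn = surrogate_vlm(adv, text_tokens)
        cls_loss = -F.cross_entropy(logits, true_class)        # push prediction off the clean label
        distract_loss = (attn * (1 - background_mask)).mean()  # attention still on pathology
        loss = cls_loss + lam * distract_loss
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()            # signed gradient step
            delta.clamp_(-eps, eps)                       # keep the perturbation imperceptible
            delta.grad.zero_()
    return (image + delta.detach() * background_mask).clamp(0, 1)
```

Minimizing the combined loss drives the prediction away from the clean-image label while shrinking the attention mass that remains on the (masked-out) pathological region; the perturbation itself is only ever applied inside the background mask and is clipped to a small epsilon ball.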
Community
Paper submitter
The first black-box transferable attack paper for medical vision-language models.
Get this paper in your agent:
hf papers read 2604.17318
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Attacking machine learning with adversarial examples
This article examines adversarial attacks on machine learning models and demonstrates why gradient masking—a defensive technique that attempts to deny attackers access to useful gradients—is fundamentally ineffective. The paper shows that attackers can circumvent gradient masking by training substitute models that mimic the defended model's behavior, making the defense strategy ultimately futile.
Surrogate modeling for interpreting black-box LLMs in medical predictions
Researchers propose a surrogate modeling framework to quantify and interpret latent medical knowledge encoded in black-box LLMs, revealing both valid associations and persistent racial biases.
Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs
This paper analyzes the reconstruction-concealment tradeoff in intent-obfuscation jailbreak attacks on Multimodal Large Language Models (MLLMs). It proposes concealment-aware variant construction and keyword-related distractor images to exploit model vulnerabilities more effectively.
Robust adversarial inputs
Researchers demonstrated adversarial images that reliably fool neural network classifiers across multiple scales and perspectives, challenging assumptions about the robustness of multi-scale image capture systems used in autonomous vehicles.
Adversarial attacks on neural network policies
OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation even with small imperceptible perturbations in white-box and black-box settings.