backdoor-attacks

BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

arXiv cs.LG · yesterday

BYORn is a backdoor-robust fine-tuning framework for vision-language models that identifies and replaces poisoned responses with model-generated alternatives, improving robustness to backdoor attacks while maintaining clean-task performance.

Could Open Models be trained to secretly go rogue?

Reddit r/LocalLLaMA · 2026-05-24

A discussion on whether open-weight AI models could be secretly trained with backdoors that activate upon trigger phrases or dates, potentially allowing unauthorized data exfiltration through tool-use harnesses.

Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy

arXiv cs.LG · 2026-05-22

This paper introduces a framework that connects randomized smoothing to differential privacy through privacy profiles, enabling tight provable robustness guarantees against backdoor attacks that jointly affect training and inference. The approach is instantiated for DP-SGD and Deep Partition Aggregation with experiments on MNIST and CIFAR-10.

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

Hugging Face Daily Papers · 2026-05-18

This paper introduces open-book benign rewriting (OBBR) as a proactive defense against backdoor attacks on LLMs. It reports that OBBR neutralizes harmful content by projecting to benign prompts and improves safety by 51% over state-of-the-art defenses.
