adversarial-context

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

Hugging Face Daily Papers · 2026-06-10

Introduces MedMisBench, a benchmark that measures whether LLMs maintain correct medical reasoning when given misleading context. Accuracy falls from 71.1% to 38.0% under adversarial conditions, a drop of 33.1 percentage points, and a clinical panel flagged the potential for harm.

State Contamination in Memory-Augmented LLM Agents

arXiv cs.AI · 2026-05-19

Identifies and studies "memory laundering" in LLM agents: toxic or adversarial context that is compressed into memory summaries evades standard toxicity detectors yet still influences future generations. The paper introduces the sub-threshold propagation gap (SPG) to measure this hidden downstream influence and shows that sanitizing toxic state before summarization is more effective than post-hoc cleaning.
