medical-benchmark

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

Hugging Face Daily Papers · 2026-06-10

Introduces MedMisBench, a benchmark that measures whether LLMs maintain correct medical reasoning when given misleading context. Accuracy drops sharply from 71.1% to 38.0% under adversarial conditions, and a clinical panel flagged potential harm.
