medical-benchmark

#medical-benchmark

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

Hugging Face Daily Papers · 2026-06-10

Introduces MedMisBench, a benchmark that measures whether LLMs maintain correct medical reasoning when given misleading context. Accuracy drops sharply from 71.1% to 38.0% under adversarial conditions, and a clinical panel flagged the potential for harm.
