How NOT to fine-tune your medical LLM: a look into Mark Kaplan's healtthruth.ai - "override and reframe foundational training"
Summary
This article critiques Mark Kaplan's approach to fine-tuning medical LLMs via his platform healtthruth.ai, highlighting pitfalls in overriding foundational training for healthcare AI.
Similar Articles
Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models
This paper presents a large-scale assessment of medical LLMs, including custom MedGPTs and open-source models, finding that 25-30% exhibit low factual accuracy and 33.6-54.3% violate operational thresholds, highlighting systemic safety risks.
AI in medicine will fail on calibration long before it fails on eloquence.
The article argues that AI in medicine may fail due to poor calibration and inability to express uncertainty, rather than lack of eloquence, and calls for features that build trust.
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
This paper investigates how large language models maintain correct beliefs under adversarial pressure in clinical settings, proposing R-FT fine-tuning to improve epistemic resilience while balancing it against corrigibility, and demonstrating significant robustness gains on medical benchmarks.
A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
This paper presents a multi-domain red teaming framework for evaluating safety, robustness, and fairness of medical LLMs across 690 clinically grounded scenarios. Results show that high aggregate accuracy can mask critical failures, and hybrid evaluation with clinician oversight is necessary for credible safety assessment.
I run an AI-based fact-checking platform and I refuse to let the LLM produce the verdict. Here's why.
The author details their decision to exclude LLMs from generating final fact-check verdicts in favor of a hybrid architecture that uses LLMs for data extraction and a deterministic Python layer for scoring, citing issues with stochastic instability and auditability.