Tag
FaithMed is a framework that trains LLMs for faithful evidence-based medical reasoning by integrating clinician-designed rubrics with reinforcement learning using step-level process reward assignment, achieving significant improvements over baselines on multiple medical benchmarks.
This paper introduces MedKGTab, a knowledge-injected framework that uses biomedical knowledge graphs to expand cross-domain features in tabular medical data, addressing data scarcity by generating high-fidelity biomedical profiles.
This paper adapts a mixture-of-experts diffusion language model, DiffusionGemma-26B, for interactive radiology report drafting, showing it matches or exceeds autoregressive models in medical VQA with 3.5-4.4x faster decoding and bidirectional infill capabilities.
IMCBench is a new benchmark for evaluating multimodal LLMs on image-grounded medical conversations, pairing clinical images with synthetic patient profiles. Evaluations across safety, accuracy, and uncertainty show that even strong models like Claude Opus 4.6 have safety issues, highlighting the need for multi-dimensional evaluation.
This paper proposes TriageRA-CCF, a method for adaptive rank budgeting in LoRA for medical question answering. It uses source-side signals (base-model confidence, clinical coverage, counterfactual proxy) to dynamically choose rank budgets, achieving modest accuracy gains on Qwen3-8B and Llama3.1-8B.
A year after its inception, OpenMed has reached 340 million model downloads and offers over 1,500 open medical models under Apache 2.0, of which 650+ can run on-device on iPhones.
Describes a medical speech-to-text system that runs locally on a MacBook, enabling streaming transcription without cloud dependency.
This paper applies ensemble machine learning models (Random Forest, Gradient Boosting, XGBoost, Extra Trees) to detect cirrhosis in hepatitis C patients using 28 features from 2,038 Egyptian patients. The Extra Trees model achieved 96.92% accuracy with only 16 features, outperforming the other models.
A free RAG API built on medical Wikipedia articles is now available to supply local LLMs with accurate medical facts; a demonstration shows it correcting hallucinations about Lhermitte sign.
MedGuards proposes a multi-agent framework for detecting and correcting errors in medical text using specialized agents and confidence-guided arbitration, improving reliability without additional training. Experiments on multilingual clinical notes show significant improvements.
MMed-Bench-IR is a heterogeneous benchmark for multilingual medical information retrieval across six languages, evaluating cross-lingual alignment, concept discrimination, and evidence retrieval. It reveals severe performance drops for non-English queries, highlighting gaps in existing English-only evaluations.
This study suggests that AI can make expert-led periodic reanalysis of old medical cases more scalable, helping clinicians revisit cases as medical knowledge advances and potentially bringing answers to more cases that previously evaded analysis.
This paper systematically surveys the core components of medical embodied AI, emphasizing the coordinated integration of perception, decision-making, and action in clinical environments, and reviews representative applications, datasets, and future research directions.
Proposes MSAIC-Net, a multi-scale attention-enhanced convolutional network for detecting myocardial substrate abnormalities from ECG signals, using imbalance-aware contrastive learning and lead-wise permutation importance for interpretability.
This paper presents a multi-domain red teaming framework for evaluating safety, robustness, and fairness of medical LLMs across 690 clinically grounded scenarios. Results show that high aggregate accuracy can mask critical failures, and hybrid evaluation with clinician oversight is necessary for credible safety assessment.
This paper introduces a framework for auditing source-dependence in medical multi-source RAG systems, releasing the TransplantQA benchmark, HERO-QA retrieval strategy, and a structured-output judge to measure inter-source answer relationships. It demonstrates that better retrieval reveals more disagreement than previously estimated, and argues for shifting NLP evaluation from answer correctness to inter-source relationship analysis.
Presents EAMS, a lightweight equivariant mesh segmentation framework that generalizes across anatomical tasks, showing a trade-off between equivariance and accuracy on subtle features.
MedicalBench is a new benchmark for evaluating large language models on medical concept extraction from electronic health records, focusing on implicit reasoning and evidence grounding. It includes 823 expert-annotated examples and shows that current models perform modestly, highlighting the difficulty of extracting implicitly stated medical concepts.
COTCAgent is a hierarchical reasoning framework for longitudinal electronic health records that uses a probabilistic chain-of-thought completion approach, achieving 90.47% Top-1 accuracy on a self-built dataset and outperforming existing medical agents.
OpenAI launches OpenAI for Healthcare, a suite of enterprise products including ChatGPT for Healthcare and API solutions designed to support HIPAA-compliant AI adoption across healthcare organizations. The offering features healthcare-optimized GPT-5 models, evidence-based retrieval with citations, policy integration, and workflow automation tools already deployed at major institutions like Stanford Medicine and UCSF.