clinical

#clinical

MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

arXiv cs.AI · 16h ago

MedCUA-Bench is a new screenshot-only benchmark for evaluating computer-use agents on clinical software tasks. It covers 18 scenarios across 10 medical domains and includes safety dimensions. Current agents perform poorly, especially on real OpenEMR, which points to a significant reliability gap.

AMNESIA: A Large Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis

arXiv cs.LG · 2d ago

AMNESIA is the first large-scale open-source benchmark for medical unlearning, comprising 70,560 QA pairs from 8,820 patient notes across 11 diseases, designed to evaluate forgetting of both factual and reasoning knowledge in LLMs.

On the Role of Inductive Bias in Time-Series Pretraining: A Case Study in Learning Generalizable Representations for Clinical Time Series

arXiv cs.LG · 2026-05-27

This paper investigates the role of inductive bias in time-series pretraining for clinical data, proposing PathoFM, an encoder-centric transformer pretrained on multivariate gait windows. The study compares different pretraining objectives and finds that dynamics-centric mixtures yield the most balanced transfer across classification and regression tasks.

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

arXiv cs.AI · 2026-05-26

This paper investigates how large language models maintain correct beliefs under adversarial pressure in clinical settings. It proposes R-FT fine-tuning to improve epistemic resilience while balancing corrigibility, and reports significant robustness gains on medical benchmarks.

AnchorDiff: Topology-Aware Masked Diffusion with Confidence-based Rewriting for Radiology Report Generation

arXiv cs.AI · 2026-05-19

AnchorDiff proposes a topology-aware masked diffusion framework for radiology report generation, integrating RadGraph-derived clinical anchors and confidence-based rewriting to achieve state-of-the-art results on MIMIC-CXR and MIMIC-RG4 benchmarks.
