clinical-nlp

#clinical-nlp

Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse?

arXiv cs.AI · 2026-06-16

This study investigates whether instruction-tuned LLMs (Llama-3.1-8B, Qwen2.5-7B, Mistral-7B, Phi-3-mini) can reliably classify Correct Information Units in aphasic discourse transcripts. Few-shot prompting yields competitive F1 scores (0.776–0.817) for three of the models, but performance varies with aphasia severity, and agreement with human raters remains insufficient for fully autonomous use.

ReportQA: QA-Based Radiology Report Evaluation

arXiv cs.CL · 2026-06-16

This paper proposes ReportQA, a QA-based framework for evaluating radiology reports that uses LLMs to answer clinically relevant questions, demonstrating better alignment with radiologist judgments than existing metrics.
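
One way to picture QA-based scoring: ask the same clinical questions of a candidate report and a reference report, then measure how often the answers agree. The sketch below is a minimal illustration under that assumption, not ReportQA's actual scoring rule; `qa_agreement` and `keyword_answer` (a toy keyword rule standing in for the LLM answerer) are hypothetical names, and the reports and questions are invented examples.

```python
from typing import Callable, Sequence

# Hypothetical answerer signature: (report, question) -> short normalized answer.
# In a real QA-based metric an LLM would play this role.
AnswerFn = Callable[[str, str], str]


def qa_agreement(candidate: str, reference: str,
                 questions: Sequence[str], answer_fn: AnswerFn) -> float:
    """Fraction of questions answered identically from both reports."""
    if not questions:
        raise ValueError("need at least one question")
    matches = sum(
        answer_fn(candidate, q) == answer_fn(reference, q) for q in questions
    )
    return matches / len(questions)


def keyword_answer(report: str, question: str) -> str:
    """Toy stand-in for an LLM: 'yes' if the finding named in the question
    appears in the report and is not written as 'no <finding>'."""
    finding = question.removeprefix("Is there ").removesuffix("?").lower()
    text = report.lower()
    if f"no {finding}" in text:
        return "no"
    return "yes" if finding in text else "no"


questions = ["Is there pleural effusion?", "Is there pneumothorax?",
             "Is there cardiomegaly?"]
reference = "Small left pleural effusion. No pneumothorax. Heart size normal."
candidate = "Left pleural effusion is present. There is a pneumothorax."
print(qa_agreement(candidate, reference, questions, keyword_answer))  # 2 of 3 agree
```

The candidate disagrees with the reference only on pneumothorax, so the score is 2/3; unlike n-gram overlap, a single clinically wrong finding moves the score directly.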

A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions

arXiv cs.CL · 2026-06-15

This paper presents a computational audit of representational bias in ClinicalBERT, finding that demographic associations are amplified by the model itself rather than inherited from training data.

sebis at CRF Filling 2026: A Two-Stage Local LLM Pipeline for Medical CRF Filling

arXiv cs.CL · 2026-06-12

This paper presents a fully local, two-stage LLM pipeline using MedGemma-27B for filling Case Report Forms from clinical notes, achieving a macro-F1 of 0.55 on the English test track and securing second place among local open-source submissions.

EDEN: A Large-Scale Corpus of Clinical Notes for Italian

arXiv cs.CL · 2026-06-12

EDEN is a large-scale corpus of anonymized clinical notes from Italian emergency departments, with a subset manually annotated for structured information extraction. It aims to support LLM development for medical applications in Italian.

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

arXiv cs.AI · 2026-06-10

This paper demonstrates that supervised fine-tuning with synthetic rationale data consistently harms prediction performance for Alzheimer's disease detection compared to label-only fine-tuning, across many configurations and model families. The degradation persists despite high-quality rationales and is attributed to a conflict between narrative plausibility and discriminative optimization.

LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification

arXiv cs.CL · 2026-06-02

This paper presents an iterative imbalance-aware fine-tuning approach using Qwen3-8B with QLoRA for psychological defense mechanism classification, achieving a macro F1 of 0.3917 and ranking 4th out of 21 teams in the PsyDefDetect 2026 shared task.
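
Macro-F1, the headline metric here and in several other entries, averages per-class F1 with equal weight, so rare classes count as much as common ones; this is why it is the usual choice for imbalanced label sets. A minimal sketch, using hypothetical labels `A` (majority) and `B` (minority) rather than the shared task's real classes:

```python
def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single label (0.0 when there are no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 8 majority-class items, 2 minority-class items,
# scored against a classifier that always predicts the majority class.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On this toy data the majority-class predictor reaches 0.8 accuracy but only 4/9 ≈ 0.444 macro-F1, because the minority class contributes an F1 of zero. Imbalance-aware training aims to lift exactly those minority-class terms.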

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

arXiv cs.CL · 2026-05-29

This paper introduces SafeRx-Agent, a knowledge-grounded multi-agent framework for safe and explainable medication recommendation that generates fine-grained ATC code predictions while accounting for drug interactions and contraindications. It is evaluated on the MIMIC-III and MIMIC-IV datasets.

Vectors Are Not Neutral: Sensitive-Information Inference from Exported LLM Representations in Summarization

arXiv cs.CL · 2026-05-27

This paper investigates the risk of sensitive information inference from exported LLM representations in clinical summarization, showing that reducing leakage from one vector artifact does not guarantee privacy in others. It introduces SurfaceLoRA, a fine-tuning method that reduces race recovery from targeted vectors while preserving utility.

EPPC-OASIS: Ontology-Aware Adaptation and Structured Inference Refinement for Electronic Patient-Provider Communication Mining in Secure Messages

arXiv cs.AI · 2026-05-26

This paper introduces EPPC-OASIS, an ontology-aware adaptation method for extracting structured communication behaviors from secure patient-provider messages. The approach combines Wasserstein alignment during fine-tuning with inference refinement procedures, achieving modest improvements over baselines on a de-identified corpus.

MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction

arXiv cs.CL · 2026-05-21

MedicalBench is a new benchmark for evaluating large language models on medical concept extraction from electronic health records, focusing on implicit reasoning and evidence grounding. It includes 823 expert-annotated examples and shows that current models perform modestly, highlighting the difficulty of extracting implicitly stated medical concepts.

Few-Shot Large Language Models for Actionable Triage Categorization of Online Patient Inquiries

arXiv cs.CL · 2026-05-18

This paper explores few-shot prompted LLMs for actionable triage categorization of online patient inquiries into four categories: self-care, schedule-visit, urgent-clinician-review, and emergency-referral. The best model (Claude Haiku 4.5 with 12-shot prompting) achieves a macro-F1 of 0.475, surpassing supervised baselines, but the authors conclude that LLMs can support triage prioritization and selective human review, not autonomous deployment.

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

arXiv cs.CL · 2026-05-13

This paper introduces ClinicalBench and the EpiKG system, evaluating assertion-aware retrieval for clinical question answering on MIMIC-IV data across multiple LLMs. It demonstrates that handling negation and temporality in retrieval significantly improves performance over standard baselines.
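
Assertion awareness means a retriever should not treat "no pneumonia" as evidence of pneumonia. The sketch below illustrates only the negation half, using a NegEx-style trigger window; it is a toy under stated assumptions, not EpiKG's method, and the trigger list, `window` size, function names, and notes are all hypothetical. Temporality (e.g. past history vs. current finding) is not handled.

```python
import re

# Hypothetical, deliberately tiny trigger list; real negation detectors use
# much larger lexicons and scope rules.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}


def affirmed_mention(sentence: str, concept: str, window: int = 5) -> bool:
    """True if `concept` occurs in `sentence` with no negation trigger among
    the `window` tokens immediately before it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = concept.lower().split()
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = tokens[max(0, i - window):i]
            if not NEGATION_TRIGGERS.intersection(preceding):
                return True
    return False


def assertion_aware_filter(passages, concept):
    """Keep only passages containing at least one affirmed mention of `concept`.
    Sentences are split first so a trigger cannot reach across a boundary."""
    kept = []
    for passage in passages:
        sentences = re.split(r"(?<=[.!?])\s+", passage)
        if any(affirmed_mention(s, concept) for s in sentences):
            kept.append(passage)
    return kept


notes = [
    "Admitted with fever. Chest X-ray shows pneumonia.",
    "Patient denies cough. No pneumonia on imaging.",
    "History of diabetes.",
]
print(assertion_aware_filter(notes, "pneumonia"))
```

Only the first note is returned: the second mentions pneumonia solely after the trigger "no", so a plain keyword retriever would surface it but an assertion-aware one does not.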

Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?

arXiv cs.CL · 2026-05-12

This paper presents a deployment-oriented stress-testing framework to evaluate how well large language models identify side effects of breast cancer radiation treatments. The study highlights limitations in LLM reliability, such as sensitivity to minor documentation changes and under-recall of rare side effects, suggesting that grounding outputs in clinician-curated lists improves robustness.

RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings

arXiv cs.CL · 2026-04-23

RADS uses reinforcement learning to pick the most informative samples for few-shot fine-tuning, boosting transfer-learning accuracy on low-resource, highly imbalanced clinical datasets.

FD-NL2SQL: Feedback-Driven Clinical NL2SQL that Improves with Use

arXiv cs.CL · 2026-04-20

FD-NL2SQL is a feedback-driven natural language to SQL system for clinical oncology databases that improves with use through clinician edits and logic-based SQL augmentation. The system decomposes natural language questions into predicates, retrieves expert-verified exemplars, and synthesizes executable SQL with continuous learning capabilities.
