Hallucination Detection-Guided Preference Optimization for Clinical Summarization

arXiv cs.CL Papers

Summary

Introduces HDSR and HDSR-PL, methods that use hallucination detectors to guide iterative self-refinement and preference learning, achieving up to 48% reduction in hallucinations for clinical summarization using Llama and Gemma models on MIMIC-IV-Note.

arXiv:2605.28910v1 Announce Type: new Abstract: Large language models (LLMs) have shown promise on summarization tasks, but they often produce hallucinations, which are unsupported or incorrect statements that limit their reliability in specialized healthcare applications. We introduce \itermodelfull (\itermodel), an inference-time method that leverages hallucination detectors to guide iterative summary revisions toward factual corrections. Building on this, we propose \itermodel for Preference Learning (\model), which converts detector-guided refinement trajectories into preference pairs for model finetuning. Extensive experiments show that our methods substantially reduce hallucinations for Llama and Gemma models in summarizing real-world clinical notes from \MimicIV. For example, \itermodel reduces 24\% and \model reduces 48\% hallucinations in Llama-3.1-8B-Instruct. Importantly, both methods preserve summary fluency, coherence, and relevance according to human expert and LLM-Jury evaluations. Together, these results demonstrate that detection-informed refinement and preference learning offer an automated solution for improving factual faithfulness in clinical summarization.
Original Article
View Cached Full Text

Cached at: 05/29/26, 09:14 AM

# Hallucination Detection-Guided Preference Optimization for Clinical Summarization
Source: [https://arxiv.org/html/2605.28910](https://arxiv.org/html/2605.28910)
Shamanth Kuthpadi Seethakantha1∗Dung Ngoc Thai2∗Vara Prasad Gudi1∗ Simran Tiwari2Rami Matar3Avijit Mitra1 Wenlong Zhao1Wael Salloum2Andrew McCallum4 1,4Manning College of Information and Computer Sciences2Ensemble HP3Columbia College 1\{skuthpadi,vgudi,avijit,wenlongzhao\}@umass\.edu,2\{simran\.tiwari,june\.thai,wael\.salloum\}@ensemblehp\.com 3\{rhm2142\}@columbia\.edu,4\{mccallum\}@cs\.umass\.edu ∗Equal contribution

###### Abstract

Large language models \(LLMs\) have shown promise on summarization tasks, but they often produce hallucinations, which are unsupported or incorrect statements that limit their reliability in specialized healthcare applications\. We introduceHallucination Detection guided Self\-Refinement\(HDSR\), an inference\-time method that leverages hallucination detectors to guide iterative summary revisions toward factual corrections\. Building on this, we proposeHDSRfor Preference Learning \(HDSR\-PL\), which converts detector\-guided refinement trajectories into preference pairs for model finetuning\. Extensive experiments show that our methods substantially reduce hallucinations for Llama and Gemma models in summarizing real\-world clinical notes fromMIMIC\-IV\-Note v2\.2\. For example,HDSRreduces 24% andHDSR\-PLreduces 48% hallucinations in Llama\-3\.1\-8B\-Instruct\. Importantly, both methods preserve summary fluency, coherence, and relevance according to human expert and LLM\-Jury evaluations\. Together, these results demonstrate that detection\-informed refinement and preference learning offer an automated solution for improving factual faithfulness in clinical summarization\.

Hallucination Detection\-Guided Preference Optimization for Clinical Summarization

Shamanth Kuthpadi Seethakantha1∗Dung Ngoc Thai2∗Vara Prasad Gudi1∗Simran Tiwari2Rami Matar3Avijit Mitra1Wenlong Zhao1Wael Salloum2Andrew McCallum41,4Manning College of Information and Computer Sciences2Ensemble HP3Columbia College1\{skuthpadi,vgudi,avijit,wenlongzhao\}@umass\.edu,2\{simran\.tiwari,june\.thai,wael\.salloum\}@ensemblehp\.com3\{rhm2142\}@columbia\.edu,4\{mccallum\}@cs\.umass\.edu∗Equal contribution

## 1Introduction

![Refer to caption](https://arxiv.org/html/2605.28910v1/x1.png)Figure 1:Overview of hallucination mitigation via detection\-informed self\-refinement\.Given an input clinical note, a language model generates an initial summary that may contain unsupported or hallucinated medical content\. A hallucination detector identifies unsupported content, which is used to guide iterative self\-refinement toward removing factual errors rather than stylistic changes \(top;HDSR\)\. The intermediate outputs fromHDSRprocess are converted into preference pairs and used for preference learning \(e\.g\., DPO\), amortizing faithful behavior and yielding hallucination\-mitigated summaries at inference time \(bottom;HDSR\-PL\)\.Large language models achieve strong summarization performance but often generate hallucinations, defined as content not supported by the source or inconsistent with world knowledge\(Maynezet al\.,[2020](https://arxiv.org/html/2605.28910#bib.bib21); Jiet al\.,[2023a](https://arxiv.org/html/2605.28910#bib.bib15); Tanget al\.,[2023](https://arxiv.org/html/2605.28910#bib.bib26)\)\. Hallucinations remain prevalent even in recent LLMs, with rates of roughly 40\-50% on benchmarks such asFaithBench\(Baoet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib2)\)indicating persistent factual unreliability\. This issue is critical in clinical summarization, where LLMs condense long patient records to support care delivery\. Despite comparable performance with clinician‑written summaries\(Veenet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib28)\), hallucinations manifest as fabricated or distorted clinical statements that are hard to detect\(Asgariet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib1); Kimet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib16)\)\. Even domain‑adapted models frequently introduce hallucinations expressed with medically plausible terminology, requiring expert review\(Hegselmannet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib12); Williamset al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib29); Fanget al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib10); Daset al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib6)\), and underscoring the need for stricter factual reliability in clinical summarization than in general‑domain tasks\.

Existing hallucination mitigation strategies are broadly divided into training\-time and inference\-time approaches\. Training\-time methods include domain pretraining, continual pretraining, supervised fine\-tuning, parameter\-efficient adapters\(Veenet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib28); Zaretskyet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib30)\), and specialized loss functions or reinforcement learning objectives\(Fabbriet al\.,[2021](https://arxiv.org/html/2605.28910#bib.bib8); Baoet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib2); Asgariet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib1)\)\. These methods require large volumes of high\-quality domain data\(Hegselmannet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib12)\)and depend on factuality metrics that correlate weakly with clinical correctness, limiting their practical impact\. Reinforcement learning from human or AI feedback offers greater flexibility, but requires carefully engineered preference data that is difficult to scale in medical settings\(Leeet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib48)\)\. Inference‑time techniques avoid altering model parameters: retrieval‑augmented generation grounds outputs in external documents yet depends on retrieval quality\(Koopman and Zuccon,[2023](https://arxiv.org/html/2605.28910#bib.bib44); Wanet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib63); Kimet al\.,[2025](https://arxiv.org/html/2605.28910#bib.bib16)\), while self‑refinement\(Madaanet al\.,[2023](https://arxiv.org/html/2605.28910#bib.bib52)\)and verification loops\(Tanget al\.,[2023](https://arxiv.org/html/2605.28910#bib.bib26)\)iteratively critique and revise generations, improving factuality at the cost of higher inference overhead and a risk of over‑editing\. Recent work combining iterative refinement with alignment shows such approaches can strip salient content or introduce new hallucinations even when fluency is preserved\(Jiet al\.,[2023b](https://arxiv.org/html/2605.28910#bib.bib42)\)\. A closely related approach isSynFac\-Edit\(Mishraet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib54)\), which generates synthetic edit feedback for preference optimization but relies on predefined error types and external edit models\.

Our pipeline combines hallucination detection with iterative self refinement, using detector feedback to guide self refinement and to form preference pairs from the original and revised summaries\.

We then train LLMs using direct preference optimization on these preference pairs, producing models that generate more faithful summaries with fewer hallucinations\. On the MIMIC IV clinical summarization hallucination datasets\(Hegselmannet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib12)\), our method reduces hallucinations by approximately 24% through self refinement and by up to 48% after fine tuning, while preserving summary quality and fluency according to both human and LLM\-as\-judge evaluations, and without incurring additional inference time cost\.

## 2Method

Figure[1](https://arxiv.org/html/2605.28910#S1.F1)illustrates our pipeline, which uses hallucination detection feedback to guide summary revision and train models to prefer faithful summaries\.

### 2\.1Problem Statement

Given a source document and a generated summary, the goal of hallucination mitigation is to produce summaries that are informative while remaining faithful to the source\. We view a source document and its summary as describing clinical facts documented during a patient’s hospital course\. A summary is faithful if these facts are supported by the source document\. Hallucinations arise when a summary introduces unsupported facts or adds procedures or findings that were not documented, for example when “leg ultrasound negative” appears in the summary without support from the source\.

### 2\.2Hallucination Detection Guided Self\-Refinement

During self\-refinement, a hallucination detector is applied to the generated summary to identify content that is unsupported or inconsistent with the source document\. Detector feedback highlights unsupported spans or statements, and the model is prompted to revise these parts while preserving supported content \(see Appendix[A\.2](https://arxiv.org/html/2605.28910#A1.SS2)\)\. This process focuses revisions on factual corrections rather than stylistic changes\.

Detection and revision are applied iteratively\. After each revision, the updated summary is re\-evaluated by the detector, and the resulting feedback guides subsequent revisions until a fixed iteration limit is reached or no further hallucinations are detected\. This detector guided refinement improves factual alignment while maintaining overall fluency\.

We use existing hallucination detectors from prior work, specifically MedCat\(Kraljevicet al\.,[2021](https://arxiv.org/html/2605.28910#bib.bib70)\)and prompt based detectors following the MedAlign annotation guidelines\(Fleminget al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib71)\)\. MedCat links clinical concepts in the source document and the summary to biomedical ontologies in order to flag unsupported or missing content\. MedAlign defines a clinically motivated taxonomy of hallucination categories, such as unsupported procedures or medications, which we implement as a prompted detector\.

### 2\.3Preference Learning from Detection Guided Self\-Refinement

Detection\-guided self\-refinement produces pairs of summaries for each source document, consisting of an initial summary and a revised summary with fewer hallucinations\. We treat the revised summary as the preferred output and the initial summary as the non\-preferred output, thereby forming preference pairs without human annotation\. We train the summarization model using direct preference optimization on these pairs\. This objective encourages the model to internalize detector guided factual corrections, amortizing improvements from self refinement into the model parameters and enabling faithful generation at inference time without additional refinement steps\.

## 3Experiments

### 3\.1Experimental Setup

We study clinical summarization from Brief Hospital Course \(BHC\) sections to Discharge Instructions \(DI\) using datasets derived fromMIMIC\-IV\-Note v2\.2\(Hegselmannet al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib12)\)\. Experiments are evaluated on the hallucination\-annotated subsetHallucination\-Generated\-DI, following the task formulation and annotation guidelines ofHegselmannet al\.\([2024](https://arxiv.org/html/2605.28910#bib.bib12)\)\.

We include GPT\-5\(OpenAI,[2025](https://arxiv.org/html/2605.28910#bib.bib73)\)zero\-shot as a closed\-source reference and primarily focus on LLaMA\-3\.1\-8B\-Instruct\(Meta,[2024b](https://arxiv.org/html/2605.28910#bib.bib67)\)as the open\-source base model\. In addition, we report results for two smaller open\-source models, LLaMA\-3\.2\-3B\-Instruct\(Meta,[2024a](https://arxiv.org/html/2605.28910#bib.bib68)\)and Gemma\-3\-4B\-IT\(Google,[2025](https://arxiv.org/html/2605.28910#bib.bib69)\), evaluated under a limited setting\. For LLaMA\-3\.1\-8B\-Instruct, we evaluate zero\-shot generation, supervised fine\-tuning \(SFT\), detection\-informed self\-refinement at inference time \(HDSR\), and detection\-informed preference learning at training time \(HDSR\-PL\)\. Detection\-informed variants use MedCat\(Kraljevicet al\.,[2021](https://arxiv.org/html/2605.28910#bib.bib70)\)and MedAlign\(Fleminget al\.,[2024](https://arxiv.org/html/2605.28910#bib.bib71)\)as hallucination detectors\.

#### Evaluation\.

We report entity\-level hallucination counts and clinician\-based human evaluation on summary quality across four main metrics: Consistency, Coherence, Fluency, and Relevance\. We follow the protocol ofHegselmannet al\.\([2024](https://arxiv.org/html/2605.28910#bib.bib12)\), which includes quantitative fine\-grained hallucination annotations identifying unsupported and contradicted content, as well as qualitative analysis of summary quality across the same dimensions\. Annotations are performed by a team of clinicians blinded to model identity: for the quantitative task, each summary is annotated independently by two annotators \(H1 and H2\) with adjudication of disagreements, while for the qualitative analysis, two annotators score summaries independently without adjudication\.

For LLaMA\-3\.1\-8B\-Instruct, human evaluation is conducted in a blind setting with double annotation and adjudication for hallucination labels, forming the basis of the main results in Table[1](https://arxiv.org/html/2605.28910#S3.T1)\. For LLaMA\-3\.2\-3B\-Instruct and Gemma\-3\-4B\-IT, due to resource constraints, we report clinician\-provided hallucination counts annotated by a single clinician \(H1\) and replace human qualitative evaluation with automatic LLM\-as\-judge scores\.

### 3\.2Main Results

Model / MethodHallucinationCount↓\\downarrowSummary Quality MetricsConsistency↑\\uparrowCoherence↑\\uparrowFluency↑\\uparrowRelevance↑\\uparrowAverage↑\\uparrowGPT\-5Prompting363\.554\.734\.734\.084\.27LLaMA\-3\.1\-8B\-InstructPrompting294\.083\.834\.053\.233\.79SFT573\.034\.434\.533\.033\.75HDSR\(best; MedAlign\)224\.134\.484\.533\.954\.27HDSR\-PL\(best; MedCat\)154\.404\.284\.053\.904\.16LLaMA\-3\.2\-3B\-InstructPrompting264\.134\.584\.684\.204\.39HDSR\-PL\(best; MedCat\)134\.234\.584\.684\.304\.44Gemma\-3\-4b\-itPrompting154\.234\.654\.684\.484\.51HDSR\-PL\(best; MedCat\)134\.334\.654\.704\.634\.58Table 1:Results onHallucination\-Generated\-DI\.Both variants of our detector\-guided approach outperform the zero\-shot and SFT baselines\. Note that for LLama\-3\.2\-3B\-Instruct and Gemma\-3\-4b\-it, we use automatic LLM\-as\-Judge for summary quality metrics\.Table 2:Distribution of hallucination error types across models\.ModelUnsup\. Condition Unsup\. Procedure Unsup\. Medication Unsup\. Time Unsup\. Location Unsup\. Number Unsup\. Name Unsup\. Word Unsup\. Other Contradicted Fact Incorrect Fact *LLaMA\-3\.1\-8B\-Instruct*Prompting82134012080SFT1511836033170Self\-Refine \(No Det\.\)102623311120HDSR\(w/MedCat\)71343222010HDSR\(w/MedAlign\)42133110250HDSR\-PL\(best; MedCat\)31012000080

Table 3:Impact of detection signal for hallucination mitigation\.Comparison between self\-refinement with and without detectors on LLaMA\-3\.1\-8B\.Model / MethodHallucinationCount↓\\downarrowSummary Quality MetricsConsistency↑\\uparrowCoherence↑\\uparrowFluency↑\\uparrowRelevance↑\\uparrowSelf\-Refine \(No Det\.\)313\.854\.254\.584\.20HDSR\(w/MedCat\)254\.084\.554\.584\.13HDSR\(w/MedAlign\)224\.134\.484\.533\.95

Table[1](https://arxiv.org/html/2605.28910#S3.T1)presents the main results on theHallucination\-Generated\-DIbenchmark, comparing zero\-shot generation, supervised fine\-tuning \(SFT\),HDSR, andHDSR\-PLusing LLaMA\-3\.1\-8B\-Instruct\. We also report limited results for LLaMA\-3\.2\-3B\-Instruct and Gemma\-3\-4B\-IT\. The following analysis focuses on LLaMA\-3\.1\.

#### SFT amplifies hallucinations\.

Supervised fine\-tuning on clinician\-written references substantially worsens factual alignment and increases hallucinations \(57\) to nearly twice the zero\-shot baseline\. Although supervised fine\-tuning improves fluency \(4\.53\) and coherence \(4\.43\), consistency drops sharply \(3\.03\)\.

#### HDSRimproves summary quality and mitigates hallucinations\.

OurHDSRyields a substantial improvement over both baselines\. Using MedAlign as the detector,HDSRreduces hallucinations to 22 while simultaneously achieving the best overall human\-evaluated summary quality across all dimensions\.

#### HDSR\-PLfurther mitigates hallucinations\.

HDSR\-PL, using MedCat\-derived preference pairs, achieves the lowest hallucination count overall \(15\), corresponding to an approximate 48% reduction relative to the zero\-shot baseline and a 74% reduction relative to SFT\.HDSR\-PLslightly underperformsHDSRon fluency and coherence\. This trade\-off reflects the difference between the two variants of our pipeline:HDSRbenefits from direct, entity\-specific detector feedback, whereasHDSR\-PLlearns a more general preference for factual alignment that may smooth over fine\-grained stylistic details but yields factually aligned outputs\.

#### Analyzing the types of hallucinations\.

Table[3](https://arxiv.org/html/2605.28910#S3.T3)breaks down hallucinations by error type for LLaMA\-3\.1\-8B, revealing that unsupported conditions, procedures, and medications account for the majority of errors in zero\-shot and supervised fine\-tuned models\. SFT amplifies these clinically critical error types, increasing unsupported conditions from 8 to 15 and procedures from 2 to 11\. Self\-refinement without detection reduces some surface\-level errors but leaves substantial unsupported content uncorrected\. In contrast,HDSRvariants consistently reduce errors across multiple categories, particularly unsupported conditions and procedures, indicating that the detector guides corrections toward factually grounded revisions\.

#### Impact of detector on hallucination mitigation\.

Table[3](https://arxiv.org/html/2605.28910#S3.T3)shows that incorporating an explicit detection signal reduces hallucination count forHDSRby 29% when using MedAlign as the hallucination detector and 19% when using MedCat\.

## 4Conclusion

We presented two complementary approaches for mitigating hallucinations in clinical summarization using automatic hallucination detection as supervision\.HDSRis an inference\-time method that integrates detector feedback into iterative self\-refinement, focusing revisions on unsupported or contradicted medical content while preserving grounded information\.HDSR\-PLextends this framework to training time by transforming refinement trajectories into preference pairs, enabling direct preference optimization that amortizes factual alignment\.

## Limitations

Our work highlights several areas for further investigation\. First, bothHDSRandHDSR\-PLrely on the quality of the hallucination detector used to provide supervision\. While we treat detectors as black\-box assessors, their errors–such as false positives that signal supported content or false negatives that miss subtle inconsistencies–can limit the effectiveness of refinement and downstream preference learning\. Improving detector accuracy or combining multiple complementary detectors may further enhance performance\.

In addition, our evaluation focuses on clinical summarization tasks derived fromMIMIC\-IV\-Note v2\.2, specifically BHC to Discharge Instructions generation\. Although this is a high\-impact and clinically relevant setting, the transferability of our approach to other medical note types or non\-clinical summarization domains remains to be explored\.

Finally, inference\-time self\-refinement \(HDSR\) incurs additional computational cost due to multiple refinement iterations and detector calls, which may limit its applicability in latency\-sensitive settings\. WhileHDSR\-PLaddresses this by amortizing refinement into training, it requires additional training resources and preference optimization, which may not be feasible in all deployment scenarios\.

## References

- A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation\.npj Digital Medicine8\(1\),pp\. 274\.External Links:ISSN 2398\-6352,[Document](https://dx.doi.org/10.1038/s41746-025-01670-7)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- F\. S\. Bao, M\. Li, R\. Qu, G\. Luo, E\. Wan, Y\. Tang, W\. Fan, M\. S\. Tamber, S\. Kazi, V\. Sourabh, M\. Qi, R\. Tu, C\. Xu, M\. Gonzales, O\. Mendelevitch, and A\. Ahmad \(2025\)FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs\.InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 2: Short Papers\),L\. Chiruzzo, A\. Ritter, and L\. Wang \(Eds\.\),Albuquerque, New Mexico,pp\. 448–461\.External Links:[Document](https://dx.doi.org/10.18653/v1/2025.naacl-short.38),ISBN 979\-8\-89176\-190\-2Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- A\. B\. Das, S\. Ahmed, and S\. K\. Sakib \(2025\)Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open\-Source Large Language Models\.arXiv\.External Links:2504\.19061,[Document](https://dx.doi.org/10.48550/arXiv.2504.19061)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1)\.
- A\. R\. Fabbri, W\. Kryściński, B\. McCann, C\. Xiong, R\. Socher, and D\. Radev \(2021\)SummEval: Re\-evaluating Summarization Evaluation\.arXiv\.External Links:2007\.12626,[Document](https://dx.doi.org/10.48550/arXiv.2007.12626)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- B\. Fang, X\. Dai, and S\. Karimi \(2024\)Understanding Faithfulness and Reasoning of Large Language Models on Plain Biomedical Summaries\.InFindings of the Association for Computational Linguistics: EMNLP 2024,Y\. Al\-Onaizan, M\. Bansal, and Y\. Chen \(Eds\.\),Miami, Florida, USA,pp\. 9890–9911\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.findings-emnlp.578)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1)\.
- S\. L\. Fleming, A\. Lozano, W\. J\. Haberkorn, J\. A\. Jindal, E\. Reis, R\. Thapa, L\. Blankemeier, J\. Z\. Genkins, E\. Steinberg, A\. Nayak, B\. S\. Patel, C\. Chiang, A\. Callahan, Z\. Huo, S\. Gatidis, S\. J\. Adams, O\. Fayanju, S\. J\. Shah, T\. Savage, E\. Goh, A\. S\. Chaudhari, N\. Aghaeepour, C\. D\. Sharp, M\. A\. Pfeffer, P\. Liang, J\. H\. Chen, K\. E\. Morse, E\. P\. Brunskill, J\. A\. Fries, and N\. H\. Shah \(2024\)MedAlign: A clinician\-generated dataset for instruction following with electronic medical records\.InThirty\-Eighth AAAI Conference on Artificial Intelligence,External Links:[Link](https://doi.org/10.1609/aaai.v38i20.30205),[Document](https://dx.doi.org/10.1609/AAAI.V38I20.30205)Cited by:[§2\.2](https://arxiv.org/html/2605.28910#S2.SS2.p3.1),[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- Google \(2025\)Google/gemma\-3\-4b\-it\.Note:Online; accessed 2026\-01\-05External Links:[Link](https://huggingface.co/google/gemma-3-4b-it)Cited by:[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- S\. Hegselmann, S\. Z\. Shen, F\. Gierse, M\. Agrawal, D\. Sontag, and X\. Jiang \(2024\)A Data\-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models\.arXiv\.External Links:2402\.15422,[Document](https://dx.doi.org/10.48550/arXiv.2402.15422)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1),[§1](https://arxiv.org/html/2605.28910#S1.p4.1),[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.SSS0.Px1.p1.1),[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p1.1)\.
- Z\. Ji, N\. Lee, R\. Frieske, T\. Yu, D\. Su, Y\. Xu, E\. Ishii, Y\. Bang, D\. Chen, W\. Dai, H\. S\. Chan, A\. Madotto, and P\. Fung \(2023a\)Survey of Hallucination in Natural Language Generation\.ACM Computing Surveys55\(12\),pp\. 1–38\.External Links:2202\.03629,ISSN 0360\-0300, 1557\-7341,[Document](https://dx.doi.org/10.1145/3571730)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1)\.
- Z\. Ji, T\. Yu, Y\. Xu, N\. Lee, E\. Ishii, and P\. Fung \(2023b\)Towards Mitigating Hallucination in Large Language Models via Self\-Reflection\.arXiv\.External Links:2310\.06271,[Document](https://dx.doi.org/10.48550/arXiv.2310.06271)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- Y\. Kim, H\. Jeong, S\. Chen, S\. S\. Li, M\. Lu, K\. Alhamoud, J\. Mun, C\. Grau, M\. Jung, R\. Gameiro, L\. Fan, E\. Park, T\. Lin, J\. Yoon, W\. Yoon, M\. Sap, Y\. Tsvetkov, P\. Liang, X\. Xu, X\. Liu, D\. McDuff, H\. Lee, H\. W\. Park, S\. Tulebaev, and C\. Breazeal \(2025\)Medical Hallucinations in Foundation Models and Their Impact on Healthcare\.arXiv\.External Links:2503\.05777,[Document](https://dx.doi.org/10.48550/arXiv.2503.05777)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- B\. Koopman and G\. Zuccon \(2023\)Dr ChatGPT tell me what I want to hear: How different prompts impact health answer correctness\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 15012–15022\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.928)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- Z\. Kraljevic, T\. Searle, A\. Shek, L\. Roguski, K\. Noor, D\. Bean, A\. Mascio, L\. Zhu, A\. A\. Folarin, A\. Roberts, R\. Bendayan, M\. P\. Richardson, R\. Stewart, A\. D\. Shah, W\. K\. Wong, Z\. Ibrahim, J\. T\. Teo, and R\. J\. B\. Dobson \(2021\)Multi\-domain clinical natural language processing with MedCAT: the medical concept annotation toolkit\.Artif\. Intell\. Med\.117,pp\. 102083\.External Links:ISSN 0933\-3657,[Document](https://dx.doi.org/10.1016/j.artmed.2021.102083)Cited by:[§2\.2](https://arxiv.org/html/2605.28910#S2.SS2.p3.1),[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- H\. Lee, S\. Phatale, H\. Mansoor, T\. Mesnard, J\. Ferret, K\. Lu, C\. Bishop, E\. Hall, V\. Carbune, A\. Rastogi, and S\. Prakash \(2024\)RLAIF vs\. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback\.arXiv\.External Links:2309\.00267,[Document](https://dx.doi.org/10.48550/arXiv.2309.00267)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- A\. Madaan, N\. Tandon, P\. Gupta, S\. Hallinan, L\. Gao, S\. Wiegreffe, U\. Alon, N\. Dziri, S\. Prabhumoye, Y\. Yang, S\. Gupta, B\. P\. Majumder, K\. Hermann, S\. Welleck, A\. Yazdanbakhsh, and P\. Clark \(2023\)Self\-Refine: Iterative Refinement with Self\-Feedback\.arXiv\.External Links:2303\.17651,[Document](https://dx.doi.org/10.48550/arXiv.2303.17651)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- J\. Maynez, S\. Narayan, B\. Bohnet, and R\. McDonald \(2020\)On Faithfulness and Factuality in Abstractive Summarization\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 1906–1919\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.173)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1)\.
- Meta \(2024a\)Meta\-llama/llama\-3\.2\-3b\-instruct\.Note:Online; accessed 2026\-01\-05External Links:[Link](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)Cited by:[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- Meta \(2024b\)Meta‑llama/llama‑3\.1‑8b‑instruct\.Note:Online; accessed 2025‑02‑21External Links:[Link](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)Cited by:[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- P\. Mishra, Z\. Yao, P\. Vashisht, F\. Ouyang, B\. Wang, V\. D\. Mody, and H\. Yu \(2024\)SYNFAC\-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization\.arXiv\.External Links:2402\.13919,[Document](https://dx.doi.org/10.48550/arXiv.2402.13919)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- OpenAI \(2025\)GPT\-5 system card\.Note:[https://cdn\.openai\.com/gpt\-5\-system\-card\.pdf](https://cdn.openai.com/gpt-5-system-card.pdf)Official technical report describing the GPT\-5 architecture and capabilitiesCited by:[§3\.1](https://arxiv.org/html/2605.28910#S3.SS1.p2.1)\.
- L\. Tang, Z\. Sun, B\. Idnay, J\. G\. Nestor, A\. Soroush, P\. A\. Elias, Z\. Xu, Y\. Ding, G\. Durrett, J\. F\. Rousseau, C\. Weng, and Y\. Peng \(2023\)Evaluating large language models on medical evidence summarization\.npj Digital Medicine6\(1\),pp\. 158\.External Links:ISSN 2398\-6352,[Document](https://dx.doi.org/10.1038/s41746-023-00896-7)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- D\. V\. Veen, C\. V\. Uden, L\. Blankemeier, J\. Delbrouck, A\. Aali, C\. Bluethgen, A\. Pareek, M\. Polacin, E\. P\. Reis, A\. Seehofnerova, N\. Rohatgi, P\. Hosamani, W\. Collins, N\. Ahuja, C\. P\. Langlotz, J\. Hom, S\. Gatidis, J\. Pauly, and A\. S\. Chaudhari \(2024\)Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization\.Nature Medicine30\(4\),pp\. 1134–1142\.External Links:2309\.07430,ISSN 1078\-8956, 1546\-170X,[Document](https://dx.doi.org/10.1038/s41591-024-02855-5)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1),[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- A\. Wan, E\. Wallace, and D\. Klein \(2024\)What Evidence Do Language Models Find Convincing?\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 7468–7484\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.403)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.
- C\. Y\. K\. Williams, J\. Bains, T\. Tang, K\. Patel, A\. N\. Lucas, F\. Chen, B\. Y\. Miao, A\. J\. Butte, and A\. E\. Kornblith \(2025\)Evaluating large language models for drafting emergency department encounter summaries\.PLOS Digital Health4\(6\),pp\. e0000899\.External Links:ISSN 2767\-3170,[Document](https://dx.doi.org/10.1371/journal.pdig.0000899)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p1.1)\.
- J\. Zaretsky, J\. M\. Kim, S\. Baskharoun, Y\. Zhao, J\. Austrian, Y\. Aphinyanaphongs, R\. Gupta, S\. B\. Blecker, and J\. Feldman \(2024\)Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient\-Friendly Language and Format\.JAMA network open7\(3\),pp\. e240357\.External Links:ISSN 2574\-3805,[Document](https://dx.doi.org/10.1001/jamanetworkopen.2024.0357)Cited by:[§1](https://arxiv.org/html/2605.28910#S1.p2.1)\.

## Appendix AAdditional Details

### A\.1Summarization Prompt

Summarization Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgaGVscGZ1bCBhc3Npc3RhbnQgdGhhdCBoZWxwcyBwYXRpZW50cyB1bmRlcnN0YW5kIHRoZWlyIG1lZGljYWwgcmVjb3Jkcy4KPHxlbmR8PgoKPHx1c2VyfD4KWW91IHdpbGwgYmUgZ2l2ZW4gc29tZSBkb2N0b3IncyBub3RlcyBhbmQgeW91IHdpbGwgbmVlZCB0byBzdW1tYXJpemUgdGhlIHBhdGllbnQncyBicmllZiBob3NwaXRhbCBjb3Vyc2UgaW4gb25lIHBhcmFncmFwaC4gUGxlYXNlIG9ubHkgaW5jbHVkZSBrZXkgZXZlbnRzIGFuZCBmaW5kaW5ncyBhbmQgYXZvaWQgdXNpbmcgbWVkaWNhbCBqYXJnb25zLCBhbmQgeW91IE1VU1Qgc3RhcnQgdGhlIHN1bW1hcnkgd2l0aCAiWW91IHdlcmUgYWRtaXR0ZWQiLgp7JSBpZiBkZW1vbnN0cmF0aW9ucyAlfQpIZXJlIGFyZSBzb21lIGV4YW1wbGVzOgoKeyUgZm9yIGV4YW1wbGUgaW4gZGVtb25zdHJhdGlvbnMgJX0KRE9DVU1FTlQ6IAp7eyBleGFtcGxlLnRleHQgfX0KClNVTU1BUlk6IAp7eyBleGFtcGxlLnN1bW1hcnkgfX0KCnslIGVuZGZvciAlfQp7JSBlbmRpZiAlfQpET0NVTUVOVDoge3sgY29udGV4dCB9fQo8fGVuZHw+)<\|system\|\>Youareahelpfulassistantthathelpspatientsunderstandtheirmedicalrecords\.<\|end\|\><\|user\|\>Youwillbegivensomedoctor’snotesandyouwillneedtosummarizethepatient’sbriefhospitalcourseinoneparagraph\.Pleaseonlyincludekeyeventsandfindingsandavoidusingmedicaljargons,andyouMUSTstartthesummarywith"Youwereadmitted"\.\{%ifdemonstrations%\}Herearesomeexamples:\{%forexampleindemonstrations%\}DOCUMENT:\{\{example\.text\}\}SUMMARY:\{\{example\.summary\}\}\{%endfor%\}\{%endif%\}DOCUMENT:\{\{context\}\}<\|end\|\>

### A\.2Self\-refinement Prompts

Self\-refinement revision without detectors Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgY2xpbmljYWwgc3VtbWFyaXphdGlvbiBhc3Npc3RhbnQuCjx8ZW5kfD4KCjx8dXNlcnw+CllvdSB3aWxsIGJlIGdpdmVuOgotIEEgYnJpZWYgaG9zcGl0YWwgY291cnNlIChTT1VSQ0UgRkFDVFMpCi0gQSBEUkFGVCBTVU1NQVJZIHRoYXQgY29udGFpbnMgcG90ZW50aWFsIGlzc3Vlcy4gIAoKWW91ciB0YXNrIGlzIHRvIFJFVklTRSB0aGUgRFJBRlQgU1VNTUFSWSBzbyB0aGF0IGl0IGlzIGZ1bGx5IGNvbnNpc3RlbnQgd2l0aCB0aGUgU09VUkNFIEZBQ1RTIGFuZCBzdWl0YWJsZSBhcyBhbiBhY2N1cmF0ZSBhZnRlci12aXNpdCBzdW1tYXJ5LgoKSW5zdHJ1Y3Rpb25zOgoxLiBUcmVhdCB0aGUgU09VUkNFIEZBQ1RTIGFzIGF1dGhvcml0YXRpdmUuCjIuIENoZWNrIGZvciBtaXNzaW5nIG9yIGluY29tcGxldGUgaW5mb3JtYXRpb24gaW4gdGhlIERSQUZUIFNVTU1BUlkgY29tcGFyZWQgdG8gdGhlIFNPVVJDRSBGQUNUUywgZXNwZWNpYWxseToKICAgLSBDaGllZiBjb21wbGFpbnQgLyByZWFzb24gZm9yIHZpc2l0ICAKICAgLSBQcmVzZW50aW5nIHN5bXB0b21zICAKICAgLSBQcm9jZWR1cmVzIHBlcmZvcm1lZCAgCiAgIC0gTWVkaWNhdGlvbnMgKG5ldywgY2hhbmdlZCwgb3IgZGlzY29udGludWVkKSAgCiAgIC0gVml0YWwgc2lnbnMgIAogICAtIEtleSBsYWJvcmF0b3J5IG9yIGltYWdpbmcgZmluZGluZ3MKNC4gRG8gKipub3QqKiBpbnZlbnQgb3IgaW5mZXIgbmV3IGRpYWdub3NlcywgbWVkaWNhdGlvbnMsIHByb2NlZHVyZXMsIG9yIGRhdGVzLgo1LiBLZWVwIHByb2Zlc3Npb25hbCB0b25lIGFuZCBzdHJ1Y3R1cmUgc3VpdGFibGUgZm9yIGEgZGlzY2hhcmdlIG9yIGFmdGVyLXZpc2l0IHN1bW1hcnkuCjYuIFByZWZlciB0ZXJtcyBhbmQgdmFsdWVzIGV4YWN0bHkgYXMgc3RhdGVkIGluIHRoZSBTT1VSQ0UgRkFDVFMuCjcuIElmIHRoZSBEUkFGVCBTVU1NQVJZIGlzIG1pc3NpbmcgY2xpbmljYWxseSBpbXBvcnRhbnQgaW5mbyB0aGF0IElTIHByZXNlbnQgaW4gdGhlIFNPVVJDRSBGQUNUUyAoZS5nLiBhIGtleSBwcm9jZWR1cmUgb3IgZGlzY2hhcmdlIG1lZGljYXRpb24pLCBBREQgaXQuCjguIEFsd2F5cyBzdGFydCB0aGUgc3VtbWFyeSB3aXRoICJZb3Ugd2VyZSBhZG1pdHRlZCIgYW5kIHJlZmVyIHRvIHRoZSBwYXRpZW50IGFzIHlvdSAvIHlvdXIuCjkuIFJldHVybiBvbmx5IHRoZSByZXZpc2VkIHN1bW1hcnkgdGV4dCAobm8gZXhwbGFuYXRpb25zIG9yIG1hcmt1cCkuCgotLS0KW1NPVVJDRSBGQUNUUyAvIEJSSUVGIEhPU1BJVEFMIENPVVJTRV0Ke3sgY29udGV4dCB9fQoKW0RSQUZUIFNVTU1BUlkgV0lUSCBGRUVEQkFDSyBBTk5PVEFUSU9OU10Ke3sgc3VtbWFyeSB9fQo8fGVuZHw+)<\|system\|\>Youareaclinicalsummarizationassistant\.<\|end\|\><\|user\|\>Youwillbegiven:\-Abriefhospitalcourse\(SOURCEFACTS\)\-ADRAFTSUMMARYthatcontainspotentialissues\.YourtaskistoREVISEtheDRAFTSUMMARYsothatitisfullyconsistentwiththeSOURCEFACTSandsuitableasanaccurateafter\-visitsummary\.Instructions:1\.TreattheSOURCEFACTSasauthoritative\.2\.CheckformissingorincompleteinformationintheDRAFTSUMMARYcomparedtotheSOURCEFACTS,especially:\-Chiefcomplaint/reasonforvisit\-Presentingsymptoms\-Proceduresperformed\-Medications\(new,changed,ordiscontinued\)\-Vitalsigns\-Keylaboratoryorimagingfindings4\.Do\*\*not\*\*inventorinfernewdiagnoses,medications,procedures,ordates\.5\.Keepprofessionaltoneandstructuresuitableforadischargeorafter\-visitsummary\.6\.PrefertermsandvaluesexactlyasstatedintheSOURCEFACTS\.7\.IftheDRAFTSUMMARYismissingclinicallyimportantinfothatISpresentintheSOURCEFACTS\(e\.g\.akeyprocedureordischargemedication\),ADDit\.8\.Alwaysstartthesummarywith"Youwereadmitted"andrefertothepatientasyou/your\.9\.Returnonlytherevisedsummarytext\(noexplanationsormarkup\)\.\-\-\-\[SOURCEFACTS/BRIEFHOSPITALCOURSE\]\{\{context\}\}\[DRAFTSUMMARYWITHFEEDBACKANNOTATIONS\]\{\{summary\}\}<\|end\|\>

Self\-refinement revision with detectors Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgY2xpbmljYWwgc3VtbWFyaXphdGlvbiBhc3Npc3RhbnQuCjx8ZW5kfD4KCjx8dXNlcnw+CllvdSB3aWxsIGJlIGdpdmVuOgotIEEgYnJpZWYgaG9zcGl0YWwgY291cnNlIChTT1VSQ0UgRkFDVFMpCi0gQSBEUkFGVCBTVU1NQVJZIHRoYXQgY29udGFpbnMgZmVlZGJhY2sgYW5ub3RhdGlvbnMgbWFya2luZyBwb3RlbnRpYWwgaXNzdWVzLiAgCiAgUG90ZW50aWFsbHkgaW5jb3JyZWN0IG9yIHVuc3VwcG9ydGVkIHRleHQgaXMgZW5jbG9zZWQgaW4gPGVycm9yPiAuLi4gPC9lcnJvcj4gdGFncy4KCllvdXIgdGFzayBpcyB0byBSRVZJU0UgdGhlIERSQUZUIFNVTU1BUlkgc28gdGhhdCBpdCBpcyBmdWxseSBjb25zaXN0ZW50IHdpdGggdGhlIFNPVVJDRSBGQUNUUyBhbmQgc3VpdGFibGUgYXMgYW4gYWNjdXJhdGUgYWZ0ZXItdmlzaXQgc3VtbWFyeS4KCkluc3RydWN0aW9uczoKMS4gVHJlYXQgdGhlIFNPVVJDRSBGQUNUUyBhcyBhdXRob3JpdGF0aXZlLgoyLiBGb3IgZWFjaCBzZWdtZW50IHdyYXBwZWQgaW4gPGVycm9yPiAuLi4gPC9lcnJvcj46CiAgIC0gSWYgdGhlIGNvbnRlbnQgaXMgdW5zdXBwb3J0ZWQgb3IgY29udHJhZGljdGVkIGJ5IHRoZSBTT1VSQ0UgRkFDVFMgLT4gcmVtb3ZlIG9yIGNvcnJlY3QgaXQuCiAgIC0gSWYgdGhlIGNvbnRlbnQgaXMgcGFydGlhbGx5IGNvcnJlY3QgLT4gcmV3cml0ZSBpdCB1c2luZyBpbmZvcm1hdGlvbiBmcm9tIHRoZSBTT1VSQ0UgRkFDVFMuCiAgIC0gSWYgdGhlIGNvbnRlbnQgaXMgYWNjdXJhdGUgLT4ga2VlcCBpdCwgYnV0IHJlbW92ZSB0aGUgPGVycm9yPiB0YWdzLgozLiBDaGVjayBmb3IgbWlzc2luZyBvciBpbmNvbXBsZXRlIGluZm9ybWF0aW9uIGluIHRoZSBEUkFGVCBTVU1NQVJZIGNvbXBhcmVkIHRvIHRoZSBTT1VSQ0UgRkFDVFMsIGVzcGVjaWFsbHk6CiAgIC0gQ2hpZWYgY29tcGxhaW50IC8gcmVhc29uIGZvciB2aXNpdCAgCiAgIC0gUHJlc2VudGluZyBzeW1wdG9tcyAgCiAgIC0gUHJvY2VkdXJlcyBwZXJmb3JtZWQgIAogICAtIE1lZGljYXRpb25zIChuZXcsIGNoYW5nZWQsIG9yIGRpc2NvbnRpbnVlZCkgIAogICAtIFZpdGFsIHNpZ25zICAKICAgLSBLZXkgbGFib3JhdG9yeSBvciBpbWFnaW5nIGZpbmRpbmdzCjQuIERvICoqbm90KiogaW52ZW50IG9yIGluZmVyIG5ldyBkaWFnbm9zZXMsIG1lZGljYXRpb25zLCBwcm9jZWR1cmVzLCBvciBkYXRlcy4KNS4gS2VlcCBwcm9mZXNzaW9uYWwgdG9uZSBhbmQgc3RydWN0dXJlIHN1aXRhYmxlIGZvciBhIGRpc2NoYXJnZSBvciBhZnRlci12aXNpdCBzdW1tYXJ5Lgo2LiBQcmVmZXIgdGVybXMgYW5kIHZhbHVlcyBleGFjdGx5IGFzIHN0YXRlZCBpbiB0aGUgU09VUkNFIEZBQ1RTLgo3LiBJZiB0aGUgRFJBRlQgU1VNTUFSWSBpcyBtaXNzaW5nIGNsaW5pY2FsbHkgaW1wb3J0YW50IGluZm8gdGhhdCBJUyBwcmVzZW50IGluIHRoZSBTT1VSQ0UgRkFDVFMgKGUuZy4gYSBrZXkgcHJvY2VkdXJlIG9yIGRpc2NoYXJnZSBtZWRpY2F0aW9uKSwgQUREIGl0Lgo4LiBBbHdheXMgc3RhcnQgdGhlIHN1bW1hcnkgd2l0aCAiWW91IHdlcmUgYWRtaXR0ZWQiIGFuZCByZWZlciB0byB0aGUgcGF0aWVudCBhcyB5b3UgLyB5b3VyLgo5LiBSZXR1cm4gb25seSB0aGUgcmV2aXNlZCBzdW1tYXJ5IHRleHQgKG5vIGV4cGxhbmF0aW9ucyBvciBtYXJrdXApLgoKLS0tCltTT1VSQ0UgRkFDVFMgLyBCUklFRiBIT1NQSVRBTCBDT1VSU0VdCnt7IGNvbnRleHQgfX0KCltEUkFGVCBTVU1NQVJZIFdJVEggRkVFREJBQ0sgQU5OT1RBVElPTlNdCnt7IHN1bW1hcnlfd2l0aF9lcnJvcnMgfX0KPHxlbmR8Pg==)<\|system\|\>Youareaclinicalsummarizationassistant\.<\|end\|\><\|user\|\>Youwillbegiven:\-Abriefhospitalcourse\(SOURCEFACTS\)\-ADRAFTSUMMARYthatcontainsfeedbackannotationsmarkingpotentialissues\.Potentiallyincorrectorunsupportedtextisenclosedin<error\>\.\.\.</error\>tags\.YourtaskistoREVISEtheDRAFTSUMMARYsothatitisfullyconsistentwiththeSOURCEFACTSandsuitableasanaccurateafter\-visitsummary\.Instructions:1\.TreattheSOURCEFACTSasauthoritative\.2\.Foreachsegmentwrappedin<error\>\.\.\.</error\>:\-IfthecontentisunsupportedorcontradictedbytheSOURCEFACTS\-\>removeorcorrectit\.\-Ifthecontentispartiallycorrect\-\>rewriteitusinginformationfromtheSOURCEFACTS\.\-Ifthecontentisaccurate\-\>keepit,butremovethe<error\>tags\.3\.CheckformissingorincompleteinformationintheDRAFTSUMMARYcomparedtotheSOURCEFACTS,especially:\-Chiefcomplaint/reasonforvisit\-Presentingsymptoms\-Proceduresperformed\-Medications\(new,changed,ordiscontinued\)\-Vitalsigns\-Keylaboratoryorimagingfindings4\.Do\*\*not\*\*inventorinfernewdiagnoses,medications,procedures,ordates\.5\.Keepprofessionaltoneandstructuresuitableforadischargeorafter\-visitsummary\.6\.PrefertermsandvaluesexactlyasstatedintheSOURCEFACTS\.7\.IftheDRAFTSUMMARYismissingclinicallyimportantinfothatISpresentintheSOURCEFACTS\(e\.g\.akeyprocedureordischargemedication\),ADDit\.8\.Alwaysstartthesummarywith"Youwereadmitted"andrefertothepatientasyou/your\.9\.Returnonlytherevisedsummarytext\(noexplanationsormarkup\)\.\-\-\-\[SOURCEFACTS/BRIEFHOSPITALCOURSE\]\{\{context\}\}\[DRAFTSUMMARYWITHFEEDBACKANNOTATIONS\]\{\{summary\_with\_errors\}\}<\|end\|\>

### A\.3Hallucination Detection

Medalign zero\-shot Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgaGVscGZ1bCBhc3Npc3RhbnQgdGhhdCBoZWxwcyBwYXRpZW50cyB1bmRlcnN0YW5kIHRoZWlyIG1lZGljYWwgcmVjb3Jkcy4KPHxlbmR8PgoKPHx1c2VyfD4KWW91IHdpbGwgYmUgZ2l2ZW4gYSBkb2N0b3IncyBub3RlcyBhbmQgYSBzdW1tYXJ5IHdpdGggcG90ZW50aWFsbHkgaW5jb3JyZWN0bmVzcy4gWW91ciB0YXNrIGlzIHRvIGlkZW50aWZ5IHNwYW5zIHdpdGggZXJyb25lb3VzLCBjb250cmFkaWN0b3J5LCBvciB1bnN1cHBvcnRlZCBmYWN0cyBpbiB0aGUgc3VtbWFyeSwgYW5kIGxhYmVsIHRoZW0gdXNpbmcgdGhlIDxlcnJvcj4gdGFnIChlLmcuIDxlcnJvcj5pbmNvcnJlY3QgZmFjdDwvZXJyb3I+KS4gVGhlcmUgY291bGQgYmUgbW9yZSB0aGFuIG9uZSBlcnJvciBpbiB0aGUgc3VtbWFyeS4gCnslIGlmIGRlbW9uc3RyYXRpb25zICV9CkhlcmUgYXJlIHNvbWUgZXhhbXBsZXM6Cgp7JSBmb3IgZXhhbXBsZSBpbiBkZW1vbnN0cmF0aW9ucyAlfQpET0NVTUVOVDogCnt7IGV4YW1wbGUuY29udGV4dCB9fQoKT1JJR0lOQUwgU1VNTUFSWTogCnt7IGV4YW1wbGUuc3VtbWFyeSB9fQoKU1VNTUFSWSBXSVRIIExBQkVMRUQgRVJST1JTOgp7eyBleGFtcGxlLnN1bW1hcnlfd2l0aF9lcnJvcnMgfX0KCnslIGVuZGZvciAlfQp7JSBlbmRpZiAlfQpDYW4geW91IGlkZW50aWZ5IHRoZSBlcnJvcnMgZm9yIHRoZSBmb2xsb3dpbmcgZG9jdW1lbnQgYW5kIHN1bW1hcnk/CkRPQ1VNRU5UOiAKe3sgY29udGV4dCB9fQoKT1JJR0lOQUwgU1VNTUFSWTogCnt7IHN1bW1hcnkgfX0KClNVTU1BUlkgV0lUSCBMQUJFTEVEIEVSUk9SUzoKPHxlbmR8Pg==)<\|system\|\>Youareahelpfulassistantthathelpspatientsunderstandtheirmedicalrecords\.<\|end\|\><\|user\|\>Youwillbegivenadoctor’snotesandasummarywithpotentiallyincorrectness\.Yourtaskistoidentifyspanswitherroneous,contradictory,orunsupportedfactsinthesummary,andlabelthemusingthe<error\>tag\(e\.g\.<error\>incorrectfact</error\>\)\.Therecouldbemorethanoneerrorinthesummary\.\{%ifdemonstrations%\}Herearesomeexamples:\{%forexampleindemonstrations%\}DOCUMENT:\{\{example\.context\}\}ORIGINALSUMMARY:\{\{example\.summary\}\}SUMMARYWITHLABELEDERRORS:\{\{example\.summary\_with\_errors\}\}\{%endfor%\}\{%endif%\}Canyouidentifytheerrorsforthefollowingdocumentandsummary?DOCUMENT:\{\{context\}\}ORIGINALSUMMARY:\{\{summary\}\}SUMMARYWITHLABELEDERRORS:<\|end\|\>

Medalign Chain\-of\-thought k\-shot Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgaGVscGZ1bCBhc3Npc3RhbnQgdGhhdCBoZWxwcyBwYXRpZW50cyB1bmRlcnN0YW5kIHRoZWlyIG1lZGljYWwgcmVjb3Jkcy4KPHxlbmR8PgoKPHx1c2VyfD4KV2Ugd2lsbCBwcmVzZW50IHlvdSB3aXRoIGEgcGFpciBvZiBhIGJyaWVmIGhvc3BpdGFsIGNvdXJzZSAoQkhDKSBhbmQgYSBwYXRpZW50IGFmdGVyIHZpc2l0IHN1bW1hcnkgKEFWUykuIFRoZSBBVlMgaXMgYWxzbyByZWZlcnJlZCB0byBhcyBkaXNjaGFyZ2Ugc3VtbWFyeS4gVGhlIEJIQyBjb250YWlucyBhIGRldGFpbGVkIHN1bW1hcnkgb2YgdGhlIGhvc3BpdGFsIHN0YXkgd3JpdHRlbiBieSBtZWRpY2FsIHNlcnZpY2UuIEl0IHVzdWFsbHkgY29udGFpbnMgbWVkaWNhbCBqYXJnb24sIGFuZCBpdCBjYW4gZm9sbG93IGRpZmZlcmVudCBzdHJ1Y3R1cmVzIGJhc2VkIG9uIHRoZSBob3NwaXRhbCBjb3Vyc2UgYW5kIHJlc3BvbnNpYmxlIG1lZGljYWwgc3BlY2lhbHR5LiBUaGUgQVZTIHN1bW1hcml6ZXMgdGhlIGhvc3BpdGFsIHN0YXkgZm9yIHRoZSBwYXRpZW50IGluIHBsYWluIGxhbmd1YWdlLiBJbiBwcmFjdGljZSwgdGhlIEJIQyBpcyBub3QgdGhlIG9ubHkgc291cmNlIG9mIGluZm9ybWF0aW9uIHRvIHdyaXRlIHRoZSBBVlMuIEhvd2V2ZXIsIGluIG91ciBzZXR0aW5nIHdlIHRyZWF0IHRoZSBCSEMgYXMgdGhlIG9ubHkgY29udGV4dCBmb3IgdGhlIHN1bW1hcnkuCgpGb3IgdGhpcyBsYWJlbGluZyB0YXNrLCB3ZSBhcmUgaW50ZXJlc3RlZCBpbiBlcnJvcnMgaW4gdGhlIEFWUyB0aGF0IGFyZSBlaXRoZXIgdW5zdXBwb3J0ZWQgYnkgdGhlIEJIQywgY29udHJhZGljdCBjb250ZW50IGluIHRoZSBCSEMsIG9yIGFyZSB3cm9uZyBtZWRpY2FsIGZhY3RzLiBXZSBhbGxvdyBzdGF0ZW1lbnRzIHRoYXQgY29udGFpbiBnZW5lcmFsIG1lZGljYWwga25vd2xlZGdlIG9yIGFkdmljZSB0aGF0IGFyZSBvZnRlbiB1c2VkIGluIHBhdGllbnQgc3VtbWFyaWVzLiBNb3N0IGVycm9ycyBhcmUgZHVlIHRvIHVuc3VwcG9ydGVkIGZhY3RzLCBzbyB3ZSBmdXJ0aGVyIGRpc3Rpbmd1aXNoIHRob3NlIGJhc2VkIG9uIHRoZWlyIHNwZWNpZmljIGNvbnRlbnQuIFRoaXMgbGVhZHMgdG8gdGhlIGZvbGxvd2luZyBlcnJvciB0eXBlcyBvciBsYWJlbHM6CjEuIFVuc3VwcG9ydGVkIGZhY3RzLCBpbmNsdWRpbmcgY29uZGl0aW9uL3Byb2NlZHVyZS9tZWRpY2F0aW9uL3RpbWUvbG9jYXRpb24vbnVtYmVyL25hbWUvd29yZC9vdGhlcgoyLiBDb250cmFkaWN0ZWQgZmFjdAozLiBJbmNvcnJlY3QgZmFjdApBbmQgYmVsb3cgaXMgdGhlIGRldGFpbGVkIGd1aWRlbGluZSwgYW5kIHdlIGxhYmVsIGVycm9yIHNwYW5zIHdpdGggdGhlIDxlcnJvcj4gdGFnIChlLmcuIDxlcnJvcj5pbmNvcnJlY3QgZmFjdDwvZXJyb3I+KS4KCiMjIyBBbGxvd2VkIEdlbmVyYWwgTWVkaWNhbCBLbm93bGVkZ2UgYW5kIE1lZGljYWwgQWR2aWNlCldlIGFsbG93IGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2UgYW5kIGFkdmljZSB0aGF0IGlzIG9mdGVuIHBhcnQgb2YgdGhlIEFWUy4gVXN1YWxseSwgdGhlc2UgYXJlIGluZm9ybWF0aW9uIHRoYXQgYXJlIG5vdCBzcGVjaWZpYyBmb3IgdGhlIGhvc3BpdGFsIGNvdXJzZSBnaXZlbiBpbiB0aGUgQkhDLiBGb3IgZXhhbXBsZQotICJQbGVhc2UgdGFrZSB5b3VyIG1lZGljYXRpb25zIGFzIHByZXNjcmliZWQiIGNvbnRhaW5zIG5vIGVycm9yIGV2ZW4gdGhvdWdoIHRoZSBCSEMgZG9lcyBub3QgY29udGFpbiB0aGlzIGluc3RydWN0aW9uIGJlY2F1c2UgdGhpcyBpcyBnZW5lcmFsIG1lZGljYWwgYWR2aWNlLgotICJJZiB0aGUgc3ltcHRvbXMgZ2V0IHdvcnNlLCBwbGVhc2UgY29udGFjdCB5b3VyIGRvY3RvciIgY29udGFpbnMgbm8gZXJyb3IgZXZlbiB3aGVuIHRoZSBCSEMgZG9lcyBub3QgY29udGFpbiB0aGlzIGZhY3QsIHNpbmNlIGl0IGlzIGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2UgdGhhdCBhIGRvY3RvciBzaG91bGQgYmUgc2VlbiBmb3Igd29yc2VuaW5nIHN5bXB0b21zLiAKCiMjIyBEZXRlcm1pbmluZyBTcGFuIG9mIEVycm9ycwpXZSBsYWJlbCB0aGUgc21hbGxlc3QgcG9zc2libGUgY29uc2VjdXRpdmUgc3BhbiB0aGF0IHNwZWNpZmllcyB0aGUgZXJyb3IgZ2l2ZW4gdGhlIEJIQyBhcyBhIGNvbnRleHQuIFJlbW92aW5nIGZ1cnRoZXIgcGFydHMgZnJvbSB0aGUgc3BhbiB3b3VsZCByZW1vdmUgaW1wb3J0YW50IGluZm9ybWF0aW9uLiBBIHVzZWZ1bCBoZXVyaXN0aWMgaXMgdG8gaWRlbnRpZnkgdGhlIG1pbmltYWwgc3BhbiB0aGF0IG11c3QgYmUgcmVwbGFjZWQgdG8gb2J0YWluIGEgY29ycmVjdCBzdGF0ZW1lbnQgdGhhdCBpcyBncmFtbWF0aWNhbGx5IGNvcnJlY3QuIEZvciBleGFtcGxlCi0gIldlIHBlcmZvcm1lZCBhbiA8ZXJyb3I+ZXNvcGhhZ2VhbC1nYXN0cm8tZHVvZGVub3Njb3B5IChFR0QpLjxlcnJvcj4iIHdoZW4gbm8gc3VjaCBwcm9jZWR1cmUgaXMgcmVwb3J0ZWQgaW4gdGhlIEJIQy4gVGhlIGFydGljbGUgImFuIiBpcyBub3QgbGFiZWxlZCBhcyBhbiBlcnJvci4gV2hlbiBubyBwcm9jZWR1cmUgYXQgYWxsIHdhcyBwZXJmb3JtZWQgInBlcmZvcm1lZCBhbiBlc29waGFnZWFsLWdhc3Ryby1kdW9kZW5vc2NvcHkgKEVHRCkiIHNob3VsZCBiZSBsYWJlbGVkIGFzIGVycm9yIGJlY2F1c2UgdGhlcmUgaXMgbm8gc3VpdGFibGUgc3Vic3RpdHV0ZSBmb3IgImVzb3BoYWdlYWwtZ2FzdHJvLWR1b2Rlbm9zY29weSAoRUdEKSIuCi0gIkFmdGVyIHRoZSBzdXJnZXJ5LCB3ZSA8ZXJyb3I+dHJhbnNpdGlvbmVkIHlvdSB0byBvcmFsIG94eWNvZG9uZTwvZXJyb3I+LiIgd2hlbiB0aGUgQkhDIGNvbnRhaW5zIG5vIGluZm9ybWF0aW9uIGZvciBzdWNoIGEgdHJhbnNpdGlvbi4gSWYgYW5vdGhlciBtZWRpY2F0aW9uIHRyYW5zaXRpb24gaXMgbWVudGlvbmVkIGluIHRoZSBCSEMgYW5kIG1ha2VzIHNlbnNlIGluIHRoaXMgc2VudGVuY2Ugb25seSAib3JhbCBveHljb2RvbmUiIHNob3VsZCBiZSBsYWJlbGVkLiBJZiBhbm90aGVyIG9yYWwgbWVkaWNhdGlvbiB0cmFuc2l0aW9uIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDIG9ubHkgIm94eWNvZG9uZSIgc2hvdWxkIGJlIGxhYmVsZWQuCi0gIjxlcnJvcj5Zb3VyIHN5bXB0b21zIHJlc3BvbmRlZCB3ZWxsPC9lcnJvcj4uIiB3aGVuIG5vIHBhcnQgb2YgdGhlIHNlbnRlbmNlIG1ha2VzIHNlbnNlIGluIHRoZSBnaXZlbiBjb250ZXh0IG9mIHRoZSBBVlMuCgojIyMgRGVhbGluZyB3aXRoIERlaWRlbnRpZmllZCBJbmZvcm1hdGlvbgpUaGUgZGF0YSBjb250YWlucyBkZWlkZW50aWZpZWQgaW5mb3JtYXRpb24gc2hvd24gd2l0aCAiX19fIiBpbiB0aGUgdGV4dC4gV2UgYWx3YXlzIHRyZWF0IHRoaXMgYXMgbm9uLWV4aXN0ZW50IGluZm9ybWF0aW9uLiBTbywgdGhlIGFubm90YXRvcnMgc2hvdWxkIG5vdCBpbmZlciB3aGF0IHRoZSBkZWlkZW50aWZpZWQgaW5mb3JtYXRpb24gY291bGQgYmUuIEluIGdlbmVyYWwsIGRlaWRlbnRpZmllZCBmaWVsZHMgaW4gdGhlIEFWUyBzaG91bGQgbm90IGJlIGxhYmVsZWQgYXMgZXJyb3JzLiBIb3dldmVyLCBzb21ldGltZXMgdGhleSBiZWxvbmcgdG8gYSB3cm9uZyBzdGF0ZW1lbnQgb3IgY2xlYXJseSBjb250YWluIHVuc3VwcG9ydGVkIGluZm9ybWF0aW9uIChlLmcuLCBhIGRvY3RvcidzIG5hbWUgb3IgcGhvbmUgbnVtYmVycykgdGhhdCBhcmUgbm90IGdpdmVuIGluIHRoZSBCSEMuIEluIHRoZXNlIGNhc2VzLCBkZWlkZW50aWZpZWQgZmllbGRzIHNob3VsZCBiZSBpbmNsdWRlZCBpbiB0aGUgZXJyb3Igc3Bhbi4gRm9yIGV4YW1wbGUKLSAiVGFrZSBfX18gPGVycm9yPjIwMG1nIGRhaWx5PC9lcnJvcj4gYW5kIHRyeSB0byByZXN0IiB3aGVuIG5vIHN1Y2ggZG9zYWdlIGluZm9ybWF0aW9uIGlzIHByb3ZpZGVkIGluIHRoZSBCSEMsIGJ1dCB0aGUgc3RhdGVtZW50IHRvIHJlc3QuIFRoZSBkZWlkZW50aWZpZWQgbWVkaWNhdGlvbiBuYW1lIGlzIGV4Y2x1ZGVkIGZyb20gdGhlIGVycm9yIHNwYW4uCi0gIlBsZWFzZSBhdm9pZCBnb2luZyB1cCA8ZXJyb3I+bW9yZSB0aGFuIF9fXyBzdGFpcnM8L2Vycm9yPiBhdCBhIHRpbWUiIHdoZW4gcmVzdHJpY3Rpb25zIGZvciB0aGUgbnVtYmVyIG9mIHN0YWlycyB0YWtlbiBhdCBhIHRpbWUgYXJlIG5vdGUgbWVudGlvbmVkIGluIHRoZSBCSEMuCi0gIjxlcnJvcj5Eci4gX19fIHdpbGwgZm9sbG93IHVwIHdpdGggeW91PC9lcnJvcj4iIHdoZW4gbm8gZm9sbG93LXVwIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDLgotICJQbGVhc2Ugc3RvcCB0YWtpbmcgQXNwaXJpbiA8ZXJyb3I+b24gX19fPC9lcnJvcj4iIHdoZW4gbm8gc3RvcHBpbmcgZGF0ZSBpcyBnaXZlbiBpbiB0aGUgQkhDLiAKLSAiWW91ciBSQkMgcGVha2VkIDxlcnJvcj5hdCBfX18gbWlsbGlvbjwvZXJyb3I+IiBpZiB0aGVyZSBpcyBubyBoaW50IG9mIGEgc3BlY2lmaWMgcmVkIGJsb29kIGNlbGwgY291bnQgZ2l2ZW4gaW4gdGhlIEJIQy4KCiMjIyBPbmUgRXJyb3IgcGVyIFNwYW4KVG8gZ2V0IHJlbGlhYmxlIGVycm9yIGNvdW50cyBhIHNwYW4gc2hvdWxkIG9ubHkgY29udGFpbiBhIHNpbmdsZSBlcnJvci4KLSAiWW91IHJlY2VpdmVkIDxlcnJvcj5UeWxlbm9sPC9lcnJvcj4gYW5kIDxlcnJvcj5DaXByb2Zsb3hhY2luPC9lcnJvcj4iIHdoZW4gdGhlcmUgaXMgbm8gZXZpZGVuY2UgaW4gdGhlIEJIQyB0aGF0IHRoZSB0d28gbWVkaWNhdGlvbnMgd2VyZSBhZG1pbmlzdGVyZWQgdG8gdGhlIHBhdGllbnQuCi0gIllvdSBoYXZlIGEgPGVycm9yPmZvbGxvdy11cCBhcHBvaW50bWVudCB3aXRoIHlvdXIgUENQPC9lcnJvcj4gYW5kIDxlcnJvcj55b3VyIGNhcmRpb2xvZ2lzdDwvZXJyb3I+IiB3aGVuIG5vIHN1Y2ggZm9sbG93IHVwIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDLiBCb3RoIGVycm9ycyBhcmUgbGFiZWxlZCBzZXBhcmF0ZWx5LgoKeyUgZm9yIGV4YW1wbGUgaW4gZGVtb25zdHJhdGlvbnMgJX0KIyMjIEV4YW1wbGUge3sgbG9vcC5pbmRleCB9fQoKQkhDOgp7eyBleGFtcGxlLmNvbnRleHQgfX0KCkFWUzoKe3sgZXhhbXBsZS5zdW1tYXJ5IH19CnslIGlmIGV4YW1wbGUuZXJyb3JfZGVzY3JpcHRpb25zICV9CgpFUlJPUlM6Cnt7IGV4YW1wbGUuZXJyb3JfZGVzY3JpcHRpb25zIH19Cgp7JSBlbmRpZiAlfQpBVlMgV0lUSCBFUlJPUlMgTEFCRUxFRDoKe3sgZXhhbXBsZS5zdW1tYXJ5X3dpdGhfZXJyb3JzIH19Cgp7JSBlbmRmb3IgJX0KeyUgaWYgZGVtb25zdHJhdGlvbnMgJX0KCiMjIyBFeGFtcGxlIHt7IGxlbihkZW1vbnN0cmF0aW9ucykgKyAxIH19CnslIGVuZGlmICV9CgpCSEM6Cnt7IGNvbnRleHQgfX0KCkFWUzoKe3sgc3VtbWFyeSB9fQoKRVJST1I6Cjx8ZW5kfD4=)<\|system\|\>Youareahelpfulassistantthathelpspatientsunderstandtheirmedicalrecords\.<\|end\|\><\|user\|\>Wewillpresentyouwithapairofabriefhospitalcourse\(BHC\)andapatientaftervisitsummary\(AVS\)\.TheAVSisalsoreferredtoasdischargesummary\.TheBHCcontainsadetailedsummaryofthehospitalstaywrittenbymedicalservice\.Itusuallycontainsmedicaljargon,anditcanfollowdifferentstructuresbasedonthehospitalcourseandresponsiblemedicalspecialty\.TheAVSsummarizesthehospitalstayforthepatientinplainlanguage\.Inpractice,theBHCisnottheonlysourceofinformationtowritetheAVS\.However,inoursettingwetreattheBHCastheonlycontextforthesummary\.Forthislabelingtask,weareinterestedinerrorsintheAVSthatareeitherunsupportedbytheBHC,contradictcontentintheBHC,orarewrongmedicalfacts\.Weallowstatementsthatcontaingeneralmedicalknowledgeoradvicethatareoftenusedinpatientsummaries\.Mosterrorsareduetounsupportedfacts,sowefurtherdistinguishthosebasedontheirspecificcontent\.Thisleadstothefollowingerrortypesorlabels:1\.Unsupportedfacts,includingcondition/procedure/medication/time/location/number/name/word/other2\.Contradictedfact3\.IncorrectfactAndbelowisthedetailedguideline,andwelabelerrorspanswiththe<error\>tag\(e\.g\.<error\>incorrectfact</error\>\)\.\#\#\#AllowedGeneralMedicalKnowledgeandMedicalAdviceWeallowgeneralmedicalknowledgeandadvicethatisoftenpartoftheAVS\.Usually,theseareinformationthatarenotspecificforthehospitalcoursegivenintheBHC\.Forexample\-"Pleasetakeyourmedicationsasprescribed"containsnoerroreventhoughtheBHCdoesnotcontainthisinstructionbecausethisisgeneralmedicaladvice\.\-"Ifthesymptomsgetworse,pleasecontactyourdoctor"containsnoerrorevenwhentheBHCdoesnotcontainthisfact,sinceitisgeneralmedicalknowledgethatadoctorshouldbeseenforworseningsymptoms\.\#\#\#DeterminingSpanofErrorsWelabelthesmallestpossibleconsecutivespanthatspecifiestheerrorgiventheBHCasacontext\.Removingfurtherpartsfromthespanwouldremoveimportantinformation\.Ausefulheuristicistoidentifytheminimalspanthatmustbereplacedtoobtainacorrectstatementthatisgrammaticallycorrect\.Forexample\-"Weperformedan<error\>esophageal\-gastro\-duodenoscopy\(EGD\)\.<error\>"whennosuchprocedureisreportedintheBHC\.Thearticle"an"isnotlabeledasanerror\.Whennoprocedureatallwasperformed"performedanesophageal\-gastro\-duodenoscopy\(EGD\)"shouldbelabeledaserrorbecausethereisnosuitablesubstitutefor"esophageal\-gastro\-duodenoscopy\(EGD\)"\.\-"Afterthesurgery,we<error\>transitionedyoutooraloxycodone</error\>\."whentheBHCcontainsnoinformationforsuchatransition\.IfanothermedicationtransitionismentionedintheBHCandmakessenseinthissentenceonly"oraloxycodone"shouldbelabeled\.IfanotheroralmedicationtransitionismentionedintheBHConly"oxycodone"shouldbelabeled\.\-"<error\>Yoursymptomsrespondedwell</error\>\."whennopartofthesentencemakessenseinthegivencontextoftheAVS\.\#\#\#DealingwithDeidentifiedInformationThedatacontainsdeidentifiedinformationshownwith"\_\_\_"inthetext\.Wealwaystreatthisasnon\-existentinformation\.So,theannotatorsshouldnotinferwhatthedeidentifiedinformationcouldbe\.Ingeneral,deidentifiedfieldsintheAVSshouldnotbelabeledaserrors\.However,sometimestheybelongtoawrongstatementorclearlycontainunsupportedinformation\(e\.g\.,adoctor’snameorphonenumbers\)thatarenotgivenintheBHC\.Inthesecases,deidentifiedfieldsshouldbeincludedintheerrorspan\.Forexample\-"Take\_\_\_<error\>200mgdaily</error\>andtrytorest"whennosuchdosageinformationisprovidedintheBHC,butthestatementtorest\.Thedeidentifiedmedicationnameisexcludedfromtheerrorspan\.\-"Pleaseavoidgoingup<error\>morethan\_\_\_stairs</error\>atatime"whenrestrictionsforthenumberofstairstakenatatimearenotementionedintheBHC\.\-"<error\>Dr\.\_\_\_willfollowupwithyou</error\>"whennofollow\-upismentionedintheBHC\.\-"PleasestoptakingAspirin<error\>on\_\_\_</error\>"whennostoppingdateisgivenintheBHC\.\-"YourRBCpeaked<error\>at\_\_\_million</error\>"ifthereisnohintofaspecificredbloodcellcountgivenintheBHC\.\#\#\#OneErrorperSpanTogetreliableerrorcountsaspanshouldonlycontainasingleerror\.\-"Youreceived<error\>Tylenol</error\>and<error\>Ciprofloxacin</error\>"whenthereisnoevidenceintheBHCthatthetwomedicationswereadministeredtothepatient\.\-"Youhavea<error\>follow\-upappointmentwithyourPCP</error\>and<error\>yourcardiologist</error\>"whennosuchfollowupismentionedintheBHC\.Botherrorsarelabeledseparately\.\{%forexampleindemonstrations%\}\#\#\#Example\{\{loop\.index\}\}BHC:\{\{example\.context\}\}AVS:\{\{example\.summary\}\}\{%ifexample\.error\_descriptions%\}ERRORS:\{\{example\.error\_descriptions\}\}\{%endif%\}AVSWITHERRORSLABELED:\{\{example\.summary\_with\_errors\}\}\{%endfor%\}\{%ifdemonstrations%\}\#\#\#Example\{\{len\(demonstrations\)\+1\}\}\{%endif%\}BHC:\{\{context\}\}AVS:\{\{summary\}\}ERROR:<\|end\|\>

Medalign Chain\-of\-thought K\-Shot with Class Explanation Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgaGVscGZ1bCBhc3Npc3RhbnQgdGhhdCBoZWxwcyBwYXRpZW50cyB1bmRlcnN0YW5kIHRoZWlyIG1lZGljYWwgcmVjb3Jkcy4KPHxlbmR8PgoKPHx1c2VyfD4KV2Ugd2lsbCBwcmVzZW50IHlvdSB3aXRoIGEgcGFpciBvZiBhIGJyaWVmIGhvc3BpdGFsIGNvdXJzZSAoQkhDKSBhbmQgYSBwYXRpZW50IGFmdGVyIHZpc2l0IHN1bW1hcnkgKEFWUykuIFRoZSBBVlMgaXMgYWxzbyByZWZlcnJlZCB0byBhcyBkaXNjaGFyZ2Ugc3VtbWFyeS4gVGhlIEJIQyBjb250YWlucyBhIGRldGFpbGVkIHN1bW1hcnkgb2YgdGhlIGhvc3BpdGFsIHN0YXkgd3JpdHRlbiBieSBtZWRpY2FsIHNlcnZpY2UuIEl0IHVzdWFsbHkgY29udGFpbnMgbWVkaWNhbCBqYXJnb24sIGFuZCBpdCBjYW4gZm9sbG93IGRpZmZlcmVudCBzdHJ1Y3R1cmVzIGJhc2VkIG9uIHRoZSBob3NwaXRhbCBjb3Vyc2UgYW5kIHJlc3BvbnNpYmxlIG1lZGljYWwgc3BlY2lhbHR5LiBUaGUgQVZTIHN1bW1hcml6ZXMgdGhlIGhvc3BpdGFsIHN0YXkgZm9yIHRoZSBwYXRpZW50IGluIHBsYWluIGxhbmd1YWdlLiBJbiBwcmFjdGljZSwgdGhlIEJIQyBpcyBub3QgdGhlIG9ubHkgc291cmNlIG9mIGluZm9ybWF0aW9uIHRvIHdyaXRlIHRoZSBBVlMuIEhvd2V2ZXIsIGluIG91ciBzZXR0aW5nIHdlIHRyZWF0IHRoZSBCSEMgYXMgdGhlIG9ubHkgY29udGV4dCBmb3IgdGhlIHN1bW1hcnkuCgojIyBJbnN0cnVjdGlvbnMKCkZvciB0aGlzIGxhYmVsaW5nIHRhc2ssIHdlIGFyZSBpbnRlcmVzdGVkIGluIGVycm9ycyBpbiB0aGUgQVZTIHRoYXQgYXJlIGVpdGhlciB1bnN1cHBvcnRlZCBieSB0aGUgQkhDLCBjb250cmFkaWN0IGNvbnRlbnQgaW4gdGhlIEJIQywgb3IgYXJlIHdyb25nIG1lZGljYWwgZmFjdHMuIFdlIGFsbG93IHN0YXRlbWVudHMgdGhhdCBjb250YWluIGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2Ugb3IgYWR2aWNlIHRoYXQgYXJlIG9mdGVuIHVzZWQgaW4gcGF0aWVudCBzdW1tYXJpZXMuIE1vc3QgZXJyb3JzIGFyZSBkdWUgdG8gdW5zdXBwb3J0ZWQgZmFjdHMsIHNvIHdlIGZ1cnRoZXIgZGlzdGluZ3Vpc2ggdGhvc2UgYmFzZWQgb24gdGhlaXIgc3BlY2lmaWMgY29udGVudC4gVGhpcyBsZWFkcyB0byB0aGUgZm9sbG93aW5nIGVycm9yIHR5cGVzIG9yIGxhYmVsczoKMS4gVW5zdXBwb3J0ZWQgZmFjdHMsIGluY2x1ZGluZyBjb25kaXRpb24vcHJvY2VkdXJlL21lZGljYXRpb24vdGltZS9sb2NhdGlvbi9udW1iZXIvbmFtZS93b3JkL290aGVyCjIuIENvbnRyYWRpY3RlZCBmYWN0CjMuIEluY29ycmVjdCBmYWN0CkFuZCBiZWxvdyBpcyB0aGUgZGV0YWlsZWQgZ3VpZGVsaW5lLCBhbmQgd2UgbGFiZWwgZXJyb3Igc3BhbnMgd2l0aCB0aGUgPGVycm9yPiB0YWcgKGUuZy4gPGVycm9yPmluY29ycmVjdCBmYWN0PC9lcnJvcj4pLgoKIyMjIERldGVybWluaW5nIFNwYW4gb2YgRXJyb3JzCldlIGxhYmVsIHRoZSBzbWFsbGVzdCBwb3NzaWJsZSBjb25zZWN1dGl2ZSBzcGFuIHRoYXQgc3BlY2lmaWVzIHRoZSBlcnJvciBnaXZlbiB0aGUgQkhDIGFzIGEgY29udGV4dC4gUmVtb3ZpbmcgZnVydGhlciBwYXJ0cyBmcm9tIHRoZSBzcGFuIHdvdWxkIHJlbW92ZSBpbXBvcnRhbnQgaW5mb3JtYXRpb24uIEEgdXNlZnVsIGhldXJpc3RpYyBpcyB0byBpZGVudGlmeSB0aGUgbWluaW1hbCBzcGFuIHRoYXQgbXVzdCBiZSByZXBsYWNlZCB0byBvYnRhaW4gYSBjb3JyZWN0IHN0YXRlbWVudCB0aGF0IGlzIGdyYW1tYXRpY2FsbHkgY29ycmVjdC4gRm9yIGV4YW1wbGUKLSAiV2UgcGVyZm9ybWVkIGFuIDxlcnJvcj5lc29waGFnZWFsLWdhc3Ryby1kdW9kZW5vc2NvcHkgKEVHRCkuPGVycm9yPiIgd2hlbiBubyBzdWNoIHByb2NlZHVyZSBpcyByZXBvcnRlZCBpbiB0aGUgQkhDLiBUaGUgYXJ0aWNsZSAiYW4iIGlzIG5vdCBsYWJlbGVkIGFzIGFuIGVycm9yLiBXaGVuIG5vIHByb2NlZHVyZSBhdCBhbGwgd2FzIHBlcmZvcm1lZCAicGVyZm9ybWVkIGFuIGVzb3BoYWdlYWwtZ2FzdHJvLWR1b2Rlbm9zY29weSAoRUdEKSIgc2hvdWxkIGJlIGxhYmVsZWQgYXMgZXJyb3IgYmVjYXVzZSB0aGVyZSBpcyBubyBzdWl0YWJsZSBzdWJzdGl0dXRlIGZvciAiZXNvcGhhZ2VhbC1nYXN0cm8tZHVvZGVub3Njb3B5IChFR0QpIi4KLSAiQWZ0ZXIgdGhlIHN1cmdlcnksIHdlIDxlcnJvcj50cmFuc2l0aW9uZWQgeW91IHRvIG9yYWwgb3h5Y29kb25lPC9lcnJvcj4uIiB3aGVuIHRoZSBCSEMgY29udGFpbnMgbm8gaW5mb3JtYXRpb24gZm9yIHN1Y2ggYSB0cmFuc2l0aW9uLiBJZiBhbm90aGVyIG1lZGljYXRpb24gdHJhbnNpdGlvbiBpcyBtZW50aW9uZWQgaW4gdGhlIEJIQyBhbmQgbWFrZXMgc2Vuc2UgaW4gdGhpcyBzZW50ZW5jZSBvbmx5ICJvcmFsIG94eWNvZG9uZSIgc2hvdWxkIGJlIGxhYmVsZWQuIElmIGFub3RoZXIgb3JhbCBtZWRpY2F0aW9uIHRyYW5zaXRpb24gaXMgbWVudGlvbmVkIGluIHRoZSBCSEMgb25seSAib3h5Y29kb25lIiBzaG91bGQgYmUgbGFiZWxlZC4KLSAiPGVycm9yPllvdXIgc3ltcHRvbXMgcmVzcG9uZGVkIHdlbGw8L2Vycm9yPi4iIHdoZW4gbm8gcGFydCBvZiB0aGUgc2VudGVuY2UgbWFrZXMgc2Vuc2UgaW4gdGhlIGdpdmVuIGNvbnRleHQgb2YgdGhlIEFWUy4KCldlIGFsbG93IGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2UgYW5kIGFkdmljZSB0aGF0IGlzIG9mdGVuIHBhcnQgb2YgdGhlIEFWUy4gVXN1YWxseSwgdGhlc2UgYXJlIGluZm9ybWF0aW9uIHRoYXQgYXJlIG5vdCBzcGVjaWZpYyBmb3IgdGhlIGhvc3BpdGFsIGNvdXJzZSBnaXZlbiBpbiB0aGUgQkhDLiBGb3IgZXhhbXBsZQotICJQbGVhc2UgdGFrZSB5b3VyIG1lZGljYXRpb25zIGFzIHByZXNjcmliZWQiIGNvbnRhaW5zIG5vIGVycm9yIGV2ZW4gdGhvdWdoIHRoZSBCSEMgZG9lcyBub3QgY29udGFpbiB0aGlzIGluc3RydWN0aW9uIGJlY2F1c2UgdGhpcyBpcyBnZW5lcmFsIG1lZGljYWwgYWR2aWNlLgotICJJZiB0aGUgc3ltcHRvbXMgZ2V0IHdvcnNlLCBwbGVhc2UgY29udGFjdCB5b3VyIGRvY3RvciIgY29udGFpbnMgbm8gZXJyb3IgZXZlbiB3aGVuIHRoZSBCSEMgZG9lcyBub3QgY29udGFpbiB0aGlzIGZhY3QsIHNpbmNlIGl0IGlzIGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2UgdGhhdCBhIGRvY3RvciBzaG91bGQgYmUgc2VlbiBmb3Igd29yc2VuaW5nIHN5bXB0b21zLiAKCldlIHRyeSB0byBpZ25vcmUgZ3JhbW1hdGljYWwgZXJyb3JzIGluIHRoZSBCSEMgYW5kIEFWUy4gSWYgdGhlIG9yaWdpbmFsIG1lYW5pbmcgY2FuIHN0aWxsIGJlIGluZmVycmVkIChlLmcuICJtZWRpY3RhaW9ucyIgaW5zdGVhZCBvZiAibWVkaWNhdGlvbnMiKSwgdGhlIG1vc3QgbGlrZWx5IGNvcnJlY3RlZCBmb3JtIGNhbiBiZSB1c2VkLiBJZiB0aGUgbWVhbmluZyBjYW5ub3QgYmUgaW5mZXJyZWQsIHRoZXkgY2FuIGJlIGlnbm9yZWQgaW4gdGhlIEJIQyBvciBsYWJlbGVkIGFzIFVuc3VwcG9ydGVkIE90aGVyIGluIHRoZSBBVlMuCgpJZiBhIHNlbnRlbmNlIG9yIHBocmFzZSBpcyByZXBlYXRlZCwgdGhlbiBwbGVhc2UgdHJlYXQgaXQgYXMgeW91IHdvdWxkIGFueSBvdGhlciBzZW50ZW5jZSBhbmQgaGlnaGxpZ2h0IGFsbCBlcnJvcnMgKGV2ZW4gaWYgeW91IGRpZCBzbyBpbiBhIHByZXZpb3VzIHNlbnRlbmNlKS4gRm9yIGV4YW1wbGUKLSAiUGxlYXNlIHRha2UgVHlsZW5vbC4gUGxlYXNlIHRha2UgVHlsZW5vbCIgd2hlbiBUeWxlbm9sIHdhcyBwcmVzY3JpYmVkIGluIHRoZSBCSEMuCi0gIkxpbWl0IHlvdXIgPGVycm9yPnVzZSBvZiBzdGFpcnM8L2Vycm9yPi4gUGxlYXNlIGxpbWl0IDxlcnJvcj51c2Ugb2Ygc3RhaXJzPC9lcnJvcj4iIHdoZW4gbW92ZW1lbnQgd2FzIGVuY291cmFnZWQuCgpUbyBnZXQgcmVsaWFibGUgZXJyb3IgY291bnRzIGEgc3BhbiBzaG91bGQgb25seSBjb250YWluIGEgc2luZ2xlIGVycm9yLgotICJZb3UgcmVjZWl2ZWQgPGVycm9yPlR5bGVub2w8L2Vycm9yPiBhbmQgPGVycm9yPkNpcHJvZmxveGFjaW48L2Vycm9yPiIgd2hlbiB0aGVyZSBpcyBubyBldmlkZW5jZSBpbiB0aGUgQkhDIHRoYXQgdGhlIHR3byBtZWRpY2F0aW9ucyB3ZXJlIGFkbWluaXN0ZXJlZCB0byB0aGUgcGF0aWVudC4KLSAiWW91IGhhdmUgYSA8ZXJyb3I+Zm9sbG93LXVwIGFwcG9pbnRtZW50IHdpdGggeW91ciBQQ1A8L2Vycm9yPiBhbmQgPGVycm9yPnlvdXIgY2FyZGlvbG9naXN0PC9lcnJvcj4iIHdoZW4gbm8gc3VjaCBmb2xsb3cgdXAgaXMgbWVudGlvbmVkIGluIHRoZSBCSEMuIEJvdGggZXJyb3JzIGFyZSBsYWJlbGVkIHNlcGFyYXRlbHkuCgojIyMgRGVhbGluZyB3aXRoIERlaWRlbnRpZmllZCBJbmZvcm1hdGlvbgpUaGUgZGF0YSBjb250YWlucyBkZWlkZW50aWZpZWQgaW5mb3JtYXRpb24gc2hvd24gd2l0aCAiX19fIiBpbiB0aGUgdGV4dC4gV2UgYWx3YXlzIHRyZWF0IHRoaXMgYXMgbm9uLWV4aXN0ZW50IGluZm9ybWF0aW9uLiBTbywgdGhlIGFubm90YXRvcnMgc2hvdWxkIG5vdCBpbmZlciB3aGF0IHRoZSBkZWlkZW50aWZpZWQgaW5mb3JtYXRpb24gY291bGQgYmUuIEluIGdlbmVyYWwsIGRlaWRlbnRpZmllZCBmaWVsZHMgaW4gdGhlIEFWUyBzaG91bGQgbm90IGJlIGxhYmVsZWQgYXMgZXJyb3JzLiBIb3dldmVyLCBzb21ldGltZXMgdGhleSBiZWxvbmcgdG8gYSB3cm9uZyBzdGF0ZW1lbnQgb3IgY2xlYXJseSBjb250YWluIHVuc3VwcG9ydGVkIGluZm9ybWF0aW9uIChlLmcuLCBhIGRvY3RvcidzIG5hbWUgb3IgcGhvbmUgbnVtYmVycykgdGhhdCBhcmUgbm90IGdpdmVuIGluIHRoZSBCSEMuIEluIHRoZXNlIGNhc2VzLCBkZWlkZW50aWZpZWQgZmllbGRzIHNob3VsZCBiZSBpbmNsdWRlZCBpbiB0aGUgZXJyb3Igc3Bhbi4gRm9yIGV4YW1wbGUKLSAiVGFrZSBfX18gPGVycm9yPjIwMG1nIGRhaWx5PC9lcnJvcj4gYW5kIHRyeSB0byByZXN0IiB3aGVuIG5vIHN1Y2ggZG9zYWdlIGluZm9ybWF0aW9uIGlzIHByb3ZpZGVkIGluIHRoZSBCSEMsIGJ1dCB0aGUgc3RhdGVtZW50IHRvIHJlc3QuIFRoZSBkZWlkZW50aWZpZWQgbWVkaWNhdGlvbiBuYW1lIGlzIGV4Y2x1ZGVkIGZyb20gdGhlIGVycm9yIHNwYW4uCi0gIlBsZWFzZSBhdm9pZCBnb2luZyB1cCA8ZXJyb3I+bW9yZSB0aGFuIF9fXyBzdGFpcnM8L2Vycm9yPiBhdCBhIHRpbWUiIHdoZW4gcmVzdHJpY3Rpb25zIGZvciB0aGUgbnVtYmVyIG9mIHN0YWlycyB0YWtlbiBhdCBhIHRpbWUgYXJlIG5vdGUgbWVudGlvbmVkIGluIHRoZSBCSEMuCi0gIjxlcnJvcj5Eci4gX19fIHdpbGwgZm9sbG93IHVwIHdpdGggeW91PC9lcnJvcj4iIHdoZW4gbm8gZm9sbG93LXVwIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDLgotICJQbGVhc2Ugc3RvcCB0YWtpbmcgQXNwaXJpbiA8ZXJyb3I+b24gX19fPC9lcnJvcj4iIHdoZW4gbm8gc3RvcHBpbmcgZGF0ZSBpcyBnaXZlbiBpbiB0aGUgQkhDLiAKLSAiWW91ciBSQkMgcGVha2VkIDxlcnJvcj5hdCBfX18gbWlsbGlvbjwvZXJyb3I+IiBpZiB0aGVyZSBpcyBubyBoaW50IG9mIGEgc3BlY2lmaWMgcmVkIGJsb29kIGNlbGwgY291bnQgZ2l2ZW4gaW4gdGhlIEJIQy4KCiMjIyBFcnJvciBUeXBlcwpJbiBnZW5lcmFsLCB3ZSBhc2sgZm9yIHRoZSBtb3N0IHNwZWNpZmljIGVycm9yIHRoYXQgaXMgYXBwbGljYWJsZS4gSWYgdGhlcmUgaXMgdW5jZXJ0YWludHkgd2hpY2ggdHlwZSBhcHBsaWVzLCBwcmVmZXIgdGhlIG9uZSBtZW50aW9uZWQgZmlyc3QgaW4gdGhlIGVudW1lcmF0aW9uIG9mIGFsbCBlcnJvciB0eXBlcyBzaG93biBlYXJsaWVyLiBGb3IgaW5zdGFuY2UsIGlmIHRoZSBlcnJvciBjb250YWlucyBhbiB1bnN1cHBvcnRlZCBtZWRpY2F0aW9uIG5hbWUsIHRoZSBVbnN1cHBvcnRlZCBtZWRpY2F0aW9uIHR5cGUgc2hvdWxkIGJlIHVzZWQgaW5zdGVhZCBvZiB0aGUgVW5zdXBwb3J0ZWQgbmFtZSB0eXBlLiBIZXJlIGlzIGEgZGV0YWlsZWQgZGVzY3JpcHRpb24gb2YgdGhlIGVycm9yIHR5cGVzOgotIGBVbnN1cHBvcnRlZCBDb25kaXRpb25gOiBpbmNsdWRlcyB1bnN1cHBvcnRlZCBzeW1wdG9tcywgZGlzZWFzZXMsIG9yIGZpbmRpbmdzIG9mIHRoZSBwYXRpZW50LiBGb3IgZXhhbXBsZQogICAgLSAiWW91IHdlcmUgZm91bmQgdG8gaGF2ZSBhIDxlcnJvcj5sZWZ0IGNsYXZpY2xlIGZyYWN0dXJlPC9lcnJvcj4iIHdoZW4gbm8gaW5mb3JtYXRpb24gd2FzIGdpdmVuIGZvciB0aGlzIGNvbmRpdGlvbiBpbiB0aGUgQkhDLgotIGBVbnN1cHBvcnRlZCBQcm9jZWR1cmVgOiBpbmNsdWRlcyBhbnkgdW5zdXBwb3J0ZWQgbWVkaWNhbCBwcm9jZWR1cmVzLiBGb3IgZXhhbXBsZQogICAgLSAiWW91IGhhZCBhIDxlcnJvcj5maWx0ZXIgcGxhY2VkIGluIHlvdXIgdmVpbjwvZXJyb3I+IiB3aGVuIG5vIGludGVydmVudGlvbiB3aXRoIGEgZmlsdGVyIHdhcyBtZW50aW9uZWQuCi0gYFVuc3VwcG9ydGVkIE1lZGljYXRpb25gOiBjb250YWlucyBhbGwgZXJyb3JzIHJlbGF0ZWQgdG8gdW5zdXBwb3J0ZWQgbWVkaWNhdGlvbnMuIFRoaXMgaW5jbHVkZXMgbWVkaWNhdGlvbiBjbGFzc2VzLCBzdWJzdGFuY2VzLCByb3V0ZXMsIGZyZXF1ZW5jaWVzLCBhbmQgZG9zYWdlcy4gRm9yIGV4YW1wbGUKICAgIC0gIllvdSB3ZXJlIHBsYWNlZCBvbiA8ZXJyb3I+YW50aWJpb3RpY3M8L2Vycm9yPiIgd2hlbiBvbmx5IGJsb29kIHRoaW5uZXJzIHdlcmUgcHJlc2NyaWJlZC4KLSBgVW5zdXBwb3J0ZWQgVGltZWA6IGluY2x1ZGVzIGFsbCBlcnJvcnMgZm9yIHVuc3VwcG9ydGVkIHRpbWUgb3IgaW50ZXJ2YWwgc3RhdGVtZW50cy4gRm9yIGV4YW1wbGUKICAgIC0gIktlZXAgeW91ciBhcm0gaW4gYSBzbGluZyBmb3IgdGhlIDxlcnJvcj5uZXh0IDYgd2Vla3M8L2Vycm9yPiIgd2hlbiBubyBzcGVjaWZpYyBkdXJhdGlvbiBpcyBnaXZlbi4KLSBgVW5zdXBwb3J0ZWQgTG9jYXRpb25gOiBMb2NhdGlvbnMgaW5jbHVkZSBib3RoIHVuc3VwcG9ydGVkIHBoeXNpY2FsIHBsYWNlcyBhcyB3ZWxsIGFzIHJlZ2lvbnMgb2YgdGhlIHBhdGllbnQuIEZvciBleGFtcGxlCiAgICAtICJUaGUgcGF0aWVudCB3YXMgYWRtaXR0ZWQgdG8gdGhlIDxlcnJvcj5BY3V0ZSBTdXJnZXJ5IFNlcnZpY2U8L2Vycm9yPiIgd2hlbiBubyBhZG1pc3Npb24gbG9jYXRpb24gd2FzIHByb3ZpZGVkIGluIHRoZSBCSEMuCi0gYFVuc3VwcG9ydGVkIE51bWJlcmA6IGFueSBudW1iZXIgZWl0aGVyIGFzIGRpZ2l0cyBvciB3cml0dGVuIHRoYXQgYXJlIHVuc3VwcG9ydGVkLiBUaGlzIGFsc28gaW5jbHVkZXMgd29yZHMgc3VjaCBhcyAiYSIgYW5kICJhbiIuIEZvciBleGFtcGxlCiAgICAtICJZb3VyIHBhY2VtYWtlciByYXRlIHdhcyBpbmNyZWFzZWQgdG8gPGVycm9yPjUwPC9lcnJvcj4iIHdoZW4gdGhlIHJhdGUgb2YgNTAgaXMgbm90IGdpdmVuIGluIHRoZSBCSEMuCi0gYFVuc3VwcG9ydGVkIE5hbWVgOiBuYW1lZCBlbnRpdGllcyB0aGF0IGFyZSBub3Qgc3VwcG9ydGVkIGJ5IHRoZSBCSEMuIEZvciBleGFtcGxlCiAgICAtICJZb3Ugd2VyZSBzZWVuIGJ5IHRoZSA8ZXJyb3I+aW50ZXJ2ZW50aW9uYWwgcHVsbW9uYXJ5IHNlcnZpY2U8L2Vycm9yPiIgd2hlbiBubyBjb25zdWx0IHdpdGggdGhpcyBzZXJ2aWNlIHdhcyBtZW50aW9uZWQgaW4gdGhlIEJIQy4KLSBgVW5zdXBwb3J0ZWQgV29yZGA6IGluY29ycmVjdCBvciBpbmFwcHJvcHJpYXRlIHdvcmRzIG9yIHBocmFzZXMgd2hpY2ggZG8gbm90IGZpdCBpbiBhbnkgb2YgdGhlIGFib3ZlIHR5cGVzLiBGb3IgZXhhbXBsZQogICAgLSAiV2Ugd2lsbCBzZW5kIHlvdSBob21lIHdpdGggYSA8ZXJyb3I+ZHJhaW48L2Vycm9yPiBpbiBwbGFjZSIgd2hlbiBkcmFpbiBub3QgbWVudGlvbmVkIGluIHRoZSBCSEMuCi0gYFVuc3VwcG9ydGVkIE90aGVyYDogSWYgdGhlcmUgaXMgYSBtaXN0YWtlIHdoaWNoIGNsZWFybHkgZG9lcyBub3QgYmVsb25nIHRvIGFueSBvZiB0aGUgYWJvdmUgY2F0ZWdvcmllcywgeW91IG1heSB1c2UgdGhpcyBjYXRlZ29yeSBhcyBhIGxhc3QgcmVzb3J0LiBXZSBjYW5ub3QgZ2l2ZSBwcmVjaXNlIGluc3RydWN0aW9ucyBiZWNhdXNlIHRoZSAib3RoZXIiIGNhdGVnb3J5IGlzIHZlcnkgYnJvYWQuCi0gYENvbnRyYWRpY3RlZCBGYWN0YDogVGhpcyBlcnJvciB0eXBlIGlzIGluZGVwZW5kZW50IG9mIHRoZSBjb250ZW50IGFuZCBjb250YWlucyBhbGwgZmFjdHMgdGhhdCBjbGVhcmx5IGNvbnRyYWRpY3QgaW5mb3JtYXRpb24gcHJvdmlkZWQgaW4gdGhlIEJIQy4gRm9yIGV4YW1wbGUKICAgIC0gIllvdXIgcGFjZW1ha2VyIHJhdGUgd2FzIGluY3JlYXNlZCB0byA8ZXJyb3I+NTA8L2Vycm9yPiIgd2hlbiB0aGUgY29udGV4dCBzdGF0ZSBhIHBhY2VtYWtlciByYXRlIG9mIDQwLgotIGBJbmNvcnJlY3QgRmFjdGA6IFRoaXMgZXJyb3IgdHlwZSBpcyBpbmRlcGVuZGVudCBvZiB0aGUgY29udGVudCBhbmQgY29udGFpbnMgYWxsIGZhY3RzIHRoYXQgY2xlYXJseSBjb250cmFkaWN0IGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2Ugb3IgYWR2aWNlLiBGb3IgZXhhbXBsZQogICAgLSAiV2UgZGlhZ25vc2VkIGEgc2VpenVyZSwgYW5kIHlvdSA8ZXJyb3I+Y2FuIGNvbnRpbnVlIGRyaXZpbmcgeW91ciBjYXI8L2Vycm9yPiIgd2hlbiBubyByZWFzb24gZm9yIGFsbG93aW5nIGRyaXZpbmcgYWZ0ZXIgYSBzZWl6dXJlIGlzIHByb3ZpZGVkIHRoaXMgY29udHJhZGljdCBjb21tb24gbWVkaWNhbCBrbm93bGVkZ2UuCgp7JSBmb3IgZXhhbXBsZSBpbiBkZW1vbnN0cmF0aW9ucyAlfQojIyBFeGFtcGxlcwojIyMgRXhhbXBsZSB7eyBsb29wLmluZGV4IH19CgpCSEM6Cnt7IGV4YW1wbGUuY29udGV4dCB9fQoKQVZTOgp7eyBleGFtcGxlLnN1bW1hcnkgfX0KeyUgaWYgZXhhbXBsZS5lcnJvcl9kZXNjcmlwdGlvbnMgJX0KCkVSUk9SUzoKe3sgZXhhbXBsZS5lcnJvcl9kZXNjcmlwdGlvbnMgfX0KCnslIGVuZGlmICV9CkFWUyBXSVRIIEVSUk9SUyBMQUJFTEVEOgp7eyBleGFtcGxlLnN1bW1hcnlfd2l0aF9lcnJvcnMgfX0KCnslIGVuZGZvciAlfQp7JSBpZiBkZW1vbnN0cmF0aW9ucyAlfQoKIyMjIEV4YW1wbGUge3sgbGVuKGRlbW9uc3RyYXRpb25zKSArIDEgfX0KeyUgZW5kaWYgJX0KQkhDOgp7eyBjb250ZXh0IH19CgpBVlM6Cnt7IHN1bW1hcnkgfX0KPHxlbmR8Pg==)<\|system\|\>Youareahelpfulassistantthathelpspatientsunderstandtheirmedicalrecords\.<\|end\|\><\|user\|\>Wewillpresentyouwithapairofabriefhospitalcourse\(BHC\)andapatientaftervisitsummary\(AVS\)\.TheAVSisalsoreferredtoasdischargesummary\.TheBHCcontainsadetailedsummaryofthehospitalstaywrittenbymedicalservice\.Itusuallycontainsmedicaljargon,anditcanfollowdifferentstructuresbasedonthehospitalcourseandresponsiblemedicalspecialty\.TheAVSsummarizesthehospitalstayforthepatientinplainlanguage\.Inpractice,theBHCisnottheonlysourceofinformationtowritetheAVS\.However,inoursettingwetreattheBHCastheonlycontextforthesummary\.\#\#InstructionsForthislabelingtask,weareinterestedinerrorsintheAVSthatareeitherunsupportedbytheBHC,contradictcontentintheBHC,orarewrongmedicalfacts\.Weallowstatementsthatcontaingeneralmedicalknowledgeoradvicethatareoftenusedinpatientsummaries\.Mosterrorsareduetounsupportedfacts,sowefurtherdistinguishthosebasedontheirspecificcontent\.Thisleadstothefollowingerrortypesorlabels:1\.Unsupportedfacts,includingcondition/procedure/medication/time/location/number/name/word/other2\.Contradictedfact3\.IncorrectfactAndbelowisthedetailedguideline,andwelabelerrorspanswiththe<error\>tag\(e\.g\.<error\>incorrectfact</error\>\)\.\#\#\#DeterminingSpanofErrorsWelabelthesmallestpossibleconsecutivespanthatspecifiestheerrorgiventheBHCasacontext\.Removingfurtherpartsfromthespanwouldremoveimportantinformation\.Ausefulheuristicistoidentifytheminimalspanthatmustbereplacedtoobtainacorrectstatementthatisgrammaticallycorrect\.Forexample\-"Weperformedan<error\>esophageal\-gastro\-duodenoscopy\(EGD\)\.<error\>"whennosuchprocedureisreportedintheBHC\.Thearticle"an"isnotlabeledasanerror\.Whennoprocedureatallwasperformed"performedanesophageal\-gastro\-duodenoscopy\(EGD\)"shouldbelabeledaserrorbecausethereisnosuitablesubstitutefor"esophageal\-gastro\-duodenoscopy\(EGD\)"\.\-"Afterthesurgery,we<error\>transitionedyoutooraloxycodone</error\>\."whentheBHCcontainsnoinformationforsuchatransition\.IfanothermedicationtransitionismentionedintheBHCandmakessenseinthissentenceonly"oraloxycodone"shouldbelabeled\.IfanotheroralmedicationtransitionismentionedintheBHConly"oxycodone"shouldbelabeled\.\-"<error\>Yoursymptomsrespondedwell</error\>\."whennopartofthesentencemakessenseinthegivencontextoftheAVS\.WeallowgeneralmedicalknowledgeandadvicethatisoftenpartoftheAVS\.Usually,theseareinformationthatarenotspecificforthehospitalcoursegivenintheBHC\.Forexample\-"Pleasetakeyourmedicationsasprescribed"containsnoerroreventhoughtheBHCdoesnotcontainthisinstructionbecausethisisgeneralmedicaladvice\.\-"Ifthesymptomsgetworse,pleasecontactyourdoctor"containsnoerrorevenwhentheBHCdoesnotcontainthisfact,sinceitisgeneralmedicalknowledgethatadoctorshouldbeseenforworseningsymptoms\.WetrytoignoregrammaticalerrorsintheBHCandAVS\.Iftheoriginalmeaningcanstillbeinferred\(e\.g\."medictaions"insteadof"medications"\),themostlikelycorrectedformcanbeused\.Ifthemeaningcannotbeinferred,theycanbeignoredintheBHCorlabeledasUnsupportedOtherintheAVS\.Ifasentenceorphraseisrepeated,thenpleasetreatitasyouwouldanyothersentenceandhighlightallerrors\(evenifyoudidsoinaprevioussentence\)\.Forexample\-"PleasetakeTylenol\.PleasetakeTylenol"whenTylenolwasprescribedintheBHC\.\-"Limityour<error\>useofstairs</error\>\.Pleaselimit<error\>useofstairs</error\>"whenmovementwasencouraged\.Togetreliableerrorcountsaspanshouldonlycontainasingleerror\.\-"Youreceived<error\>Tylenol</error\>and<error\>Ciprofloxacin</error\>"whenthereisnoevidenceintheBHCthatthetwomedicationswereadministeredtothepatient\.\-"Youhavea<error\>follow\-upappointmentwithyourPCP</error\>and<error\>yourcardiologist</error\>"whennosuchfollowupismentionedintheBHC\.Botherrorsarelabeledseparately\.\#\#\#DealingwithDeidentifiedInformationThedatacontainsdeidentifiedinformationshownwith"\_\_\_"inthetext\.Wealwaystreatthisasnon\-existentinformation\.So,theannotatorsshouldnotinferwhatthedeidentifiedinformationcouldbe\.Ingeneral,deidentifiedfieldsintheAVSshouldnotbelabeledaserrors\.However,sometimestheybelongtoawrongstatementorclearlycontainunsupportedinformation\(e\.g\.,adoctor’snameorphonenumbers\)thatarenotgivenintheBHC\.Inthesecases,deidentifiedfieldsshouldbeincludedintheerrorspan\.Forexample\-"Take\_\_\_<error\>200mgdaily</error\>andtrytorest"whennosuchdosageinformationisprovidedintheBHC,butthestatementtorest\.Thedeidentifiedmedicationnameisexcludedfromtheerrorspan\.\-"Pleaseavoidgoingup<error\>morethan\_\_\_stairs</error\>atatime"whenrestrictionsforthenumberofstairstakenatatimearenotementionedintheBHC\.\-"<error\>Dr\.\_\_\_willfollowupwithyou</error\>"whennofollow\-upismentionedintheBHC\.\-"PleasestoptakingAspirin<error\>on\_\_\_</error\>"whennostoppingdateisgivenintheBHC\.\-"YourRBCpeaked<error\>at\_\_\_million</error\>"ifthereisnohintofaspecificredbloodcellcountgivenintheBHC\.\#\#\#ErrorTypesIngeneral,weaskforthemostspecificerrorthatisapplicable\.Ifthereisuncertaintywhichtypeapplies,prefertheonementionedfirstintheenumerationofallerrortypesshownearlier\.Forinstance,iftheerrorcontainsanunsupportedmedicationname,theUnsupportedmedicationtypeshouldbeusedinsteadoftheUnsupportednametype\.Hereisadetaileddescriptionoftheerrortypes:\-‘UnsupportedCondition‘:includesunsupportedsymptoms,diseases,orfindingsofthepatient\.Forexample\-"Youwerefoundtohavea<error\>leftclaviclefracture</error\>"whennoinformationwasgivenforthisconditionintheBHC\.\-‘UnsupportedProcedure‘:includesanyunsupportedmedicalprocedures\.Forexample\-"Youhada<error\>filterplacedinyourvein</error\>"whennointerventionwithafilterwasmentioned\.\-‘UnsupportedMedication‘:containsallerrorsrelatedtounsupportedmedications\.Thisincludesmedicationclasses,substances,routes,frequencies,anddosages\.Forexample\-"Youwereplacedon<error\>antibiotics</error\>"whenonlybloodthinnerswereprescribed\.\-‘UnsupportedTime‘:includesallerrorsforunsupportedtimeorintervalstatements\.Forexample\-"Keepyourarminaslingforthe<error\>next6weeks</error\>"whennospecificdurationisgiven\.\-‘UnsupportedLocation‘:Locationsincludebothunsupportedphysicalplacesaswellasregionsofthepatient\.Forexample\-"Thepatientwasadmittedtothe<error\>AcuteSurgeryService</error\>"whennoadmissionlocationwasprovidedintheBHC\.\-‘UnsupportedNumber‘:anynumbereitherasdigitsorwrittenthatareunsupported\.Thisalsoincludeswordssuchas"a"and"an"\.Forexample\-"Yourpacemakerratewasincreasedto<error\>50</error\>"whentherateof50isnotgivenintheBHC\.\-‘UnsupportedName‘:namedentitiesthatarenotsupportedbytheBHC\.Forexample\-"Youwereseenbythe<error\>interventionalpulmonaryservice</error\>"whennoconsultwiththisservicewasmentionedintheBHC\.\-‘UnsupportedWord‘:incorrectorinappropriatewordsorphraseswhichdonotfitinanyoftheabovetypes\.Forexample\-"Wewillsendyouhomewitha<error\>drain</error\>inplace"whendrainnotmentionedintheBHC\.\-‘UnsupportedOther‘:Ifthereisamistakewhichclearlydoesnotbelongtoanyoftheabovecategories,youmayusethiscategoryasalastresort\.Wecannotgivepreciseinstructionsbecausethe"other"categoryisverybroad\.\-‘ContradictedFact‘:ThiserrortypeisindependentofthecontentandcontainsallfactsthatclearlycontradictinformationprovidedintheBHC\.Forexample\-"Yourpacemakerratewasincreasedto<error\>50</error\>"whenthecontextstateapacemakerrateof40\.\-‘IncorrectFact‘:Thiserrortypeisindependentofthecontentandcontainsallfactsthatclearlycontradictgeneralmedicalknowledgeoradvice\.Forexample\-"Wediagnosedaseizure,andyou<error\>cancontinuedrivingyourcar</error\>"whennoreasonforallowingdrivingafteraseizureisprovidedthiscontradictcommonmedicalknowledge\.\{%forexampleindemonstrations%\}\#\#Examples\#\#\#Example\{\{loop\.index\}\}BHC:\{\{example\.context\}\}AVS:\{\{example\.summary\}\}\{%ifexample\.error\_descriptions%\}ERRORS:\{\{example\.error\_descriptions\}\}\{%endif%\}AVSWITHERRORSLABELED:\{\{example\.summary\_with\_errors\}\}\{%endfor%\}\{%ifdemonstrations%\}\#\#\#Example\{\{len\(demonstrations\)\+1\}\}\{%endif%\}BHC:\{\{context\}\}AVS:\{\{summary\}\}<\|end\|\>

Medalign Chain\-of\-thought K\-Shot with Class Explanation and Class Aware Prediction Prompt[⬇](data:text/plain;base64,PHxzeXN0ZW18PgpZb3UgYXJlIGEgaGVscGZ1bCBhc3Npc3RhbnQgdGhhdCBoZWxwcyBwYXRpZW50cyB1bmRlcnN0YW5kIHRoZWlyIG1lZGljYWwgcmVjb3Jkcy4KPHxlbmR8PgoKPHx1c2VyfD4KV2Ugd2lsbCBwcmVzZW50IHlvdSB3aXRoIGEgcGFpciBvZiBhIGJyaWVmIGhvc3BpdGFsIGNvdXJzZSAoQkhDKSBhbmQgYSBwYXRpZW50IGFmdGVyIHZpc2l0IHN1bW1hcnkgKEFWUykuIFRoZSBBVlMgaXMgYWxzbyByZWZlcnJlZCB0byBhcyBkaXNjaGFyZ2Ugc3VtbWFyeS4gVGhlIEJIQyBjb250YWlucyBhIGRldGFpbGVkIHN1bW1hcnkgb2YgdGhlIGhvc3BpdGFsIHN0YXkgd3JpdHRlbiBieSBtZWRpY2FsIHNlcnZpY2UuIEl0IHVzdWFsbHkgY29udGFpbnMgbWVkaWNhbCBqYXJnb24sIGFuZCBpdCBjYW4gZm9sbG93IGRpZmZlcmVudCBzdHJ1Y3R1cmVzIGJhc2VkIG9uIHRoZSBob3NwaXRhbCBjb3Vyc2UgYW5kIHJlc3BvbnNpYmxlIG1lZGljYWwgc3BlY2lhbHR5LiBUaGUgQVZTIHN1bW1hcml6ZXMgdGhlIGhvc3BpdGFsIHN0YXkgZm9yIHRoZSBwYXRpZW50IGluIHBsYWluIGxhbmd1YWdlLiBJbiBwcmFjdGljZSwgdGhlIEJIQyBpcyBub3QgdGhlIG9ubHkgc291cmNlIG9mIGluZm9ybWF0aW9uIHRvIHdyaXRlIHRoZSBBVlMuIEhvd2V2ZXIsIGluIG91ciBzZXR0aW5nIHdlIHRyZWF0IHRoZSBCSEMgYXMgdGhlIG9ubHkgY29udGV4dCBmb3IgdGhlIHN1bW1hcnkuCgojIyBJbnN0cnVjdGlvbnMKCkZvciB0aGlzIGxhYmVsaW5nIHRhc2ssIHdlIGFyZSBpbnRlcmVzdGVkIGluIGVycm9ycyBpbiB0aGUgQVZTIHRoYXQgYXJlIGVpdGhlciB1bnN1cHBvcnRlZCBieSB0aGUgQkhDLCBjb250cmFkaWN0IGNvbnRlbnQgaW4gdGhlIEJIQywgb3IgYXJlIHdyb25nIG1lZGljYWwgZmFjdHMuIFdlIGFsbG93IHN0YXRlbWVudHMgdGhhdCBjb250YWluIGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2Ugb3IgYWR2aWNlIHRoYXQgYXJlIG9mdGVuIHVzZWQgaW4gcGF0aWVudCBzdW1tYXJpZXMuIE1vc3QgZXJyb3JzIGFyZSBkdWUgdG8gdW5zdXBwb3J0ZWQgZmFjdHMsIHNvIHdlIGZ1cnRoZXIgZGlzdGluZ3Vpc2ggdGhvc2UgYmFzZWQgb24gdGhlaXIgc3BlY2lmaWMgY29udGVudC4gVGhpcyBsZWFkcyB0byB0aGUgZm9sbG93aW5nIGVycm9yIHR5cGVzIG9yIGxhYmVsczoKMS4gVW5zdXBwb3J0ZWQgZmFjdHMsIGluY2x1ZGluZyBjb25kaXRpb24vcHJvY2VkdXJlL21lZGljYXRpb24vdGltZS9sb2NhdGlvbi9udW1iZXIvbmFtZS93b3JkL290aGVyCjIuIENvbnRyYWRpY3RlZCBmYWN0CjMuIEluY29ycmVjdCBmYWN0CkFuZCBiZWxvdyBpcyB0aGUgZGV0YWlsZWQgZ3VpZGVsaW5lLCBhbmQgd2UgbGFiZWwgZXJyb3Igc3BhbnMgd2l0aCB0aGUgPGVycm9yPiB0YWcgKGUuZy4gPGVycm9yIGNsYXNzPSJlcnJvcl90eXBlIj5pbmNvcnJlY3QgZmFjdDwvZXJyb3I+KS4KCiMjIyBEZXRlcm1pbmluZyBTcGFuIG9mIEVycm9ycwpXZSBsYWJlbCB0aGUgc21hbGxlc3QgcG9zc2libGUgY29uc2VjdXRpdmUgc3BhbiB0aGF0IHNwZWNpZmllcyB0aGUgZXJyb3IgZ2l2ZW4gdGhlIEJIQyBhcyBhIGNvbnRleHQuIFJlbW92aW5nIGZ1cnRoZXIgcGFydHMgZnJvbSB0aGUgc3BhbiB3b3VsZCByZW1vdmUgaW1wb3J0YW50IGluZm9ybWF0aW9uLiBBIHVzZWZ1bCBoZXVyaXN0aWMgaXMgdG8gaWRlbnRpZnkgdGhlIG1pbmltYWwgc3BhbiB0aGF0IG11c3QgYmUgcmVwbGFjZWQgdG8gb2J0YWluIGEgY29ycmVjdCBzdGF0ZW1lbnQgdGhhdCBpcyBncmFtbWF0aWNhbGx5IGNvcnJlY3QuIEZvciBleGFtcGxlCi0gIldlIHBlcmZvcm1lZCBhbiA8ZXJyb3I+ZXNvcGhhZ2VhbC1nYXN0cm8tZHVvZGVub3Njb3B5IChFR0QpLjxlcnJvcj4iIHdoZW4gbm8gc3VjaCBwcm9jZWR1cmUgaXMgcmVwb3J0ZWQgaW4gdGhlIEJIQy4gVGhlIGFydGljbGUgImFuIiBpcyBub3QgbGFiZWxlZCBhcyBhbiBlcnJvci4gV2hlbiBubyBwcm9jZWR1cmUgYXQgYWxsIHdhcyBwZXJmb3JtZWQgInBlcmZvcm1lZCBhbiBlc29waGFnZWFsLWdhc3Ryby1kdW9kZW5vc2NvcHkgKEVHRCkiIHNob3VsZCBiZSBsYWJlbGVkIGFzIGVycm9yIGJlY2F1c2UgdGhlcmUgaXMgbm8gc3VpdGFibGUgc3Vic3RpdHV0ZSBmb3IgImVzb3BoYWdlYWwtZ2FzdHJvLWR1b2Rlbm9zY29weSAoRUdEKSIuCi0gIkFmdGVyIHRoZSBzdXJnZXJ5LCB3ZSA8ZXJyb3I+dHJhbnNpdGlvbmVkIHlvdSB0byBvcmFsIG94eWNvZG9uZTwvZXJyb3I+LiIgd2hlbiB0aGUgQkhDIGNvbnRhaW5zIG5vIGluZm9ybWF0aW9uIGZvciBzdWNoIGEgdHJhbnNpdGlvbi4gSWYgYW5vdGhlciBtZWRpY2F0aW9uIHRyYW5zaXRpb24gaXMgbWVudGlvbmVkIGluIHRoZSBCSEMgYW5kIG1ha2VzIHNlbnNlIGluIHRoaXMgc2VudGVuY2Ugb25seSAib3JhbCBveHljb2RvbmUiIHNob3VsZCBiZSBsYWJlbGVkLiBJZiBhbm90aGVyIG9yYWwgbWVkaWNhdGlvbiB0cmFuc2l0aW9uIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDIG9ubHkgIm94eWNvZG9uZSIgc2hvdWxkIGJlIGxhYmVsZWQuCi0gIjxlcnJvcj5Zb3VyIHN5bXB0b21zIHJlc3BvbmRlZCB3ZWxsPC9lcnJvcj4uIiB3aGVuIG5vIHBhcnQgb2YgdGhlIHNlbnRlbmNlIG1ha2VzIHNlbnNlIGluIHRoZSBnaXZlbiBjb250ZXh0IG9mIHRoZSBBVlMuCgpXZSBhbGxvdyBnZW5lcmFsIG1lZGljYWwga25vd2xlZGdlIGFuZCBhZHZpY2UgdGhhdCBpcyBvZnRlbiBwYXJ0IG9mIHRoZSBBVlMuIFVzdWFsbHksIHRoZXNlIGFyZSBpbmZvcm1hdGlvbiB0aGF0IGFyZSBub3Qgc3BlY2lmaWMgZm9yIHRoZSBob3NwaXRhbCBjb3Vyc2UgZ2l2ZW4gaW4gdGhlIEJIQy4gRm9yIGV4YW1wbGUKLSAiUGxlYXNlIHRha2UgeW91ciBtZWRpY2F0aW9ucyBhcyBwcmVzY3JpYmVkIiBjb250YWlucyBubyBlcnJvciBldmVuIHRob3VnaCB0aGUgQkhDIGRvZXMgbm90IGNvbnRhaW4gdGhpcyBpbnN0cnVjdGlvbiBiZWNhdXNlIHRoaXMgaXMgZ2VuZXJhbCBtZWRpY2FsIGFkdmljZS4KLSAiSWYgdGhlIHN5bXB0b21zIGdldCB3b3JzZSwgcGxlYXNlIGNvbnRhY3QgeW91ciBkb2N0b3IiIGNvbnRhaW5zIG5vIGVycm9yIGV2ZW4gd2hlbiB0aGUgQkhDIGRvZXMgbm90IGNvbnRhaW4gdGhpcyBmYWN0LCBzaW5jZSBpdCBpcyBnZW5lcmFsIG1lZGljYWwga25vd2xlZGdlIHRoYXQgYSBkb2N0b3Igc2hvdWxkIGJlIHNlZW4gZm9yIHdvcnNlbmluZyBzeW1wdG9tcy4gCgpXZSB0cnkgdG8gaWdub3JlIGdyYW1tYXRpY2FsIGVycm9ycyBpbiB0aGUgQkhDIGFuZCBBVlMuIElmIHRoZSBvcmlnaW5hbCBtZWFuaW5nIGNhbiBzdGlsbCBiZSBpbmZlcnJlZCAoZS5nLiAibWVkaWN0YWlvbnMiIGluc3RlYWQgb2YgIm1lZGljYXRpb25zIiksIHRoZSBtb3N0IGxpa2VseSBjb3JyZWN0ZWQgZm9ybSBjYW4gYmUgdXNlZC4gSWYgdGhlIG1lYW5pbmcgY2Fubm90IGJlIGluZmVycmVkLCB0aGV5IGNhbiBiZSBpZ25vcmVkIGluIHRoZSBCSEMgb3IgbGFiZWxlZCBhcyBVbnN1cHBvcnRlZCBPdGhlciBpbiB0aGUgQVZTLgoKSWYgYSBzZW50ZW5jZSBvciBwaHJhc2UgaXMgcmVwZWF0ZWQsIHRoZW4gcGxlYXNlIHRyZWF0IGl0IGFzIHlvdSB3b3VsZCBhbnkgb3RoZXIgc2VudGVuY2UgYW5kIGhpZ2hsaWdodCBhbGwgZXJyb3JzIChldmVuIGlmIHlvdSBkaWQgc28gaW4gYSBwcmV2aW91cyBzZW50ZW5jZSkuIEZvciBleGFtcGxlCi0gIlBsZWFzZSB0YWtlIFR5bGVub2wuIFBsZWFzZSB0YWtlIFR5bGVub2wiIHdoZW4gVHlsZW5vbCB3YXMgcHJlc2NyaWJlZCBpbiB0aGUgQkhDLgotICJMaW1pdCB5b3VyIDxlcnJvcj51c2Ugb2Ygc3RhaXJzPC9lcnJvcj4uIFBsZWFzZSBsaW1pdCA8ZXJyb3I+dXNlIG9mIHN0YWlyczwvZXJyb3I+IiB3aGVuIG1vdmVtZW50IHdhcyBlbmNvdXJhZ2VkLgoKVG8gZ2V0IHJlbGlhYmxlIGVycm9yIGNvdW50cyBhIHNwYW4gc2hvdWxkIG9ubHkgY29udGFpbiBhIHNpbmdsZSBlcnJvci4KLSAiWW91IHJlY2VpdmVkIDxlcnJvcj5UeWxlbm9sPC9lcnJvcj4gYW5kIDxlcnJvcj5DaXByb2Zsb3hhY2luPC9lcnJvcj4iIHdoZW4gdGhlcmUgaXMgbm8gZXZpZGVuY2UgaW4gdGhlIEJIQyB0aGF0IHRoZSB0d28gbWVkaWNhdGlvbnMgd2VyZSBhZG1pbmlzdGVyZWQgdG8gdGhlIHBhdGllbnQuCi0gIllvdSBoYXZlIGEgPGVycm9yPmZvbGxvdy11cCBhcHBvaW50bWVudCB3aXRoIHlvdXIgUENQPC9lcnJvcj4gYW5kIDxlcnJvcj55b3VyIGNhcmRpb2xvZ2lzdDwvZXJyb3I+IiB3aGVuIG5vIHN1Y2ggZm9sbG93IHVwIGlzIG1lbnRpb25lZCBpbiB0aGUgQkhDLiBCb3RoIGVycm9ycyBhcmUgbGFiZWxlZCBzZXBhcmF0ZWx5LgoKIyMjIERlYWxpbmcgd2l0aCBEZWlkZW50aWZpZWQgSW5mb3JtYXRpb24KVGhlIGRhdGEgY29udGFpbnMgZGVpZGVudGlmaWVkIGluZm9ybWF0aW9uIHNob3duIHdpdGggIl9fXyIgaW4gdGhlIHRleHQuIFdlIGFsd2F5cyB0cmVhdCB0aGlzIGFzIG5vbi1leGlzdGVudCBpbmZvcm1hdGlvbi4gU28sIHRoZSBhbm5vdGF0b3JzIHNob3VsZCBub3QgaW5mZXIgd2hhdCB0aGUgZGVpZGVudGlmaWVkIGluZm9ybWF0aW9uIGNvdWxkIGJlLiBJbiBnZW5lcmFsLCBkZWlkZW50aWZpZWQgZmllbGRzIGluIHRoZSBBVlMgc2hvdWxkIG5vdCBiZSBsYWJlbGVkIGFzIGVycm9ycy4gSG93ZXZlciwgc29tZXRpbWVzIHRoZXkgYmVsb25nIHRvIGEgd3Jvbmcgc3RhdGVtZW50IG9yIGNsZWFybHkgY29udGFpbiB1bnN1cHBvcnRlZCBpbmZvcm1hdGlvbiAoZS5nLiwgYSBkb2N0b3IncyBuYW1lIG9yIHBob25lIG51bWJlcnMpIHRoYXQgYXJlIG5vdCBnaXZlbiBpbiB0aGUgQkhDLiBJbiB0aGVzZSBjYXNlcywgZGVpZGVudGlmaWVkIGZpZWxkcyBzaG91bGQgYmUgaW5jbHVkZWQgaW4gdGhlIGVycm9yIHNwYW4uIEZvciBleGFtcGxlCi0gIlRha2UgX19fIDxlcnJvcj4yMDBtZyBkYWlseTwvZXJyb3I+IGFuZCB0cnkgdG8gcmVzdCIgd2hlbiBubyBzdWNoIGRvc2FnZSBpbmZvcm1hdGlvbiBpcyBwcm92aWRlZCBpbiB0aGUgQkhDLCBidXQgdGhlIHN0YXRlbWVudCB0byByZXN0LiBUaGUgZGVpZGVudGlmaWVkIG1lZGljYXRpb24gbmFtZSBpcyBleGNsdWRlZCBmcm9tIHRoZSBlcnJvciBzcGFuLgotICJQbGVhc2UgYXZvaWQgZ29pbmcgdXAgPGVycm9yPm1vcmUgdGhhbiBfX18gc3RhaXJzPC9lcnJvcj4gYXQgYSB0aW1lIiB3aGVuIHJlc3RyaWN0aW9ucyBmb3IgdGhlIG51bWJlciBvZiBzdGFpcnMgdGFrZW4gYXQgYSB0aW1lIGFyZSBub3RlIG1lbnRpb25lZCBpbiB0aGUgQkhDLgotICI8ZXJyb3I+RHIuIF9fXyB3aWxsIGZvbGxvdyB1cCB3aXRoIHlvdTwvZXJyb3I+IiB3aGVuIG5vIGZvbGxvdy11cCBpcyBtZW50aW9uZWQgaW4gdGhlIEJIQy4KLSAiUGxlYXNlIHN0b3AgdGFraW5nIEFzcGlyaW4gPGVycm9yPm9uIF9fXzwvZXJyb3I+IiB3aGVuIG5vIHN0b3BwaW5nIGRhdGUgaXMgZ2l2ZW4gaW4gdGhlIEJIQy4gCi0gIllvdXIgUkJDIHBlYWtlZCA8ZXJyb3I+YXQgX19fIG1pbGxpb248L2Vycm9yPiIgaWYgdGhlcmUgaXMgbm8gaGludCBvZiBhIHNwZWNpZmljIHJlZCBibG9vZCBjZWxsIGNvdW50IGdpdmVuIGluIHRoZSBCSEMuCgojIyMgRXJyb3IgVHlwZXMKSW4gZ2VuZXJhbCwgd2UgYXNrIGZvciB0aGUgbW9zdCBzcGVjaWZpYyBlcnJvciB0aGF0IGlzIGFwcGxpY2FibGUuIElmIHRoZXJlIGlzIHVuY2VydGFpbnR5IHdoaWNoIHR5cGUgYXBwbGllcywgcHJlZmVyIHRoZSBvbmUgbWVudGlvbmVkIGZpcnN0IGluIHRoZSBlbnVtZXJhdGlvbiBvZiBhbGwgZXJyb3IgdHlwZXMgc2hvd24gZWFybGllci4gRm9yIGluc3RhbmNlLCBpZiB0aGUgZXJyb3IgY29udGFpbnMgYW4gdW5zdXBwb3J0ZWQgbWVkaWNhdGlvbiBuYW1lLCB0aGUgVW5zdXBwb3J0ZWQgbWVkaWNhdGlvbiB0eXBlIHNob3VsZCBiZSB1c2VkIGluc3RlYWQgb2YgdGhlIFVuc3VwcG9ydGVkIG5hbWUgdHlwZS4gSGVyZSBpcyBhIGRldGFpbGVkIGRlc2NyaXB0aW9uIG9mIHRoZSBlcnJvciB0eXBlczoKLSBgVW5zdXBwb3J0ZWQgQ29uZGl0aW9uYDogaW5jbHVkZXMgdW5zdXBwb3J0ZWQgc3ltcHRvbXMsIGRpc2Vhc2VzLCBvciBmaW5kaW5ncyBvZiB0aGUgcGF0aWVudC4gRm9yIGV4YW1wbGUKICAgIC0gIllvdSB3ZXJlIGZvdW5kIHRvIGhhdmUgYSA8ZXJyb3IgY2xhc3M9InVuc3VwcG9ydGVkX2NvbmRpdGlvbiI+bGVmdCBjbGF2aWNsZSBmcmFjdHVyZTwvZXJyb3I+IiB3aGVuIG5vIGluZm9ybWF0aW9uIHdhcyBnaXZlbiBmb3IgdGhpcyBjb25kaXRpb24gaW4gdGhlIEJIQy4KLSBgVW5zdXBwb3J0ZWQgUHJvY2VkdXJlYDogaW5jbHVkZXMgYW55IHVuc3VwcG9ydGVkIG1lZGljYWwgcHJvY2VkdXJlcy4gRm9yIGV4YW1wbGUKICAgIC0gIllvdSBoYWQgYSA8ZXJyb3IgY2xhc3M9InVuc3VwcG9ydGVkX3Byb2NlZHVyZSI+ZmlsdGVyIHBsYWNlZCBpbiB5b3VyIHZlaW48L2Vycm9yPiIgd2hlbiBubyBpbnRlcnZlbnRpb24gd2l0aCBhIGZpbHRlciB3YXMgbWVudGlvbmVkLgotIGBVbnN1cHBvcnRlZCBNZWRpY2F0aW9uYDogY29udGFpbnMgYWxsIGVycm9ycyByZWxhdGVkIHRvIHVuc3VwcG9ydGVkIG1lZGljYXRpb25zLiBUaGlzIGluY2x1ZGVzIG1lZGljYXRpb24gY2xhc3Nlcywgc3Vic3RhbmNlcywgcm91dGVzLCBmcmVxdWVuY2llcywgYW5kIGRvc2FnZXMuIEZvciBleGFtcGxlCiAgICAtICJZb3Ugd2VyZSBwbGFjZWQgb24gPGVycm9yIGNsYXNzPSJ1bnN1cHBvcnRlZF9tZWRpY2F0aW9uIj5hbnRpYmlvdGljczwvZXJyb3I+IiB3aGVuIG9ubHkgYmxvb2QgdGhpbm5lcnMgd2VyZSBwcmVzY3JpYmVkLgotIGBVbnN1cHBvcnRlZCBUaW1lYDogaW5jbHVkZXMgYWxsIGVycm9ycyBmb3IgdW5zdXBwb3J0ZWQgdGltZSBvciBpbnRlcnZhbCBzdGF0ZW1lbnRzLiBGb3IgZXhhbXBsZQogICAgLSAiS2VlcCB5b3VyIGFybSBpbiBhIHNsaW5nIGZvciB0aGUgPGVycm9yIGNsYXNzPSJ1bnN1cHBvcnRlZF90aW1lIj5uZXh0IDYgd2Vla3M8L2Vycm9yPiIgd2hlbiBubyBzcGVjaWZpYyBkdXJhdGlvbiBpcyBnaXZlbi4KLSBgVW5zdXBwb3J0ZWQgTG9jYXRpb25gOiBMb2NhdGlvbnMgaW5jbHVkZSBib3RoIHVuc3VwcG9ydGVkIHBoeXNpY2FsIHBsYWNlcyBhcyB3ZWxsIGFzIHJlZ2lvbnMgb2YgdGhlIHBhdGllbnQuIEZvciBleGFtcGxlCiAgICAtICJUaGUgcGF0aWVudCB3YXMgYWRtaXR0ZWQgdG8gdGhlIDxlcnJvciBjbGFzcz0idW5zdXBwb3J0ZWRfbG9jYXRpb24iPkFjdXRlIFN1cmdlcnkgU2VydmljZTwvZXJyb3I+IiB3aGVuIG5vIGFkbWlzc2lvbiBsb2NhdGlvbiB3YXMgcHJvdmlkZWQgaW4gdGhlIEJIQy4KLSBgVW5zdXBwb3J0ZWQgTnVtYmVyYDogYW55IG51bWJlciBlaXRoZXIgYXMgZGlnaXRzIG9yIHdyaXR0ZW4gdGhhdCBhcmUgdW5zdXBwb3J0ZWQuIFRoaXMgYWxzbyBpbmNsdWRlcyB3b3JkcyBzdWNoIGFzICJhIiBhbmQgImFuIi4gRm9yIGV4YW1wbGUKICAgIC0gIllvdXIgcGFjZW1ha2VyIHJhdGUgd2FzIGluY3JlYXNlZCB0byA8ZXJyb3IgY2xhc3M9InVuc3VwcG9ydGVkX251bWJlciI+NTA8L2Vycm9yPiIgd2hlbiB0aGUgcmF0ZSBvZiA1MCBpcyBub3QgZ2l2ZW4gaW4gdGhlIEJIQy4KLSBgVW5zdXBwb3J0ZWQgTmFtZWA6IG5hbWVkIGVudGl0aWVzIHRoYXQgYXJlIG5vdCBzdXBwb3J0ZWQgYnkgdGhlIEJIQy4gRm9yIGV4YW1wbGUKICAgIC0gIllvdSB3ZXJlIHNlZW4gYnkgdGhlIDxlcnJvciBjbGFzcz0idW5zdXBwb3J0ZWRfbmFtZSI+aW50ZXJ2ZW50aW9uYWwgcHVsbW9uYXJ5IHNlcnZpY2U8L2Vycm9yPiIgd2hlbiBubyBjb25zdWx0IHdpdGggdGhpcyBzZXJ2aWNlIHdhcyBtZW50aW9uZWQgaW4gdGhlIEJIQy4KLSBgVW5zdXBwb3J0ZWQgV29yZGA6IGluY29ycmVjdCBvciBpbmFwcHJvcHJpYXRlIHdvcmRzIG9yIHBocmFzZXMgd2hpY2ggZG8gbm90IGZpdCBpbiBhbnkgb2YgdGhlIGFib3ZlIHR5cGVzLiBGb3IgZXhhbXBsZQogICAgLSAiV2Ugd2lsbCBzZW5kIHlvdSBob21lIHdpdGggYSA8ZXJyb3IgY2xhc3M9InVuc3VwcG9ydGVkX3dvcmQiPmRyYWluPC9lcnJvcj4gaW4gcGxhY2UiIHdoZW4gZHJhaW4gbm90IG1lbnRpb25lZCBpbiB0aGUgQkhDLgotIGBVbnN1cHBvcnRlZCBPdGhlcmA6IElmIHRoZXJlIGlzIGEgbWlzdGFrZSB3aGljaCBjbGVhcmx5IGRvZXMgbm90IGJlbG9uZyB0byBhbnkgb2YgdGhlIGFib3ZlIGNhdGVnb3JpZXMsIHlvdSBtYXkgdXNlIHRoaXMgY2F0ZWdvcnkgYXMgYSBsYXN0IHJlc29ydC4gV2UgY2Fubm90IGdpdmUgcHJlY2lzZSBpbnN0cnVjdGlvbnMgYmVjYXVzZSB0aGUgIm90aGVyIiBjYXRlZ29yeSBpcyB2ZXJ5IGJyb2FkLgotIGBDb250cmFkaWN0ZWQgRmFjdGA6IFRoaXMgZXJyb3IgdHlwZSBpcyBpbmRlcGVuZGVudCBvZiB0aGUgY29udGVudCBhbmQgY29udGFpbnMgYWxsIGZhY3RzIHRoYXQgY2xlYXJseSBjb250cmFkaWN0IGluZm9ybWF0aW9uIHByb3ZpZGVkIGluIHRoZSBCSEMuIEZvciBleGFtcGxlCiAgICAtICJZb3VyIHBhY2VtYWtlciByYXRlIHdhcyBpbmNyZWFzZWQgdG8gPGVycm9yIGNsYXNzPSJjb250cmFkaWN0ZWRfZmFjdCI+NTA8L2Vycm9yPiIgd2hlbiB0aGUgY29udGV4dCBzdGF0ZSBhIHBhY2VtYWtlciByYXRlIG9mIDQwLgotIGBJbmNvcnJlY3QgRmFjdGA6IFRoaXMgZXJyb3IgdHlwZSBpcyBpbmRlcGVuZGVudCBvZiB0aGUgY29udGVudCBhbmQgY29udGFpbnMgYWxsIGZhY3RzIHRoYXQgY2xlYXJseSBjb250cmFkaWN0IGdlbmVyYWwgbWVkaWNhbCBrbm93bGVkZ2Ugb3IgYWR2aWNlLiBGb3IgZXhhbXBsZQogICAgLSAiV2UgZGlhZ25vc2VkIGEgc2VpenVyZSwgYW5kIHlvdSA8ZXJyb3IgY2xhc3M9ImluY29ycmVjdF9mYWN0Ij5jYW4gY29udGludWUgZHJpdmluZyB5b3VyIGNhcjwvZXJyb3I+IiB3aGVuIG5vIHJlYXNvbiBmb3IgYWxsb3dpbmcgZHJpdmluZyBhZnRlciBhIHNlaXp1cmUgaXMgcHJvdmlkZWQgdGhpcyBjb250cmFkaWN0IGNvbW1vbiBtZWRpY2FsIGtub3dsZWRnZS4KCnslIGZvciBleGFtcGxlIGluIGRlbW9uc3RyYXRpb25zICV9CiMjIEV4YW1wbGVzCiMjIyBFeGFtcGxlIHt7IGxvb3AuaW5kZXggfX0KCkJIQzoKe3sgZXhhbXBsZS5jb250ZXh0IH19CgpBVlM6Cnt7IGV4YW1wbGUuc3VtbWFyeSB9fQp7JSBpZiBleGFtcGxlLmVycm9yX2Rlc2NyaXB0aW9ucyAlfQoKRVJST1JTOgp7eyBleGFtcGxlLmVycm9yX2Rlc2NyaXB0aW9ucyB9fQoKeyUgZW5kaWYgJX0KQVZTIFdJVEggRVJST1JTIExBQkVMRUQ6Cnt7IGV4YW1wbGUuc3VtbWFyeV93aXRoX2Vycm9ycyB9fQoKeyUgZW5kZm9yICV9CnslIGlmIGRlbW9uc3RyYXRpb25zICV9CiMjIyBFeGFtcGxlIHt7IGxlbihkZW1vbnN0cmF0aW9ucykgKyAxIH19Cgp7JSBlbmRpZiAlfQpCSEM6Cnt7IGNvbnRleHQgfX0KCkFWUzoKe3sgc3VtbWFyeSB9fQo8fGVuZHw+)<\|system\|\>Youareahelpfulassistantthathelpspatientsunderstandtheirmedicalrecords\.<\|end\|\><\|user\|\>Wewillpresentyouwithapairofabriefhospitalcourse\(BHC\)andapatientaftervisitsummary\(AVS\)\.TheAVSisalsoreferredtoasdischargesummary\.TheBHCcontainsadetailedsummaryofthehospitalstaywrittenbymedicalservice\.Itusuallycontainsmedicaljargon,anditcanfollowdifferentstructuresbasedonthehospitalcourseandresponsiblemedicalspecialty\.TheAVSsummarizesthehospitalstayforthepatientinplainlanguage\.Inpractice,theBHCisnottheonlysourceofinformationtowritetheAVS\.However,inoursettingwetreattheBHCastheonlycontextforthesummary\.\#\#InstructionsForthislabelingtask,weareinterestedinerrorsintheAVSthatareeitherunsupportedbytheBHC,contradictcontentintheBHC,orarewrongmedicalfacts\.Weallowstatementsthatcontaingeneralmedicalknowledgeoradvicethatareoftenusedinpatientsummaries\.Mosterrorsareduetounsupportedfacts,sowefurtherdistinguishthosebasedontheirspecificcontent\.Thisleadstothefollowingerrortypesorlabels:1\.Unsupportedfacts,includingcondition/procedure/medication/time/location/number/name/word/other2\.Contradictedfact3\.IncorrectfactAndbelowisthedetailedguideline,andwelabelerrorspanswiththe<error\>tag\(e\.g\.<errorclass="error\_type"\>incorrectfact</error\>\)\.\#\#\#DeterminingSpanofErrorsWelabelthesmallestpossibleconsecutivespanthatspecifiestheerrorgiventheBHCasacontext\.Removingfurtherpartsfromthespanwouldremoveimportantinformation\.Ausefulheuristicistoidentifytheminimalspanthatmustbereplacedtoobtainacorrectstatementthatisgrammaticallycorrect\.Forexample\-"Weperformedan<error\>esophageal\-gastro\-duodenoscopy\(EGD\)\.<error\>"whennosuchprocedureisreportedintheBHC\.Thearticle"an"isnotlabeledasanerror\.Whennoprocedureatallwasperformed"performedanesophageal\-gastro\-duodenoscopy\(EGD\)"shouldbelabeledaserrorbecausethereisnosuitablesubstitutefor"esophageal\-gastro\-duodenoscopy\(EGD\)"\.\-"Afterthesurgery,we<error\>transitionedyoutooraloxycodone</error\>\."whentheBHCcontainsnoinformationforsuchatransition\.IfanothermedicationtransitionismentionedintheBHCandmakessenseinthissentenceonly"oraloxycodone"shouldbelabeled\.IfanotheroralmedicationtransitionismentionedintheBHConly"oxycodone"shouldbelabeled\.\-"<error\>Yoursymptomsrespondedwell</error\>\."whennopartofthesentencemakessenseinthegivencontextoftheAVS\.WeallowgeneralmedicalknowledgeandadvicethatisoftenpartoftheAVS\.Usually,theseareinformationthatarenotspecificforthehospitalcoursegivenintheBHC\.Forexample\-"Pleasetakeyourmedicationsasprescribed"containsnoerroreventhoughtheBHCdoesnotcontainthisinstructionbecausethisisgeneralmedicaladvice\.\-"Ifthesymptomsgetworse,pleasecontactyourdoctor"containsnoerrorevenwhentheBHCdoesnotcontainthisfact,sinceitisgeneralmedicalknowledgethatadoctorshouldbeseenforworseningsymptoms\.WetrytoignoregrammaticalerrorsintheBHCandAVS\.Iftheoriginalmeaningcanstillbeinferred\(e\.g\."medictaions"insteadof"medications"\),themostlikelycorrectedformcanbeused\.Ifthemeaningcannotbeinferred,theycanbeignoredintheBHCorlabeledasUnsupportedOtherintheAVS\.Ifasentenceorphraseisrepeated,thenpleasetreatitasyouwouldanyothersentenceandhighlightallerrors\(evenifyoudidsoinaprevioussentence\)\.Forexample\-"PleasetakeTylenol\.PleasetakeTylenol"whenTylenolwasprescribedintheBHC\.\-"Limityour<error\>useofstairs</error\>\.Pleaselimit<error\>useofstairs</error\>"whenmovementwasencouraged\.Togetreliableerrorcountsaspanshouldonlycontainasingleerror\.\-"Youreceived<error\>Tylenol</error\>and<error\>Ciprofloxacin</error\>"whenthereisnoevidenceintheBHCthatthetwomedicationswereadministeredtothepatient\.\-"Youhavea<error\>follow\-upappointmentwithyourPCP</error\>and<error\>yourcardiologist</error\>"whennosuchfollowupismentionedintheBHC\.Botherrorsarelabeledseparately\.\#\#\#DealingwithDeidentifiedInformationThedatacontainsdeidentifiedinformationshownwith"\_\_\_"inthetext\.Wealwaystreatthisasnon\-existentinformation\.So,theannotatorsshouldnotinferwhatthedeidentifiedinformationcouldbe\.Ingeneral,deidentifiedfieldsintheAVSshouldnotbelabeledaserrors\.However,sometimestheybelongtoawrongstatementorclearlycontainunsupportedinformation\(e\.g\.,adoctor’snameorphonenumbers\)thatarenotgivenintheBHC\.Inthesecases,deidentifiedfieldsshouldbeincludedintheerrorspan\.Forexample\-"Take\_\_\_<error\>200mgdaily</error\>andtrytorest"whennosuchdosageinformationisprovidedintheBHC,butthestatementtorest\.Thedeidentifiedmedicationnameisexcludedfromtheerrorspan\.\-"Pleaseavoidgoingup<error\>morethan\_\_\_stairs</error\>atatime"whenrestrictionsforthenumberofstairstakenatatimearenotementionedintheBHC\.\-"<error\>Dr\.\_\_\_willfollowupwithyou</error\>"whennofollow\-upismentionedintheBHC\.\-"PleasestoptakingAspirin<error\>on\_\_\_</error\>"whennostoppingdateisgivenintheBHC\.\-"YourRBCpeaked<error\>at\_\_\_million</error\>"ifthereisnohintofaspecificredbloodcellcountgivenintheBHC\.\#\#\#ErrorTypesIngeneral,weaskforthemostspecificerrorthatisapplicable\.Ifthereisuncertaintywhichtypeapplies,prefertheonementionedfirstintheenumerationofallerrortypesshownearlier\.Forinstance,iftheerrorcontainsanunsupportedmedicationname,theUnsupportedmedicationtypeshouldbeusedinsteadoftheUnsupportednametype\.Hereisadetaileddescriptionoftheerrortypes:\-‘UnsupportedCondition‘:includesunsupportedsymptoms,diseases,orfindingsofthepatient\.Forexample\-"Youwerefoundtohavea<errorclass="unsupported\_condition"\>leftclaviclefracture</error\>"whennoinformationwasgivenforthisconditionintheBHC\.\-‘UnsupportedProcedure‘:includesanyunsupportedmedicalprocedures\.Forexample\-"Youhada<errorclass="unsupported\_procedure"\>filterplacedinyourvein</error\>"whennointerventionwithafilterwasmentioned\.\-‘UnsupportedMedication‘:containsallerrorsrelatedtounsupportedmedications\.Thisincludesmedicationclasses,substances,routes,frequencies,anddosages\.Forexample\-"Youwereplacedon<errorclass="unsupported\_medication"\>antibiotics</error\>"whenonlybloodthinnerswereprescribed\.\-‘UnsupportedTime‘:includesallerrorsforunsupportedtimeorintervalstatements\.Forexample\-"Keepyourarminaslingforthe<errorclass="unsupported\_time"\>next6weeks</error\>"whennospecificdurationisgiven\.\-‘UnsupportedLocation‘:Locationsincludebothunsupportedphysicalplacesaswellasregionsofthepatient\.Forexample\-"Thepatientwasadmittedtothe<errorclass="unsupported\_location"\>AcuteSurgeryService</error\>"whennoadmissionlocationwasprovidedintheBHC\.\-‘UnsupportedNumber‘:anynumbereitherasdigitsorwrittenthatareunsupported\.Thisalsoincludeswordssuchas"a"and"an"\.Forexample\-"Yourpacemakerratewasincreasedto<errorclass="unsupported\_number"\>50</error\>"whentherateof50isnotgivenintheBHC\.\-‘UnsupportedName‘:namedentitiesthatarenotsupportedbytheBHC\.Forexample\-"Youwereseenbythe<errorclass="unsupported\_name"\>interventionalpulmonaryservice</error\>"whennoconsultwiththisservicewasmentionedintheBHC\.\-‘UnsupportedWord‘:incorrectorinappropriatewordsorphraseswhichdonotfitinanyoftheabovetypes\.Forexample\-"Wewillsendyouhomewitha<errorclass="unsupported\_word"\>drain</error\>inplace"whendrainnotmentionedintheBHC\.\-‘UnsupportedOther‘:Ifthereisamistakewhichclearlydoesnotbelongtoanyoftheabovecategories,youmayusethiscategoryasalastresort\.Wecannotgivepreciseinstructionsbecausethe"other"categoryisverybroad\.\-‘ContradictedFact‘:ThiserrortypeisindependentofthecontentandcontainsallfactsthatclearlycontradictinformationprovidedintheBHC\.Forexample\-"Yourpacemakerratewasincreasedto<errorclass="contradicted\_fact"\>50</error\>"whenthecontextstateapacemakerrateof40\.\-‘IncorrectFact‘:Thiserrortypeisindependentofthecontentandcontainsallfactsthatclearlycontradictgeneralmedicalknowledgeoradvice\.Forexample\-"Wediagnosedaseizure,andyou<errorclass="incorrect\_fact"\>cancontinuedrivingyourcar</error\>"whennoreasonforallowingdrivingafteraseizureisprovidedthiscontradictcommonmedicalknowledge\.\{%forexampleindemonstrations%\}\#\#Examples\#\#\#Example\{\{loop\.index\}\}BHC:\{\{example\.context\}\}AVS:\{\{example\.summary\}\}\{%ifexample\.error\_descriptions%\}ERRORS:\{\{example\.error\_descriptions\}\}\{%endif%\}AVSWITHERRORSLABELED:\{\{example\.summary\_with\_errors\}\}\{%endfor%\}\{%ifdemonstrations%\}\#\#\#Example\{\{len\(demonstrations\)\+1\}\}\{%endif%\}BHC:\{\{context\}\}AVS:\{\{summary\}\}<\|end\|\>

Similar Articles

RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

arXiv cs.CL

RAGognizer introduces a hallucination-aware fine-tuning approach that integrates a lightweight detection head into LLMs for joint optimization of language modeling and hallucination detection in RAG systems. The paper presents RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and demonstrates state-of-the-art hallucination detection while reducing hallucination rates without degrading language quality.

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

arXiv cs.CL

This paper reveals that much of the reported progress in LLM hallucination detection is due to benchmark construction artifacts, where ground-truth answers are embedded in prompts, allowing a simple text-similarity baseline to achieve near-perfect scores. Through a large-scale controlled evaluation, the authors show that most methods perform near chance under proper controls, except for supervised probes on upper-layer hidden states such as SAPLMA and their proposed DRIFT.