A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions

arXiv cs.CL 06/15/26, 04:00 AM Papers
clinical-nlp bias-audit representational-bias clinicalbert health-equity nlp-fairness
Summary
This paper presents a computational audit of representational bias in ClinicalBERT, finding that demographic associations are amplified by the model itself rather than inherited from training data.
arXiv:2606.14460v1 Announce Type: new
Source: [https://arxiv.org/html/2606.14460](https://arxiv.org/html/2606.14460)

Kehinde Temitayo Soetan

Department of Medical Humanities and Social Sciences, The Ohio State University, soetan\.6@osu\.edu

ABSTRACT

Transformer\-based clinical language models are increasingly integrated into high\-stakes clinical decision support pipelines, yet the computational mechanisms through which demographic associations encoded in medical documentation propagate into model probability distributions remain empirically underspecified\. We present a systematic computational audit of representational bias in ClinicalBERT \(Alsentzer et al\., [2019](https://arxiv.org/html/2606.14460#bib.bib1)\), a BERT\-based model pretrained on MIMIC\-III discharge summaries, employing two complementary probing methodologies: Log Probability Bias Analysis \(LPBA\), which quantifies demographic descriptor\-induced shifts in masked token probability distributions across behavioral and evaluative semantic categories, and Masked Language Model\-based analysis \(MLM\), which probes internal representational structure for demographic agency attribution encoding across 98 real clinical sentence templates and eight intersectional race\-gender combinations\. Corpus frequency analysis operationalizes the distinction between statistical disparity and bias amplification by benchmarking model outputs against empirical term frequencies in the MIMIC\-III training corpus\. Of 32 statistically significant findings, 65\.6% contradict observed corpus distributions, rising to 80% for Black patients and 87\.5% for agency attribution under MLM probing, providing direct empirical evidence that representational bias in ClinicalBERT operates predominantly through model\-internal amplification rather than training data inheritance\.

Keywords: natural language processing, clinical documentation, algorithmic auditing, representational bias, health equity

## 1 INTRODUCTION

Buolamwini and Gebru \([2018](https://arxiv.org/html/2606.14460#bib.bib5)\) demonstrated that commercial facial analysis systems performed well in aggregate but failed disproportionately along demographic lines, with darker\-skinned women bearing error rates more than 34 percentage points above those of lighter\-skinned men, revealing that strong overall accuracy can conceal serious equity problems\.

Institutional bias in clinical practice predates large language models\. For instance, Hoffman et al\. \([2016](https://arxiv.org/html/2606.14460#bib.bib14)\) found that medical students and residents who endorsed false biological beliefs about Black patients were more likely to underestimate their pain and recommend inadequate treatment\. The bias was not incidental; it was documented and traceable to what clinicians had read, learned, and internalized through clinical language\. Clinical language shapes perception, guides decision\-making, and, over time, encodes institutional assumptions into medical practice\. When large language models are trained on the notes, discharge summaries, and records produced within these systems, they learn that language as well\.

This paper presents a computational audit of representational bias in ClinicalBERT \(Alsentzer et al\., [2019](https://arxiv.org/html/2606.14460#bib.bib1)\), a language model pretrained on MIMIC\-III discharge summaries and clinical notes, examining how demographic descriptors shift model probability distributions across behavioral, evaluative, and agency attribution language\. We operationalize representational harm, defined as damage inflicted through symbolic depiction and categorization of social groups \(Crawford, [2017](https://arxiv.org/html/2606.14460#bib.bib8); Blodgett et al\., [2020](https://arxiv.org/html/2606.14460#bib.bib3)\), as an empirical analytical lens connecting computational probability outputs to their clinical and social implications\. Probing 98 real clinical sentence templates across eight intersectional race\-gender combinations, we find that 65\.6% of significant model findings contradict MIMIC\-III corpus distributions, rising to 80% for Black patients and 87\.5% for agency attribution\. This demonstrates that representational bias in ClinicalBERT operates predominantly through model\-internal amplification rather than training data inheritance, with direct implications for bias auditing, clinical AI governance, and equitable deployment\.

The remainder of this paper is organized as follows\. Section [2](https://arxiv.org/html/2606.14460#S2) reviews related work on algorithmic bias in clinical NLP, probing methodologies, and the representational harm framework\. Sections [3](https://arxiv.org/html/2606.14460#S3) and 4 describe the clinical corpus, model, semantic categories, and formal definitions of statistical disparity and bias amplification\. Section [5](https://arxiv.org/html/2606.14460#S5) presents the probing design, LPBA and MLM methodologies, and corpus frequency analysis approach\. Section [6](https://arxiv.org/html/2606.14460#S6) presents empirical findings across behavioral language, evaluative framing, and agency attribution\. Section [7](https://arxiv.org/html/2606.14460#S7) discusses implications for bias amplification theory, linguistic mechanisms of representational harm, and clinical AI governance\. Section [8](https://arxiv.org/html/2606.14460#S8) concludes\.

## 2 RELATED WORK

Algorithmic bias in clinical NLP manifests consequentially across racial, gender, and socioeconomic dimensions\. Obermeyer et al\. \([2019](https://arxiv.org/html/2606.14460#bib.bib18)\) demonstrated that a widely deployed healthcare needs algorithm systematically underestimated requirements for Black patients due to training data biases\. Sharma et al\. \(2025\) attribute such disparities to imbalanced training datasets that overrepresent specific demographics, producing skewed clinical predictions\. Critically, existing work focuses predominantly on outcome\-level disparities rather than the computational mechanisms through which demographic associations are encoded in model probability distributions, which is the gap this study directly addresses\.

Template\-based log\-probability probing of masked language models represents the methodological foundation of demographic bias detection in transformer architectures\. Kurita et al\. \([2019](https://arxiv.org/html/2606.14460#bib.bib16)\) first demonstrated that BERT assigns systematically different probabilities to occupational terms depending on demographic descriptors, while Zhao et al\. \([2019](https://arxiv.org/html/2606.14460#bib.bib20)\) extended this to racial and ethnic bias, showing that ethnicity\-associated names co\-occur with distinct attribute words in model probability distributions\. Unlike static word embedding models where bias is locatable in fixed vector spaces, transformer\-based models distribute bias across dynamic context\-dependent representations that resist direct inspection\. Critically, Hofmann et al\. \([2024](https://arxiv.org/html/2606.14460#bib.bib13)\) demonstrate that post\-training alignment procedures suppress overt bias signals without eliminating their structural sources, a finding that directly motivates the present study’s focus on model\-internal representational structure rather than surface\-level output behavior\. The present study applies the log\-probability probing methodology of Kurita et al\. \([2019](https://arxiv.org/html/2606.14460#bib.bib16)\) to clinical semantic categories across intersectional race\-gender combinations, extending this approach from general domain bias detection into the high\-stakes clinical NLP domain\.

## 3 PROBLEM SETUP

The MIMIC\-III clinical database \(Johnson et al\., [2016](https://arxiv.org/html/2606.14460#bib.bib15)\) comprises de\-identified health records from over 40,000 patients admitted to Beth Israel Deaconess Medical Center between 2001 and 2012\. This study operates on the NOTEEVENTS table, specifically discharge summaries and caregiver notes, which constitute the primary pretraining corpus of ClinicalBERT and represent the primary site at which demographic\-associated linguistic patterns are encoded into model representations\. Patient demographic variables such as race and gender were extracted from the ADMISSIONS and PATIENTS tables and merged with clinical notes on the SUBJECT\_ID and HADM\_ID keys, producing a dataset stratified across four racial groups, two gender categories, and eight intersectional demographic combinations, with White Male as the reference group throughout\.
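
A minimal sketch of this merge, using plain Python dictionaries on toy rows\. The column names follow the public MIMIC\-III schema, in which PATIENTS carries GENDER keyed by SUBJECT\_ID and ADMISSIONS carries ETHNICITY keyed by SUBJECT\_ID and HADM\_ID; the function name `attach_demographics`, the inner\-join behavior, and the sample rows are assumptions, since the preprocessing code itself is not shown\.

```python
def attach_demographics(notes, admissions, patients):
    """Join note rows to ETHNICITY (per admission) and GENDER (per patient)."""
    # PATIENTS is keyed by SUBJECT_ID only; ADMISSIONS by (SUBJECT_ID, HADM_ID).
    gender_by_subject = {p["SUBJECT_ID"]: p["GENDER"] for p in patients}
    ethnicity_by_admission = {
        (a["SUBJECT_ID"], a["HADM_ID"]): a["ETHNICITY"] for a in admissions
    }
    merged = []
    for note in notes:
        key = (note["SUBJECT_ID"], note["HADM_ID"])
        # Inner-join semantics (an assumption): drop notes missing either field.
        if key in ethnicity_by_admission and note["SUBJECT_ID"] in gender_by_subject:
            merged.append({
                **note,
                "ETHNICITY": ethnicity_by_admission[key],
                "GENDER": gender_by_subject[note["SUBJECT_ID"]],
            })
    return merged


# Toy rows with hypothetical values; these are not MIMIC-III records.
patients = [{"SUBJECT_ID": 1, "GENDER": "F"}, {"SUBJECT_ID": 2, "GENDER": "M"}]
admissions = [
    {"SUBJECT_ID": 1, "HADM_ID": 10, "ETHNICITY": "BLACK/AFRICAN AMERICAN"},
    {"SUBJECT_ID": 2, "HADM_ID": 20, "ETHNICITY": "WHITE"},
]
notes = [
    {"SUBJECT_ID": 1, "HADM_ID": 10, "TEXT": "pt agreed to plan"},
    {"SUBJECT_ID": 2, "HADM_ID": 20, "TEXT": "pt declined meds"},
    {"SUBJECT_ID": 3, "HADM_ID": 30, "TEXT": "no demographics on file"},
]
merged = attach_demographics(notes, admissions, patients)
```

Mapping the raw ETHNICITY strings onto the four racial groups would be a further step that this sketch leaves out\.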

## 4 MODEL

We audit ClinicalBERT \(emilyalsentzer/Bio\_ClinicalBERT; Alsentzer et al\., [2019](https://arxiv.org/html/2606.14460#bib.bib1)\), a transformer\-based masked language model developed through domain\-adaptive pretraining of BERT on MIMIC\-III clinical notes\. ClinicalBERT’s masked language modeling objective predicts masked tokens given surrounding bidirectional context:

$$P(w_i \mid w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_n) \tag{1}$$

where $w_i$ denotes the masked target token\. This objective enables template\-based log\-probability probing of demographic associations, the methodological foundation of the present study\. The model was loaded using the Hugging Face Transformers library in Python\.
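
A minimal sketch of how the probability in Eq\. 1 could be read off the model with the Hugging Face Transformers library\. The helper names, the `[DEMO]` slot marker, and the example template are hypothetical; the loading and scoring code used in the audit is not shown, and rejecting targets that are not a single vocabulary token is an assumption about how multi\-token targets would be handled\.

```python
def fill_descriptor(template, descriptor, slot="[DEMO]"):
    """Insert a demographic descriptor into a template's (hypothetical) slot marker."""
    if slot not in template:
        raise ValueError(f"template has no {slot} slot")
    return template.replace(slot, descriptor)


def masked_token_prob(sentence, target, model_name="emilyalsentzer/Bio_ClinicalBERT"):
    """Return P(target | context) at the single [MASK] position in `sentence`."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

    target_id = tokenizer.convert_tokens_to_ids(target)
    if target_id == tokenizer.unk_token_id:
        raise ValueError(f"{target!r} is not a single vocabulary token")

    text = sentence.replace("[MASK]", tokenizer.mask_token)
    encoded = tokenizer(text, return_tensors="pt")
    is_mask = encoded["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if mask_positions.numel() != 1:
        raise ValueError("expected exactly one [MASK] token")

    with torch.no_grad():
        logits = model(**encoded).logits[0, mask_positions[0]]
    # Softmax over the vocabulary gives the masked-token distribution.
    return torch.softmax(logits, dim=-1)[target_id].item()


# Hypothetical template; real templates are drawn from MIMIC-III notes.
template = "The [DEMO] patient [MASK] the medication."
sentence = fill_descriptor(template, "Black female")
```

Calling `masked_token_prob(sentence, "refused")` requires `torch`, `transformers`, and access to the model weights\. A real audit would load the model once and reuse it across templates rather than reloading it on every call\.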

### 4\.1 Representational Harm Framework

We operationalize representational harm, defined as damage inflicted through symbolic depiction and categorization of social groups \(Crawford, [2017](https://arxiv.org/html/2606.14460#bib.bib8); Blodgett et al\., [2020](https://arxiv.org/html/2606.14460#bib.bib3)\), as the primary analytical lens connecting computational probability outputs to their clinical and social implications\. Formally, let

$$D \in \{\text{Black Male, Black Female, Hispanic Male, Hispanic Female, Asian Male, Asian Female, White Female}\} \tag{2}$$

denote a demographic descriptor under analysis, with White Male serving as the reference group $D_0$\.

For a given target word $w \in \beta \cup \mathcal{E} \cup \alpha$ \(the behavioral, evaluative, and agency attribution target\-word sets, respectively\) and clinical sentence templates, representational harm is operationalized through three analytically distinct dimensions: *stereotyping*, where $P(w \mid D)$ reflects group\-based associations inconsistent with clinical evidence; *erasure*, where $P(w \mid D)$ systematically underrepresents attributes for group $D$; and *demeaning*, where $P(w \mid D)$ encodes evaluative characterizations that negatively frame group $D$\.

### 4\.2 Statistical Disparity and Bias Amplification

We operationalize two complementary empirical indicators of representational harm that together constitute the analytical core of this study\. Let $f_C(w, D)$ denote the corpus frequency of target word $w$ in clinical notes for demographic group $D$, and let $P_M(w \mid D)$ denote ClinicalBERT’s masked token probability for $w$ given demographic descriptor $D$ in an identical sentence context\.

*Statistical disparity* is defined as the difference in model probability assignments across demographic groups for identical clinical contexts:

$$\Delta_S(w, D) = P_M(w \mid D) - P_M(w \mid D_0) \tag{3}$$

where $D_0$ denotes the White Male reference group\. A statistically significant $\Delta_S(w, D) \neq 0$ indicates that demographic identity systematically shifts model predictions for target words in identical clinical contexts\.

*Bias amplification* is operationalized as a directional divergence between model probability differences and corpus frequency differences\. For each significant model finding, we compute the sign of the model difference:

$$\operatorname{sign}(\Delta_S(w, D_i)) = \begin{cases} +1 & \text{if } P_M(w \mid D_i) > P_M(w \mid D_0) \\ -1 & \text{if } P_M(w \mid D_i) < P_M(w \mid D_0) \end{cases} \tag{4}$$

A finding is classified as *bias amplification* \(model\-generated rather than data\-inherited\) when:

$$\operatorname{sign}(\Delta_S(w, D_i)) \neq \operatorname{sign}(\Delta_C(w, D_i)) \tag{5}$$

where $\Delta_C(w, D_i) = f_C(w, D_i) - f_C(w, D_0)$ is the corresponding corpus frequency difference\. The overall bias amplification rate is defined as the proportion of statistically significant model findings where this directional divergence holds\. The corpus frequency analysis operationalizes this distinction empirically by benchmarking $P_M(w \mid D)$ against $f_C(w, D)$ for all significant target words and demographic groups, enabling direct empirical determination of whether observed representational harm reflects statistical disparity, bias amplification, or both\.
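
The definitions in Eqs\. 3 to 5 can be sketched in a few lines of Python\. The function names and the example values are hypothetical, and treating a zero difference as its own sign is an assumption, since Eq\. 4 only covers the strictly positive and strictly negative cases\.

```python
def sign(x):
    """Return +1, -1, or 0 for positive, negative, or zero x."""
    return (x > 0) - (x < 0)


def statistical_disparity(p_group, p_ref):
    """Eq. 3: difference in model probability relative to the reference group."""
    return p_group - p_ref


def is_amplification(delta_s, delta_c):
    """Eq. 5: model and corpus differences point in different directions."""
    return sign(delta_s) != sign(delta_c)


def amplification_rate(findings):
    """Share of significant findings (delta_s, delta_c) that satisfy Eq. 5."""
    if not findings:
        raise ValueError("no findings")
    flagged = sum(is_amplification(ds, dc) for ds, dc in findings)
    return flagged / len(findings)


# Hypothetical (delta_s, delta_c) pairs for four significant findings.
findings = [(-0.02, 1.5), (0.01, 0.4), (-0.03, 2.0), (0.02, -0.7)]
rate = amplification_rate(findings)  # three of the four pairs disagree in sign
```

Only the direction of each difference enters the classification, so the two quantities do not need to share a scale: $\Delta_S$ is a probability difference while $\Delta_C$ is a per\-10,000\-token frequency difference\.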

## 5 PROBING DESIGN

Both auditing methods operate on a shared set of 98 real clinical sentence templates extracted directly from the MIMIC\-III corpus — specifically discharge summaries and caregiver notes — rather than artificially constructed templates\. This design choice ensures ecological validity by grounding the analysis in the same clinical language on which ClinicalBERT was pretrained\. Each template contains a single demographic descriptor slot instantiated across eight intersectional race\-gender combinations:

$$\mathcal{D} = \{D_0, D_1, \ldots, D_7\} = \{\text{White Male, Black Male, Black Female, Hispanic Male, Hispanic Female, Asian Male, Asian Female, White Female}\} \tag{6}$$

producing a fully crossed design of 98 templates $\times$ 8 demographic combinations for each target word\. Paired $t$\-tests compare log probability distributions across demographic groups within identical sentence contexts, accounting for sentence\-level structure by treating each template as its own unit of comparison\. Statistical significance is set at $p < 0.05$ throughout\. To account for multiple comparisons across target words and demographic groups, all reported findings were additionally evaluated using the Benjamini–Hochberg false discovery rate correction at FDR $= 0.05$\. All 17 statistically significant findings survived correction, confirming robustness\. Prior to conducting paired $t$\-tests, normality of log\-probability difference distributions was verified using the Shapiro–Wilk test for all significant findings; all distributions satisfied the normality assumption \($p > 0.05$\)\.
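
As an illustration of the two statistical steps, the sketch below computes a paired $t$ statistic over per\-template scores and applies the Benjamini–Hochberg step\-up rule to a list of p\-values\. It uses only the standard library; all inputs are hypothetical, the function names are illustrative, and the subtraction order \(group minus reference\) is an assumption\. Converting $t$ to a p\-value needs the Student $t$ distribution, which a routine such as `scipy.stats.ttest_rel` would supply in practice\.

```python
import math
import statistics


def paired_t_statistic(group_scores, ref_scores):
    """t statistic for paired samples, one pair per sentence template."""
    if len(group_scores) != len(ref_scores):
        raise ValueError("paired samples must have equal length")
    # Subtraction order (group minus reference) is an assumption.
    diffs = [g - r for g, r in zip(group_scores, ref_scores)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical log-probabilities for three templates (group vs. reference).
t_stat = paired_t_statistic([-3.0, -2.0, -1.0], [-4.0, -4.0, -4.0])
# Hypothetical p-values, not the ones reported in this paper.
decisions = benjamini_hochberg([0.041, 0.001, 0.20, 0.008, 0.039])
```

In this toy list only the two smallest p\-values are rejected, even though four of the five fall below 0\.05 individually, which is the kind of attrition the correction is meant to expose\.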

### 5\.1 Log Probability Bias Analysis \(LPBA\)

LPBA quantifies the extent to which substituting a demographic descriptor $D_i$ for the reference descriptor $D_0$ shifts ClinicalBERT’s predicted log\-probability for a target word $w \in \beta \cup \mathcal{E}$ in an identical clinical sentence context\. Formally, for a sentence template $S$ with masked target position, the log\-probability bias score is defined as:

$$\text{LPBA}(w, D_i, S) = \log P_M(w \mid S, D_i) - \log P_M(w \mid S, D_0) \tag{7}$$

where $P_M(w \mid S, D)$ denotes ClinicalBERT’s masked token probability for target word $w$ in sentence $S$ with demographic descriptor $D$, and $D_0$ denotes the White Male reference group\. A positive LPBA score indicates higher predicted probability for $w$ under $D_i$ relative to $D_0$; a negative score indicates suppression\.
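
A short sketch of Eq\. 7, assuming the two masked\-token probabilities have already been obtained for one template\. The function names and the probabilities are hypothetical\.

```python
import math


def lpba_score(p_group, p_ref):
    """Eq. 7: log P_M(w | S, D_i) - log P_M(w | S, D_0) for one template."""
    if p_group <= 0 or p_ref <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_group) - math.log(p_ref)


def mean_lpba(pairs):
    """Average LPBA over (p_group, p_ref) pairs, one pair per template."""
    scores = [lpba_score(pg, pr) for pg, pr in pairs]
    return sum(scores) / len(scores)


# Hypothetical probabilities: the target word is half as likely under D_i.
score = lpba_score(0.02, 0.04)  # equals log(0.5), a negative (suppressed) score
```

Because the score is a difference of logs, it equals the log of the probability ratio, so halving a probability yields the same score whether the baseline is 0\.04 or 0\.0004\.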

Sentence templates were selected from MIMIC\-III discharge summaries and caregiver notes according to three criteria: the sentence directly describes patient behavior, contains between five and thirty words, and has no MIMIC\-III de\-identification artifacts\. Templates with fewer than five available instances per target word were excluded\.

LPBA observations total 488 rather than the full $98 \times 8 = 784$ because the LPBA analysis was applied only to behavioral language \($\beta$\) and evaluative framing \($\mathcal{E}$\) target words, whereas MLM was applied to all agency attribution \($\alpha$\) target words across the same 98 templates\.

### 5\.2 Masked Language Model Analysis \(MLM\)

MLM extends the LPBA approach from behavioral and evaluative semantic categories into agency attribution language, applying the same masked token probability framework to a semantically distinct category of clinical language\. While both methods query ClinicalBERT’s final output layer probability distributions, MLM operates on raw masked token probabilities rather than log\-probability differences, enabling direct comparison of absolute probability assignments across demographic groups\. For a sentence template $S$ with masked target position, the MLM probability score is defined as:

$$\text{MLM}(w, D_i, S) = P_M(w \mid S, D_i) \tag{8}$$

where $w \in \alpha$ denotes an agency attribution target word\. Agency attribution terms are organized into three subcategories reflecting distinct constructions of patient causal responsibility:

$$\mathcal{A}_{\text{resist}} = \{\text{refused, declined}\} \tag{9}$$

$$\mathcal{A}_{\text{cooperate}} = \{\text{requested, agreed}\} \tag{10}$$

$$\mathcal{A}_{\text{passive}} = \{\text{responded, presented}\} \tag{11}$$

where $\mathcal{A}_{\text{resist}}$ encodes active resistance to clinical instructions, $\mathcal{A}_{\text{cooperate}}$ encodes active cooperation with clinical instructions, and $\mathcal{A}_{\text{passive}}$ encodes passive receipt of clinical action\. This three\-way subcategorization captures whether the model constructs patients from different demographic groups as active decision\-making subjects or passive objects of clinical action\.
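
The subcategories in Eqs\. 9 to 11 map naturally onto a small lookup table\. The sketch below is illustrative only; the dictionary and function names are hypothetical, and grouping the resist and cooperate sets as "active" follows the description above\.

```python
# Agency attribution target words, copied from Eqs. 9-11.
AGENCY_SUBCATEGORIES = {
    "resist": ("refused", "declined"),
    "cooperate": ("requested", "agreed"),
    "passive": ("responded", "presented"),
}


def subcategory_of(word):
    """Return the agency subcategory of a target word, or None if absent."""
    for name, words in AGENCY_SUBCATEGORIES.items():
        if word in words:
            return name
    return None


def is_active(word):
    """True for words that construct the patient as an active agent."""
    return subcategory_of(word) in {"resist", "cooperate"}


# All six agency attribution target words in one flat list.
agency_targets = [w for words in AGENCY_SUBCATEGORIES.values() for w in words]
```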

### 5\.3 Corpus Frequency Analysis

The corpus frequency analysis operationalizes the distinction between statistical disparity $\Delta_S(w, D)$ and bias amplification $\Delta_A(w, D)$ defined in Section [4\.2](https://arxiv.org/html/2606.14460#S4.SS2) by benchmarking model probability outputs directly against empirical term frequencies in the MIMIC\-III training corpus\. For each statistically significant target word $w \in \beta \cup \mathcal{E} \cup \alpha$ and demographic group $D_i$, corpus frequency is computed as:

$$f_C(w, D_i) = \frac{\text{count}(w, D_i)}{\text{count}(\text{all tokens}, D_i)} \times 10{,}000 \tag{12}$$

where $\text{count}(w, D_i)$ denotes the raw frequency of target word $w$ in clinical notes for demographic group $D_i$, normalized per 10,000 tokens to ensure comparability across demographic groups with different total documentation volumes\. The corpus frequency difference

$$\Delta_C(w, D_i) = f_C(w, D_i) - f_C(w, D_0) \tag{13}$$

is compared against the model probability difference $\Delta_S(w, D)$ from LPBA or MLM analysis\. A finding is classified as *Reflection* if $\operatorname{sign}(\Delta_S) = \operatorname{sign}(\Delta_C)$, and as a *Contradiction* if $\operatorname{sign}(\Delta_S) \neq \operatorname{sign}(\Delta_C)$\.
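
The normalization in Eq\. 12 and the difference in Eq\. 13 can be sketched as follows\. Lowercasing and whitespace tokenization are assumptions, since the tokenization used for the corpus counts is not specified; the function names, the group labels, and the toy notes are hypothetical\.

```python
from collections import Counter


def per_10k_frequency(target, notes_by_group):
    """Eq. 12: occurrences of `target` per 10,000 tokens, for each group."""
    freqs = {}
    for group, notes in notes_by_group.items():
        # Lowercased whitespace tokenization is an assumption.
        tokens = [tok for note in notes for tok in note.lower().split()]
        if not tokens:
            raise ValueError(f"no tokens for group {group!r}")
        freqs[group] = Counter(tokens)[target] / len(tokens) * 10_000
    return freqs


def corpus_delta(freqs, group, reference="White"):
    """Eq. 13: frequency difference relative to the reference group."""
    return freqs[group] - freqs[reference]


# Toy notes (hypothetical text, seven tokens per group).
toy_notes = {
    "Black": ["pt refused meds", "pt refused labs today"],
    "White": ["pt agreed to plan", "pt refused meds"],
}
toy_freqs = per_10k_frequency("refused", toy_notes)
toy_delta = corpus_delta(toy_freqs, "Black")  # positive: more frequent for "Black"
```

Normalizing by each group's own token total keeps a group with more documentation from appearing to use a term more often simply because it has more text\.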

## 6 FINDINGS

### 6\.1 Overview

Across 98 clinical sentence templates and eight intersectional demographic combinations, LPBA produced 488 observations and MLM produced 784 observations\. Of 32 statistically significant model findings \($p < 0.05$\), 21 contradicted corpus frequency patterns while only 11 reflected them, yielding an overall contradiction rate of 65\.6% \(21/32\)\. This pattern was most pronounced for Black patients at 80\.0% \(12/15\)\. MLM\-based analysis showed an even higher rate of 87\.5% \(7/8\), indicating that agency attribution bias is predominantly model\-generated rather than data\-inherited\. Table [1](https://arxiv.org/html/2606.14460#S8.T1) summarizes the distribution of significant findings across demographic groups and semantic categories\.

### 6\.2 LPBA Findings: Behavioral Language and Evaluative Framing

Table [2](https://arxiv.org/html/2606.14460#S8.T2) reports all statistically significant LPBA findings across $\beta$ and $\mathcal{E}$ target words\. The most consistent finding concerns the behavioral term $w = \textit{agitated}$, which produced significant $\text{LPBA}(w, D_i, S)$ scores across all three minority groups but in opposing directions \(Equations 14 to 17\); Equations 18 and 19 report the significant findings for the behavioral term *confused*:

$$\text{LPBA}(\textit{agitated}, D_{BF}) < 0 \quad (t = 3.924,\ p = 0.004) \tag{14}$$

$$\text{LPBA}(\textit{agitated}, D_{BM}) < 0 \quad (t = 3.490,\ p = 0.008) \tag{15}$$

$$\text{LPBA}(\textit{agitated}, D_{HM}) > 0 \quad (t = -2.615,\ p = 0.031) \tag{16}$$

$$\text{LPBA}(\textit{agitated}, D_{AM}) > 0 \quad (t = -2.308,\ p = 0.050) \tag{17}$$

$$\text{LPBA}(\textit{confused}, D_{HM}) < 0 \quad (t = 3.207,\ p = 0.033) \tag{18}$$

$$\text{LPBA}(\textit{confused}, D_{HF}) < 0 \quad (t = 2.809,\ p = 0.048) \tag{19}$$

For the evaluative framing term $w = \textit{refused}$:

$$\text{LPBA}(\textit{refused}, D_{HM}) < 0 \quad (t = 2.837,\ p = 0.012) \tag{20}$$

$$\text{LPBA}(\textit{refused}, D_{BF}) < 0 \quad (t = 2.566,\ p = 0.021) \tag{21}$$

$$\text{LPBA}(\textit{refused}, D_{WF}) < 0 \quad (t = 2.213,\ p = 0.043) \tag{22}$$

These results indicate systematic suppression of evaluative language for Hispanic Male, Black Female, and White Female patients relative to the White Male reference across identical sentence contexts\.

### 6\.3 MLM Findings: Agency Attribution

Table [3](https://arxiv.org/html/2606.14460#S8.T3) reports all statistically significant MLM findings across the $\mathcal{A}_{\text{resist}}$, $\mathcal{A}_{\text{cooperate}}$, and $\mathcal{A}_{\text{passive}}$ subcategories\. The most statistically robust findings concern active cooperation language\. For $w = \textit{requested}$ and $w = \textit{agreed}$:

$$\text{MLM}(\textit{requested}, D_{BM}) \ll \text{MLM}(\textit{requested}, D_0) \quad (t = 5.906,\ p = 0.0001) \tag{23}$$

$$\text{MLM}(\textit{agreed}, D_{BM}) \ll \text{MLM}(\textit{agreed}, D_0) \quad (t = 3.555,\ p = 0.002) \tag{24}$$

$$\text{MLM}(\textit{requested}, D_{BF}) \ll \text{MLM}(\textit{requested}, D_0) \quad (t = 4.419,\ p = 0.0008) \tag{25}$$

$$\text{MLM}(\textit{agreed}, D_{BF}) \ll \text{MLM}(\textit{agreed}, D_0) \quad (t = 4.529,\ p = 0.0003) \tag{26}$$

$$\text{MLM}(\textit{requested}, D_{AF}) \gg \text{MLM}(\textit{requested}, D_0) \quad (t = -2.343,\ p = 0.037) \tag{27}$$

For the active resistance term $w = \textit{declined} \in \mathcal{A}_{\text{resist}}$:

$$\text{MLM}(\textit{declined}, D_{BM}) < \text{MLM}(\textit{declined}, D_0) \quad (t = 2.223,\ p = 0.043) \tag{28}$$

$$\text{MLM}(\textit{declined}, D_{BF}) < \text{MLM}(\textit{declined}, D_0) \quad (t = 2.286,\ p = 0.038) \tag{29}$$

Combined with the active cooperation suppression findings, this produces a coherent pattern in which Black patients are assigned lower probability for both active cooperation and active resistance language in identical clinical contexts, a systematic suppression of active agency across the full spectrum of $\mathcal{A}_{\text{resist}} \cup \mathcal{A}_{\text{cooperate}}$\.

For the passive cooperation term $w = \textit{presented} \in \mathcal{A}_{\text{passive}}$:

$$\text{MLM}(\textit{presented}, D_{BF}) > \text{MLM}(\textit{presented}, D_0) \quad (t = -2.095,\ p = 0.050) \tag{30}$$

completing a representational profile in which Black Female patients are simultaneously less likely to be encoded as active agents, whether cooperating or resisting, and more likely to be encoded as passive recipients of clinical action\.

### 6\.4 Corpus Frequency Analysis: Statistical Disparity vs\. Bias Amplification

Two striking corpus patterns directly contradict the model probability findings reported in Sections [6\.2](https://arxiv.org/html/2606.14460#S6.SS2) and [6\.3](https://arxiv.org/html/2606.14460#S6.SS3)\. First, for active agency language in Black patient notes:

$$f_C(\textit{refused}, D_{\text{Black}}) = 15.38 \gg f_C(\textit{refused}, D_{\text{White}}) = 7.75 \quad \text{per 10,000 tokens} \tag{31}$$

$$f_C(\textit{requesting}, D_{\text{Black}}) = 8.46 \gg f_C(\textit{requesting}, D_{\text{White}}) = 4.49 \quad \text{per 10,000 tokens} \tag{32}$$

Active agency language appears substantially more frequently in Black patient clinical notes than in White patient notes, yet the model systematically suppresses active agency predictions for Black patients, constituting a direct contradiction:

$$\operatorname{sign}(\Delta_C) \neq \operatorname{sign}(\Delta_S) \tag{33}$$

Second, for the behavioral term *agitated*:

$$f_C(\textit{agitated}, D_{\text{White}}) = 4.38 \approx f_C(\textit{agitated}, D_{\text{Black}}) = 4.62 \quad \text{per 10,000 tokens} \tag{34}$$

$$f_C(\textit{agitated}, D_{\text{Hispanic}}) = 4.17, \quad f_C(\textit{agitated}, D_{\text{Asian}}) = 0.00 \quad \text{per 10,000 tokens} \tag{35}$$

Applying the classification scheme defined in Section [5\.3](https://arxiv.org/html/2606.14460#S5.SS3):

$$\text{Contradiction rate (overall)} = \tfrac{21}{32} = 65.6\% \tag{36}$$

$$\text{Contradiction rate (Black patients)} = \tfrac{12}{15} = 80.0\% \tag{37}$$

$$\text{Contradiction rate (MLM agency)} = \tfrac{7}{8} = 87.5\% \tag{38}$$

## 7 DISCUSSION

### 7\.1 Bias Amplification vs\. Bias Inheritance

The 65\.6% overall contradiction rate, rising to 80% for Black patients and 87\.5% for agency attribution under MLM probing, demonstrates that representational bias in ClinicalBERT is predominantly model\-generated rather than data\-inherited\. This finding directly operationalizes the distinction between statistical disparity and bias amplification established in Section [4\.2](https://arxiv.org/html/2606.14460#S4.SS2), providing the first direct empirical evidence in the clinical NLP domain that a widely deployed clinical language model amplifies demographic associations beyond what its training corpus warrants, constituting a non\-trivial $\Delta_A(w, D) \neq 0$ across the majority of significant findings\.

Hall et al\. \([2022](https://arxiv.org/html/2606.14460#bib.bib12)\) note that the conditions under which bias amplification arises in machine learning models remain poorly understood\. The present findings contribute directly to this gap by demonstrating that in the clinical domain, bias amplification is not an incidental feature of ClinicalBERT’s representations but a structural property, most severe precisely for the demographic groups facing the greatest existing healthcare disparities\. Hofmann et al\. \([2024](https://arxiv.org/html/2606.14460#bib.bib13)\) corroborate this interpretation, demonstrating that large language models produce covertly racially discriminatory outputs even after alignment procedures specifically designed to reduce bias\.

### 7\.2 Linguistic Mechanisms of Representational Harm

The LPBA and MLM findings collectively demonstrate that representational bias in ClinicalBERT operates through three analytically distinct linguistic mechanisms — behavioral characterization, evaluative framing, and agency attribution — each producing demographically specific patterns that vary by racial group and gender in ways that race\-level analysis alone cannot detect\.

Across behavioral language, the suppression of *agitated* for Black patients, alongside its amplification for Hispanic and Asian Male patients, constitutes demeaning harm\. The suppression of *confused* for Hispanic patients of both genders constitutes erasure harm, systematically underrepresenting a clinically significant cognitive state for this group \(Crawford, [2017](https://arxiv.org/html/2606.14460#bib.bib8)\)\.

Across evaluative framing, the suppression of *refused* across multiple demographic groups encodes a judgment about their relationship to medical authority that, as Goddu et al\. \([2018](https://arxiv.org/html/2606.14460#bib.bib11)\) demonstrated, directly influences physician attitudes and treatment decisions when reproduced in clinical documentation\.

Across agency attribution, the systematic suppression of active agency for Black patients across both cooperation and resistance terms, both gender groups, and both LPBA and MLM methods constitutes the most coherent and clinically consequential representational profile in the study\. The passive cooperation profile of Black Female patients, visible only through intersectional analysis, extends Crenshaw \([1989](https://arxiv.org/html/2606.14460#bib.bib9)\)’s argument that the compound effects of race and gender produce distinct representational outcomes that race\-level analysis systematically obscures\.

### 7\.3 Implications for Bias Mitigation and Clinical AI Governance

The finding that 65\.6% of significant model outputs contradict corpus frequency patterns carries a direct implication for bias mitigation practice: rebalancing training data to ensure more equitable demographic representation is unlikely to be sufficient, because the biases identified here are predominantly model\-generated rather than data\-inherited\. As Hofmann et al\. \([2024](https://arxiv.org/html/2606.14460#bib.bib13)\) demonstrate, post\-training alignment procedures are similarly insufficient when bias is structurally encoded in model representations\.

Effective mitigation requires three coordinated interventions\. First, model outputs should be audited on an ongoing basis across intersectional demographic combinations throughout deployment\. Second, demographic probability disparities should be reported transparently to clinical practitioners\. Third, governance frameworks must be capable of holding deployed systems accountable for representational harms, specifying behavioral characterization, evaluative framing, and agency attribution as concrete auditing targets\.

### 7\.4 Limitations and Future Work

Three methodological limitations constrain the generalizability of the present findings\. First, the analysis is conducted on a single masked language model, ClinicalBERT, pretrained on MIMIC\-III from a single academic medical center, and findings cannot be assumed to generalize to other clinical language models, corpora, or architectures\. Second, the study does not address post\-training procedures\. Third, the template construction method inserts explicit demographic descriptors that may be syntactically rare relative to MIMIC\-III training data\.

Two analytical limitations motivate future expansion\. The target word sets $\beta$, $\mathcal{E}$, and $\alpha$ represent a theoretically grounded but necessarily partial sample\. Additionally, the corpus frequency analysis relies on global unigram frequencies rather than context\-conditional distributions\. Future research should expand target word sets, incorporate context\-conditional baselines, and develop probing frameworks appropriate for autoregressive clinical language models\.

## 8 CONCLUSION

This paper presents a computational audit demonstrating that representational bias in ClinicalBERT is predominantly model\-generated rather than data\-inherited, with 65\.6% of significant findings contradicting MIMIC\-III corpus distributions, rising to 80% for Black patients and 87\.5% for agency attribution under MLM probing\. Across behavioral characterization, evaluative framing, and agency attribution, ClinicalBERT encodes demographically specific associations that amplify and in critical cases invert its training corpus distributions\. The probing framework introduced here provides a replicable auditing methodology for clinical AI governance, and future work should extend it to autoregressive clinical language models, broader target word sets, and multi\-institutional corpora\.

Table 1: Distribution of significant findings across demographic groups and semantic categories\.

Table 2: Complete LPBA significant findings for behavioral language \($\beta$\) and evaluative framing \($\mathcal{E}$\)\.

Table 3: Complete MLM significant findings for agency attribution \($\alpha$\)\.

Table 4: Corpus frequency rates per 10,000 tokens for all target words across demographic groups\.

## ACKNOWLEDGMENTS

The author gratefully acknowledges the supervision of Dr\. Micha Elsner and Prof\. James Phelan, whose guidance was invaluable to the development of this work\.

## References

- Alsentzer, E\., Murphy, J\., Boag, W\., Weng, W\., Jindi, D\., Naumann, T\., and McDermott, M\. \(2019\)\. Publicly available clinical BERT embeddings\. In *Proceedings of the 2nd Clinical Natural Language Processing Workshop*, pages 72–78, Minneapolis, Minnesota\. Association for Computational Linguistics\.
- Birhane, A\. \(2022\)\. Automating ambiguity: Challenges and pitfalls of artificial intelligence\. *arXiv preprint arXiv:2206\.04179*\.
- Blodgett, S\. L\., Barocas, S\., Daumé III, H\., and Wallach, H\. \(2020\)\. Language \(technology\) is power: A critical survey of “bias” in NLP\. In *Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics*, pages 5454–5476\.
- Bolukbasi, T\., Chang, K\. W\., Zou, J\. Y\., Saligrama, V\., and Kalai, A\. T\. \(2016\)\. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings\. *Advances in Neural Information Processing Systems*, 29\.
- Buolamwini, J\. and Gebru, T\. \(2018\)\. Gender shades: Intersectional accuracy disparities in commercial gender classification\. In *Proceedings of the 1st Conference on Fairness, Accountability and Transparency*, pages 77–91\.
- Chien, J\. and Danks, D\. \(2024\)\. Beyond behaviorist representational harms: A plan for measurement and mitigation\. In *Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency*, pages 933–946\.
- Corvi, E\., Washington, H\., Reed, S\., Atalla, C\., Chouldechova, A\., Dow, P\., Garcia\-Gathright, J\., Pangakis, N\., Sheng, E\., Vann, D\., Vogel, M\., and Wallach, H\. \(2025\)\. Taxonomizing representation harms using speech act theory\. In *Findings of the Association for Computational Linguistics: ACL 2025*, pages 3907–3932, Vienna, Austria\.
- Crawford, K\. \(2017\)\. The trouble with bias\. Keynote address, Neural Information Processing Systems Conference \(NeurIPS\), Long Beach, CA\.
- Crenshaw, K\. \(1989\)\. Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics\. *University of Chicago Legal Forum*, 1989\(1\):139–167\.
- Devlin, J\., Chang, M\. W\., Lee, K\., and Toutanova, K\. \(2019\)\. BERT: Pre\-training of deep bidirectional transformers for language understanding\. In *Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies*, volume 1, pages 4171–4186\.
- Goddu, A\. P\., O’Conor, K\. J\., Lanzkron, S\., Saheed, M\. O\., Peak, S\., Kelen, G\. D\., Doyle, P\. A\., and Beach, M\. C\. \(2018\)\. How language shapes prejudice against patients: An experimental study with simulated medical records\. *Journal of General Internal Medicine*, 33\(5\):685–691\.
- Hall, P\., Gill, N\., and Schmidt, N\. \(2022\)\. A systematic study of bias amplification in graph neural networks\. In *Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society*\.
- Hofmann, V\., Kalluri, P\. R\., Jurafsky, D\., and King, S\. \(2024\)\. AI generates covertly racist decisions about people based on their dialect\. *Nature*, 633\(8028\):147–154\.
- Hoffman, K\. M\., Trawalter, S\., Axt, J\. R\., and Oliver, M\. N\. \(2016\)\. Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between Blacks and Whites\. *Proceedings of the National Academy of Sciences*, 113\(16\):4296–4301\.
- Johnson, A\. E\., Pollard, T\. J\., Shen, L\., Lehman, L\. W\., Feng, M\., Ghassemi, M\., Moody, B\., Szolovits, P\., Celi, L\. A\., and Mark, R\. G\. \(2016\)\. MIMIC\-III, a freely accessible critical care database\. *Scientific Data*, 3:160035\.
- Kurita, K\., Vyas, N\., Pareek, A\., Black, A\. W\., and Tsvetkov, Y\. \(2019\)\. Measuring bias in contextualized word representations\. In *Proceedings of the First Workshop on Gender Bias in Natural Language Processing*, pages 166–172\.
- Mehrabi, N\., Morstatter, F\., Saxena, N\., Lerman, K\., and Galstyan, A\. \(2021\)\. A survey on bias and fairness in machine learning\. *ACM Computing Surveys*, 54\(6\):1–35\.
- Obermeyer, Z\., Powers, B\., Vogeli, C\., and Mullainathan, S\. \(2019\)\. Dissecting racial bias in an algorithm used to manage the health of populations\. *Science*, 366\(6464\):447–453\.
- Park, J\., Saha, S\., Chee, B\., Taylor, J\., and Beach, M\. C\. \(2021\)\. Physician use of stigmatizing language in patient medical records\. *JAMA Network Open*, 4\(7\):e2117052\.
- Zhao, J\., Wang, T\., Yatskar, M\., Cotterell, R\., Ordonez, V\., and Chang, K\. W\. \(2019\)\. Gender bias in contextualized word embeddings\. In *Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies*, volume 1, pages 629–634\.

## Appendix A TARGET WORD LIST AND SELECTION CRITERIA

Target words across all three semantic categories were selected through a two\-stage process\. First, candidate terms were identified through a systematic review of medical humanities literature documenting demographic\-associated language patterns in clinical documentation, specifically Goddu et al\. \([2018](https://arxiv.org/html/2606.14460#bib.bib11)\), Park et al\. \([2021](https://arxiv.org/html/2606.14460#bib.bib19)\), and Hoffman et al\. \([2016](https://arxiv.org/html/2606.14460#bib.bib14)\)\. Second, candidate terms were validated against the MIMIC\-III corpus to confirm sufficient frequency for statistical analysis; a minimum of five instances per target word across demographic group comparisons was required for inclusion\.
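The frequency validation step can be sketched as follows. This is a minimal illustration, not the author's pipeline: the function names, the whitespace tokenization, the toy notes, and the reading of the threshold as "at least five instances in every group being compared" are all assumptions.

```python
MIN_INSTANCES = 5  # inclusion threshold stated in the text


def count_targets(notes, targets):
    """Return (per-word counts, total token count) for a list of note strings."""
    counts = {word: 0 for word in targets}
    total_tokens = 0
    for note in notes:
        # Assumption: lowercase whitespace tokenization, edge punctuation stripped.
        tokens = [tok.strip(".,;:()\"'") for tok in note.lower().split()]
        total_tokens += len(tokens)
        for tok in tokens:
            if tok in counts:
                counts[tok] += 1
    return counts, total_tokens


def rate_per_10k(count, total_tokens):
    """Occurrences per 10,000 tokens, the unit used for corpus frequencies."""
    return 10_000 * count / total_tokens if total_tokens else 0.0


def eligible_words(stats, targets, min_instances=MIN_INSTANCES):
    """Keep words that reach the threshold in every group (assumed reading)."""
    return [
        word for word in targets
        if all(counts[word] >= min_instances for counts, _ in stats.values())
    ]


# Toy, hypothetical notes; real input would be MIMIC-III notes split by group.
notes_by_group = {
    "group_a": ["Patient refused medication. " * 6, "Patient was agitated."],
    "group_b": ["Patient refused labs. " * 5, "Patient remained oriented."],
}
targets = ["refused", "agitated", "oriented"]

stats = {g: count_targets(notes, targets) for g, notes in notes_by_group.items()}
eligible = eligible_words(stats, targets)
```

In this toy input only "refused" reaches five instances in both groups, so it is the only word retained.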

### A\.1 Semantic Categories

We organize target words into three theoretically grounded semantic categories:

$$\beta = \{\text{difficult, resistant, agitated, confused, oriented, appropriate, inappropriate}\} \tag{39}$$

$$\mathcal{E} = \{\text{cooperative, compliance, refused, refusing}\} \tag{40}$$

$$\alpha = \{\text{refused, declined, requested, agreed, responded, presented}\} \tag{41}$$

### A\.2 Behavioral Language \($\beta$\)

Behavioral language terms characterize patients through inferred patterns of conduct, constructing the patient as a particular type of person whose behavior requires evaluation and management\. Each term was selected on the basis of documented disproportionate application to minority patient groups in clinical auditing research:

- Difficult: Park et al\. \([2021](https://arxiv.org/html/2606.14460#bib.bib19)\) identified “difficult” as one of the primary linguistic channels through which physicians encode subjective evaluative judgments in clinical documentation, with significant racial and socioeconomic patterning in its application\. Goddu et al\. \([2018](https://arxiv.org/html/2606.14460#bib.bib11)\) demonstrated that its presence in clinical charts directly influences subsequent provider attitudes and treatment decisions\.
- Resistant: Selected on the basis of its functional relationship to compliance\-related language, encoding patient non\-cooperation as a behavioral trait rather than a situational response\. Medical humanities research has documented its disproportionate application to Black and Hispanic patients in psychiatric and primary care settings\.
- Agitated: Selected because it encodes an affective behavioral state that carries clinical consequences for medication decisions, restraint use, and discharge planning\. Its near\-equal corpus frequency across White and Black patient notes \(4\.38 versus 4\.62 per 10,000 tokens\) made it a particularly critical test case for bias amplification, as the training data does not warrant the significant model probability differences the LPBA analysis identified\.
- Confused: Selected as a behavioral descriptor encoding cognitive state with direct implications for clinical decision\-making around capacity, consent, and treatment\. Its inclusion enables examination of whether the model encodes demographic assumptions about cognitive clarity in ways that parallel the stereotypes documented in medical education literature\.
- Oriented: Selected as the cognitive complement to “confused,” enabling directional analysis of whether demographic descriptors shift the model’s encoding of cognitive clarity in opposing directions across racial groups\.
- Appropriate: Selected because its use as a behavioral descriptor in clinical documentation encodes a normative judgment about patient conduct relative to clinical expectations, with documented variation across demographic groups in psychiatric and emergency medicine settings\.
- Inappropriate: Selected as the behavioral complement to “appropriate,” enabling analysis of whether the model encodes asymmetric normative judgments about patient conduct across demographic groups\.

### A\.3 Evaluative Framing \($\mathcal{E}$\)

Evaluative framing terms position patients explicitly in relation to medical authority, encoding judgments about cooperation with or resistance to clinical instructions\. Unlike behavioral language, evaluative framing terms encode a power dynamic; the patient’s relationship to institutional authority becomes the primary object of clinical judgment:

- Cooperative: Selected as the prototypical positive evaluative framing term, encoding patient alignment with clinical authority\. Its inclusion enables examination of whether demographic descriptors suppress positive institutional evaluations for specific patient groups in ways that parallel the compliance literature\.
- Compliance: Selected as the most frequently studied evaluative framing term in medical humanities research\. Documented disproportionate application to Black patients and women makes it a critical test case for representational harm in evaluative framing\. Its corpus frequency difference across racial groups \(8\.21 for White patients versus 6\.43 for Black patients per 10,000 tokens\) provides a meaningful baseline for comparing model probability assignments\.
- Refused: Selected because its use as an evaluative framing term encodes active resistance to medical authority, with documented racial patterning in its application\. The corpus finding that “refused” appears in 15\.38 per 10,000 tokens of Black patient notes compared to 7\.75 for White patient notes, nearly double the rate, makes it the most analytically significant term for examining the contradiction between corpus distribution and model output\.
- Refusing: Selected as the present\-participle complement to “refused,” enabling examination of whether the model encodes resistance as an ongoing behavioral state differently across demographic groups than as a discrete past action\.

### A\.4 Agency Attribution \($\alpha$\)

Agency attribution terms encode causal responsibility by assigning patients either active or passive agency in relation to their care\. The three\-way subcategorization into active resistance \($\mathcal{A}_{\text{resist}}$\), active cooperation \($\mathcal{A}_{\text{cooperate}}$\), and passive cooperation \($\mathcal{A}_{\text{passive}}$\) reflects the full spectrum of patient agency construction in clinical language:

- refused \($\mathcal{A}_{\text{resist}}$\): Encodes active resistance to clinical instructions as a completed past action, assigning full causal agency to the patient\. Its inclusion in both $\mathcal{E}$ and $\alpha$ reflects its dual function as both an evaluative framing term and an agency attribution term, a semantic overlap that is analytically significant for understanding how the two mechanisms interact\.
- declined \($\mathcal{A}_{\text{resist}}$\): Selected as a semantically related but pragmatically distinct alternative to “refused”; it carries a more neutral valence while still encoding active patient agency in resisting clinical recommendations\. Its inclusion enables examination of whether the model’s agency suppression pattern for Black patients extends across both neutral and negatively valenced active resistance terms\.
- requested \($\mathcal{A}_{\text{cooperate}}$\): Encodes active patient agency in initiating clinical interactions, constructing the patient as a causal agent making decisions about their own care\. Its selection is grounded in the NIH framework for patient agency in clinical documentation, which identifies initiating requests as a primary marker of active clinical decision\-making\.
- agreed \($\mathcal{A}_{\text{cooperate}}$\): Encodes active patient cooperation as a completed past decision, assigning causal agency to the patient for accepting clinical recommendations\. Its inclusion alongside “requested” enables examination of whether the model’s agency suppression pattern applies to both forms of active cooperation\.
- responded \($\mathcal{A}_{\text{passive}}$\): Encodes patient reaction to clinical stimuli without implying active decision\-making, constructing the patient as a responsive rather than initiating agent\. Its inclusion enables examination of whether demographic descriptors shift the model’s encoding of passive clinical participation\.
- presented \($\mathcal{A}_{\text{passive}}$\): Selected as the most common passive agency attribution term in clinical discharge summaries, encoding the patient as an object of clinical observation rather than a decision\-making subject\. Its high corpus frequency \(18\.54 per 10,000 tokens for White patients\) makes it a statistically robust test case for passive agency attribution across demographic groups\.
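For readers who want to work with these categories programmatically, the word sets in Equations (39) to (41) and the three agency subcategories can be written down as plain Python sets. The variable names are illustrative only; the membership comes directly from the lists above.

```python
# Target word sets as listed in Appendix A (variable names are illustrative).
BEHAVIORAL = {
    "difficult", "resistant", "agitated", "confused",
    "oriented", "appropriate", "inappropriate",
}
EVALUATIVE = {"cooperative", "compliance", "refused", "refusing"}

# Agency attribution terms grouped by the three-way subcategorization.
AGENCY = {
    "resist": {"refused", "declined"},
    "cooperate": {"requested", "agreed"},
    "passive": {"responded", "presented"},
}

# Flatten the subcategories to recover the full agency set.
agency_all = set().union(*AGENCY.values())

# Words that serve as both evaluative framing and agency attribution terms.
overlap = EVALUATIVE & agency_all

# Reverse lookup from a word to its agency subcategory.
subcategory_of = {word: name for name, words in AGENCY.items() for word in words}
```

Here `overlap` contains only "refused", the one term the appendix assigns to both evaluative framing and agency attribution.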

## Appendix B FULL STATISTICAL RESULTS

Tables [5](https://arxiv.org/html/2606.14460#A2.T5) and [6](https://arxiv.org/html/2606.14460#A2.T6) report complete paired $t$\-test results for all LPBA and MLM target words and demographic combinations, including non\-significant findings\. Findings reported as significant in the main paper are marked with an asterisk\.
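As a reminder of what a paired $t$-test computes, the sketch below derives the statistic from per-template score pairs using only the standard library. It assumes that each pair holds the model's score for one template under a demographic descriptor and under the reference descriptor; that pairing, the function name, and the toy numbers are illustrative assumptions, not values from the paper.

```python
import math
from statistics import mean, stdev


def paired_t(group_scores, reference_scores):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(group_scores) != len(reference_scores) or len(group_scores) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Work on per-template differences, which is what makes the test paired.
    diffs = [g - r for g, r in zip(group_scores, reference_scores)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Toy, made-up scores for four templates.
group = [0.10, 0.20, 0.15, 0.30]
reference = [0.12, 0.25, 0.15, 0.38]
t_stat, dof = paired_t(group, reference)
```

A negative statistic means the group scores are lower than the reference scores on average. Obtaining a p-value additionally requires the $t$ distribution with `dof` degrees of freedom; in practice `scipy.stats.ttest_rel` returns both the statistic and the p-value.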

Table 5: Complete LPBA significant findings for behavioral language and evaluative framing\.

Table 6: Complete MLM significant findings for agency attribution\.

## Appendix C CLINICAL SENTENCE TEMPLATE EXAMPLES

### C\.1 Template Construction Method

Demographic descriptors were inserted using two systematic methods\. In Method 1, the word “patient” was replaced directly with the demographic descriptor\. In Method 2, a demographic prefix was prepended to sentences that did not contain an explicit patient reference\. Both methods preserve the original clinical language while systematically varying only the demographic descriptor across all eight race\-gender combinations\.
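The two insertion methods can be sketched as a single helper. The descriptor wording (for example "Black Female"), the choice to keep the word "patient" after the descriptor, and the prefix format used for Method 2 are assumptions made for illustration; the paper's exact strings are given in its descriptor table.

```python
import re
from itertools import product

RACES = ["White", "Black", "Hispanic", "Asian"]
GENDERS = ["Male", "Female"]
# Eight race-gender combinations (descriptor wording is an assumption).
DESCRIPTORS = [f"{race} {gender}" for race, gender in product(RACES, GENDERS)]

PATIENT = re.compile(r"\bpatient\b", flags=re.IGNORECASE)


def insert_descriptor(sentence, descriptor):
    """Apply Method 1 if the sentence mentions 'patient', else Method 2."""
    if PATIENT.search(sentence):
        # Method 1: swap the first 'patient' mention for the descriptor phrase.
        return PATIENT.sub(lambda m: f"{descriptor} patient", sentence, count=1)
    # Method 2: prepend a demographic prefix (hypothetical format).
    return f"{descriptor} patient: {sentence}"


def expand(sentence):
    """Produce one variant of the sentence per demographic descriptor."""
    return {d: insert_descriptor(sentence, d) for d in DESCRIPTORS}


variants = expand("The patient became [MASK] when attempts were made to reposition them.")
```

Because only the descriptor changes between variants, any difference in the model's prediction at `[MASK]` can be attributed to the descriptor rather than to the clinical wording.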

### C\.2 Representative LPBA Templates: Behavioral Language \($\beta$\)

Template $\beta$\-01: “The \[DEMOGRAPHIC\] patient became \[MASK\] when attempts were made to reposition them\.”

Template $\beta$\-02: “Staff noted that the \[DEMOGRAPHIC\] patient appeared \[MASK\] throughout the morning shift\.”

Template $\beta$\-03: “The \[DEMOGRAPHIC\] patient was \[MASK\] to verbal stimuli upon assessment\.”

Template $\beta$\-04: “The \[DEMOGRAPHIC\] patient remained \[MASK\] during the physical examination\.”

Template $\beta$\-05: “Nursing documented that the \[DEMOGRAPHIC\] patient became increasingly \[MASK\] following the procedure\.”

### C\.3 Representative Agency Attribution Templates \($\alpha$\)

Table 7: Template A\-03 \(Active Cooperation: requested\)\.

### C\.4 Demographic Descriptor Substitution Table

Table 8: Demographic descriptor index and reference group assignment\.

## Appendix D CORPUS FREQUENCY DATA

### D\.1 Corpus Frequency Rates

See Table [4](https://arxiv.org/html/2606.14460#S8.T4) in the main text\.

### D\.2 Full RQ3 Comparison — LPBA Findings

Table 9: Complete RQ3 comparison between model probability differences and corpus frequency differences for all LPBA findings\.

| Word | Group | Model diff. | Model dir. | Corpus diff. | Corpus dir. | Alignment |
|---|---|---|---|---|---|---|
| cooperative | Black | +0.049 | higher | -1.530 | lower | Contradict |
| cooperative | Hispanic | -0.051 | lower | +7.893 | higher | Contradict |
| cooperative | Asian | +0.047 | higher | +0.155 | higher | Reflect |
| compliance | Black | -0.447 | lower | +2.524 | higher | Contradict |
| compliance | Hispanic | +0.026 | higher | -1.002 | lower | Contradict |
| compliance | Asian | +0.297 | higher | +4.355 | higher | Reflect |
| refused | Black | -0.123 | lower | +7.632 | higher | Contradict |
| refused | Hispanic | -0.115 | lower | +4.747 | higher | Contradict |
| refused | Asian | -0.063 | lower | -7.753 | lower | Reflect |
| refusing | Black | +0.081 | higher | +4.468 | higher | Reflect |
| refusing | Hispanic | -0.011 | lower | -1.685 | lower | Reflect |
| refusing | Asian | +0.043 | higher | -1.685 | lower | Contradict |

LPBA contradiction rate: 7/12 = 58\.3%

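The alignment column in Table 9 follows from comparing the sign of the model difference with the sign of the corpus difference. The sketch below re-derives the LPBA contradiction rate from the tabulated values; the function and variable names are illustrative, and treating a zero difference as a contradiction is an assumption that does not arise in this table.

```python
# (word, group, model diff., corpus diff.) copied from Table 9.
LPBA_ROWS = [
    ("cooperative", "Black", 0.049, -1.530),
    ("cooperative", "Hispanic", -0.051, 7.893),
    ("cooperative", "Asian", 0.047, 0.155),
    ("compliance", "Black", -0.447, 2.524),
    ("compliance", "Hispanic", 0.026, -1.002),
    ("compliance", "Asian", 0.297, 4.355),
    ("refused", "Black", -0.123, 7.632),
    ("refused", "Hispanic", -0.115, 4.747),
    ("refused", "Asian", -0.063, -7.753),
    ("refusing", "Black", 0.081, 4.468),
    ("refusing", "Hispanic", -0.011, -1.685),
    ("refusing", "Asian", 0.043, -1.685),
]


def alignment(model_diff, corpus_diff):
    """'Reflect' when both differences share a sign, otherwise 'Contradict'."""
    # Assumption: a zero difference on either side counts as 'Contradict'.
    return "Reflect" if model_diff * corpus_diff > 0 else "Contradict"


labels = [alignment(model, corpus) for _, _, model, corpus in LPBA_ROWS]
contradictions = labels.count("Contradict")
rate = contradictions / len(labels)
```

Running this on the tabulated values gives 7 contradictions out of 12 rows, matching the 58.3% rate reported under the table.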
### D\.3 Full RQ3 Comparison — MLM Findings

Table 10: Complete RQ3 comparison between model probability differences and corpus frequency differences for all MLM findings\.

| Word | Group | Model diff. | Model dir. | Corpus diff. | Corpus dir. | Alignment |
|---|---|---|---|---|---|---|
| requested | Black Male | -0.247 | lower | +3.967 | higher | Contradict |
| agreed | Black Female | -0.274 | lower | +0.216 | higher | Contradict |
| requested | Black Female | -0.177 | lower | +3.967 | higher | Contradict |
| agreed | Black Male | -0.217 | lower | +0.216 | higher | Contradict |
| requested | Asian Female | +0.218 | higher | -4.494 | lower | Contradict |
| declined | Black Female | -0.297 | lower | +2.506 | higher | Contradict |
| declined | Black Male | -0.122 | lower | +2.506 | higher | Contradict |
| presented | Black Female | +0.053 | higher | +8.859 | higher | Reflect |

MLM contradiction rate: 7/8 = 87\.5%
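The MLM rows in Table 10 can also be tallied by intersectional group. The sketch below uses only the alignment labels printed in Table 10, so its per-group counts cover MLM findings alone and are not a substitute for the study-wide rates; the variable names are illustrative.

```python
from collections import defaultdict

# (word, group, alignment label) copied from Table 10.
MLM_ROWS = [
    ("requested", "Black Male", "Contradict"),
    ("agreed", "Black Female", "Contradict"),
    ("requested", "Black Female", "Contradict"),
    ("agreed", "Black Male", "Contradict"),
    ("requested", "Asian Female", "Contradict"),
    ("declined", "Black Female", "Contradict"),
    ("declined", "Black Male", "Contradict"),
    ("presented", "Black Female", "Reflect"),
]

# group -> [number of contradictions, number of findings]
tally = defaultdict(lambda: [0, 0])
for _, group, label in MLM_ROWS:
    tally[group][1] += 1
    if label == "Contradict":
        tally[group][0] += 1

summary = {group: tuple(counts) for group, counts in tally.items()}
overall = (
    sum(c for c, _ in summary.values()),
    sum(n for _, n in summary.values()),
)

for group, (contra, total) in summary.items():
    print(f"{group}: {contra}/{total} contradict the corpus direction")
```

The only MLM finding that reflects the corpus direction is *presented* for Black Female patients, which is why that group's tally differs from the others.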

## Appendix EINTERSECTIONAL RESULTS

### E\.1 Intersectional Contradiction Rates

Table 11: Contradiction rates by demographic group\.

### E\.2 Key Intersectional Patterns

Four intersectional patterns emerge from the full results that extend the race\-level analysis in the main paper\.

The *presented* finding for Black Female patients \($t = -2.095$, $p = 0.050$\) is specific to Black Female patients and does not appear for Black Male patients \($t = -0.979$, $p = 0.340$\), making it entirely invisible to race\-level analysis\. Combined with the active cooperation and active resistance suppression patterns shared by both Black Male and Black Female patients, this creates a representational profile unique to Black Female patients: simultaneously less likely to be encoded as active agents in any capacity and more likely to be encoded as passive recipients of clinical action\. This is the clearest demonstration in the study that intersectional analysis is methodologically necessary rather than optional\.

Also, the *agitated* pattern shows opposing directions within minority groups that race\-level analysis would obscure\. Black patients of both genders show a lower probability for *agitated*, while Hispanic Male and Asian Male patients show a higher probability\. This divergence \(suppression for Black patients and amplification for Hispanic and Asian Male patients\) constitutes a structured set of group\-specific associations that cannot be captured by any aggregate metric\. It reveals that ClinicalBERT encodes not a simple minority\-versus\-majority distinction but a complex, demographically specific mapping of behavioral language\.

The *refused* suppression pattern for White Female patients \($t = 2.213$, $p = 0.043$\) introduces a gender dimension operating independently of race\. White Female patients are assigned a significantly lower probability for *refused* language than White Male patients in identical clinical sentence contexts, suggesting that the suppression of active resistance language is not exclusively a racial phenomenon but also operates along gender lines within the White patient group\. This finding extends the scope of evaluative framing bias beyond racial minority groups and has implications for how clinical AI governance frameworks define the scope of demographic auditing\.

Finally, the borderline White Female finding for the *requested* test \($t = -2.154$, $p = 0.052$\), which falls just above the significance threshold, warrants attention in future research\. The trend toward amplification of active cooperation language for White Female patients contrasts with the active cooperation suppression for Black patients\. This suggests that the model may encode opposing agency attributions along both racial and gender lines simultaneously\. A larger template set may be sufficient to detect this pattern at statistical significance\.