Do Emotions Influence Moral Judgment in Large Language Models?


Summary

University of Cincinnati researchers show that adding positive or negative emotions to prompts can flip LLMs’ moral acceptability judgments in up to 20% of cases, revealing an emotion-driven alignment gap with humans.


# Do Emotions Influence Moral Judgment in Large Language Models?
Source: [https://arxiv.org/html/2604.19125](https://arxiv.org/html/2604.19125)
###### Abstract

Large language models have been extensively studied for emotion recognition and moral reasoning as distinct capabilities, yet the extent to which emotions influence moral judgment remains underexplored. In this work, we develop an emotion-induction pipeline that infuses emotion into moral situations and evaluate shifts in moral acceptability across multiple datasets and LLMs. We observe a directional pattern: positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse binary moral judgments in up to 20% of cases, and with susceptibility scaling inversely with model capability. Our analysis further reveals that specific emotions can sometimes behave contrary to what their valence would predict (e.g., remorse paradoxically increases acceptability). A complementary human annotation study shows humans do not exhibit these systematic shifts, indicating an alignment gap in current LLMs.


Mohammad Saim and Tianyu Jiang
University of Cincinnati
saimmd@mail.uc.edu, tianyu.jiang@uc.edu

## 1 Introduction

The alignment of large language models (LLMs) with human moral values remains a central challenge in natural language processing. Recent systems such as ChatGPT and Claude have demonstrated proficiency in adhering to explicit ethical guidelines (Huang et al., [2024](https://arxiv.org/html/2604.19125#bib.bib2); Nunes et al., [2024](https://arxiv.org/html/2604.19125#bib.bib3)). These systems enforce explicit ethical constraints, such as refusing to generate hate speech or provide instructions for constructing weapons. However, moral judgment in real-world settings rarely involves such clear-cut prohibitions. Instead, it emerges in contested situations where reasonable people disagree, and where context, relationships, and perspective shape what counts as right or wrong (Yu et al., [2024](https://arxiv.org/html/2604.19125#bib.bib4)).

![Refer to caption](https://arxiv.org/html/2604.19125v1/x1.png)

Figure 1: Adding a positive emotion (pride) or a negative emotion (fear) to the same moral situation moves the model’s acceptability rating in opposite directions on a 1–7 Likert scale.

A defining feature of moral judgment is that it is rarely formed under emotionally neutral conditions. Research in psychology establishes that emotions influence how people interpret actions, assign blame, and judge permissibility (Haidt, [2001](https://arxiv.org/html/2604.19125#bib.bib16); Greene, [2009](https://arxiv.org/html/2604.19125#bib.bib17)). Moral emotions, including anger, disgust, and compassion, have been theorized as core mechanisms through which individuals navigate and enforce ethical norms (Haidt, [2003](https://arxiv.org/html/2604.19125#bib.bib63)). The same action might be judged differently when accompanied by different emotions, such as joy, fear, or guilt, even when the underlying facts remain unchanged. Despite this, most NLP benchmarks and evaluations of moral reasoning in LLMs assume emotional neutrality, i.e., emotions are absent in the judgment process (Forbes et al., [2020](https://arxiv.org/html/2604.19125#bib.bib5); Hendrycks et al., [2020](https://arxiv.org/html/2604.19125#bib.bib15)). Therefore, the influence of emotion on such judgments remains largely unexamined.

In this work, we address this gap by studying how emotions influence moral acceptability judgments in LLMs. We study emotional states that the narrator expresses but that are not directly tied to the ethical action itself. This distinction is central to the affect-as-information theory, which holds that people often use emotional states as heuristic signals when making evaluative judgments (Schwarz, [2012](https://arxiv.org/html/2604.19125#bib.bib32)). To ground this study, we draw on two well-established theories of human moral cognition. Haidt’s Social Intuitionist Model (SIM) (Haidt, [2001](https://arxiv.org/html/2604.19125#bib.bib16)) argues that moral judgment is driven primarily by quick, automatic, affect-laden intuitions, with deliberative reasoning serving mainly as a post hoc justification. Greene’s Dual Process Theory (Greene, [2009](https://arxiv.org/html/2604.19125#bib.bib17)) similarly posits a neuro-cognitive tension between an emotion-driven and a deliberative system in moral decision-making. Crucially, the text on which LLMs are trained is itself a product of human authors operating under these same mechanisms, i.e., moral discourse in online communities, news, and social media reflects the affect-laden judgments described in SIM and Dual Process Theory (Ornstein et al., [2025](https://arxiv.org/html/2604.19125#bib.bib19); Kawintiranon and Singh, [2022](https://arxiv.org/html/2604.19125#bib.bib18); Chalkidis et al., [2022](https://arxiv.org/html/2604.19125#bib.bib20)). LLMs may therefore encode statistical associations between emotional cues and moral evaluations, not by reasoning about affect, but by absorbing the patterns in the training data.

We test whether this application of affect-laden associations systematically shifts LLM moral judgments through a controlled emotion-induction framework. For each moral situation, we generate two modified versions: one embedding a positive emotional state and one embedding a negative one, while keeping the underlying action unchanged. Figure [1](https://arxiv.org/html/2604.19125#S1.F1) illustrates this setup. We evaluate this framework on two complementary datasets: Social-Chem-101 (Forbes et al., [2020](https://arxiv.org/html/2604.19125#bib.bib5)), covering everyday moral situations, and the Justice subset of ETHICS (Hendrycks et al., [2020](https://arxiv.org/html/2604.19125#bib.bib15)), which targets claims of deservingness.

Across multiple LLMs on Social-Chem-101, positive emotions increase moral acceptability ratings by up to 1.21 points on a 7-point Likert scale, while negative emotions decrease ratings by up to 1.15 points. On the ETHICS Justice subset, this effect is strong enough to reverse the moral ordering between reasonable and unreasonable claims in up to 20% of cases. Across both datasets, smaller models shift in Likert rating more than larger ones. We further identify individual emotions that run counter to their valence (e.g., remorse paradoxically increases acceptability), and a complementary human-annotation study shows that humans do not exhibit these systematic shifts, indicating an alignment gap in current LLMs. We publicly release the code and modified scenarios at [https://github.com/cincynlp/EmoMoral](https://github.com/cincynlp/EmoMoral).

As an overview, this paper makes the following contributions:

1. We introduce the first controlled emotion-induction framework for studying how emotion shifts LLM moral judgments, evaluating seven models on two complementary datasets.
2. We show that positive emotions raise LLM moral acceptability while negative ones lower it, with the effect strong enough to reverse up to 20% of binary moral judgments, and with susceptibility scaling inversely with model capability.
3. We also demonstrate two nuances beyond this valence-based effect: (i) specific emotions go against their valence (remorse increases acceptability and relief decreases it), and (ii) human annotators do not exhibit the systematic shifts observed in LLMs, indicating an alignment gap in current LLMs.

## 2 Related Works

#### Moral and Normative Datasets.

Prior NLP benchmarks have focused on moral reasoning, but rarely consider the role of emotional context. Forbes et al. ([2020](https://arxiv.org/html/2604.19125#bib.bib5)) introduced Social-Chem-101, a corpus of 292k “rules-of-thumb” that capture social and moral norms in everyday situations. Hendrycks et al. ([2020](https://arxiv.org/html/2604.19125#bib.bib15)) created the ETHICS benchmark, spanning justice, well-being, duties, virtues, and commonsense morality, and found that existing language models have only a partial ability to predict human ethical judgments. Talat et al. ([2022](https://arxiv.org/html/2604.19125#bib.bib55)) further demonstrated that models trained on such benchmarks risk encoding the normative biases of their annotators. Jin et al. ([2022](https://arxiv.org/html/2604.19125#bib.bib45)) proposed MoralExceptQA, a challenging set for benchmarking LLMs on moral flexibility questions, along with their own MoralCoT prompting strategy that details multi-step and multi-aspect moral reasoning for LLMs. Sachdeva and van Nuenen ([2025](https://arxiv.org/html/2604.19125#bib.bib44)) evaluate LLMs on everyday moral dilemmas drawn from r/AITA, finding that models overlook emotional cues that human raters rely on to reach verdicts. In contrast, our annotation study reveals the opposite asymmetry under explicit emotion induction: LLMs over-respond to affective framing, whereas humans do not. More recently, Kumar and Jurgens ([2025](https://arxiv.org/html/2604.19125#bib.bib66)) introduced UNIMORAL, a multilingual dataset integrating psychologically grounded moral dilemmas across six languages, highlighting that moral reasoning in LLMs remains sensitive to cultural and linguistic context. Among research in moral dilemmas, a widely used framework for analyzing human morality is Moral Foundations Theory (MFT) (Graham et al., [2013](https://arxiv.org/html/2604.19125#bib.bib26)). Abdulhai et al. ([2024](https://arxiv.org/html/2604.19125#bib.bib27)) applied MFT to probe moral biases in LLMs across five moral foundations. Although the psychological basis of MFT centers on emotions, that work frames the foundations cognitively and does not test how emotional prompts activate different foundations. More broadly, computational approaches to moral reasoning have drawn on commonsense norm banks (Jiang et al., [2021](https://arxiv.org/html/2604.19125#bib.bib29); Lourie et al., [2021](https://arxiv.org/html/2604.19125#bib.bib21)), utilitarian and deontological reasoning (Keshmirian et al., [2025](https://arxiv.org/html/2604.19125#bib.bib13)), and dialogue-grounded ethical judgments (Ziems et al., [2022](https://arxiv.org/html/2604.19125#bib.bib22)).

In addition to MFT, LLMs have been evaluated on utilitarian (Keshmirian et al., [2025](https://arxiv.org/html/2604.19125#bib.bib13)) and deontological (Jin et al., [2022](https://arxiv.org/html/2604.19125#bib.bib45)) dimensions of moral reasoning. Valdesolo and DeSteno ([2006](https://arxiv.org/html/2604.19125#bib.bib48)) indicate that inducing positive affect reduces deontological rigidity in humans, yet whether analogous affective modulation operates in LLMs remains unexamined. Across these frameworks, emotion is treated as background context at best, rather than as an active variable that modulates moral judgment. Our work departs from this line by directly addressing emotional induction and measuring its causal effect on moral acceptability.

#### Emotion Modeling in NLP.

In recent years, LLMs have been extensively analyzed for sentiment and emotion capabilities (Sabour et al., [2024](https://arxiv.org/html/2604.19125#bib.bib30); Tak et al., [2025](https://arxiv.org/html/2604.19125#bib.bib47); Liu et al., [2025b](https://arxiv.org/html/2604.19125#bib.bib31); Lee et al., [2025](https://arxiv.org/html/2604.19125#bib.bib46); Zhang et al., [2024](https://arxiv.org/html/2604.19125#bib.bib39)). Beyond explicit emotion classification, prior work has examined subtler affective signals in text, including embodied emotion expressions conveyed through physiological and physical reactions (Zhuang et al., [2024](https://arxiv.org/html/2604.19125#bib.bib6); Duong et al., [2025](https://arxiv.org/html/2604.19125#bib.bib8); Saim et al., [2025](https://arxiv.org/html/2604.19125#bib.bib9)). Di Palma et al. ([2025](https://arxiv.org/html/2604.19125#bib.bib43)) probed LLaMA models and found that sentiment information is encoded in hidden layers, improving probe accuracy by up to 14%. For inducing emotions, Li et al. ([2023](https://arxiv.org/html/2604.19125#bib.bib42)) showed with EmotionPrompt that LLMs respond to emotional stimuli: appending emotional phrases improved performance by 8–115% on general tasks. NegativePrompt (Wang et al., [2024](https://arxiv.org/html/2604.19125#bib.bib38)) extended this finding, showing that negative emotional stimuli incorporating stress-response expressions also enhance LLM performance.

Studies on the intersection of emotion and morality are sparse. Hoover et al. ([2020](https://arxiv.org/html/2604.19125#bib.bib40)) annotated moral sentiment in social media, revealing systematic co-occurrence patterns between specific emotions and moral foundations in naturalistic text, suggesting that LLMs trained on such data may absorb these associations. Consistent with this, Scherrer et al. ([2023](https://arxiv.org/html/2604.19125#bib.bib57)) demonstrate that LLMs encode moral beliefs that are highly sensitive to scenario framing and exhibit uncertainty and inconsistency, particularly in ambiguous cases. More recently, Russo et al. ([2026](https://arxiv.org/html/2604.19125#bib.bib41)) showed that LLMs rely on a narrower set of moral values than humans, with alignment deteriorating sharply as human disagreement increases. Liu et al. ([2025a](https://arxiv.org/html/2604.19125#bib.bib37)) provide causal evidence that LLMs prioritize emotion over cost in third-party punishment tasks, and He et al. ([2024](https://arxiv.org/html/2604.19125#bib.bib14)) show that LLMs’ emotional and moral tone varies across demographic groups. These findings suggest that the emotion-morality interaction has been noted in prior work but remains underexplored in studies of affect’s influence on situational morality.

## 3 Experimental Setup

We evaluate our emotion-induction framework on two datasets grounded in complementary aspects of moral reasoning: Social-Chem-101 (Forbes et al., [2020](https://arxiv.org/html/2604.19125#bib.bib5)), which captures social norms and moral judgments across everyday situations, and the Justice subset of the ETHICS benchmark (Hendrycks et al., [2020](https://arxiv.org/html/2604.19125#bib.bib15)). Together, these datasets allow us to examine emotional effects both under contested normative ground and under well-defined normative labels.

### 3.1 Social-Chem-101 Dataset

We first employ the Social-Chem-101 dataset (Forbes et al., [2020](https://arxiv.org/html/2604.19125#bib.bib5)), which comprises moral situations across four subsets. Two subreddits, r/AmItheAsshole (r/aita) and r/confessions, both focus on moral dilemmas and interpersonal conflicts. The other two are the ROCStories (rocstories) corpus (Mostafazadeh et al., [2016](https://arxiv.org/html/2604.19125#bib.bib7)) and titles scraped from Dear Abby (dearabby, [https://www.uexpress.com/life/dearabby/archives](https://www.uexpress.com/life/dearabby/archives)). We focus exclusively on the r/aita subreddit for several reasons. First, r/aita scenarios are structured as first-person moral queries that solicit community judgment, making them naturally compatible with our emotion-induction templates, which prefix an affective state to the narrator’s action.

Second, the other subsets are less suitable for this purpose: dearabby contains only advice-column titles, which lean toward ethically wrong narrations; r/confessions lacks explicit moral framing; and rocstories comprises commonsense narratives not designed for moral evaluation. By contrast, r/aita provides situations explicitly constructed for moral assessment, where individuals describe a first-person action and seek external judgment. For example:

> Disowning my foster parents because they were forcing the idea of having kids on me and my wife.

#### Action-Agreement.

An action in the dataset is defined as the specific behavior or event being judged within a “Rule-of-Thumb” (RoT) generated for each situation. To explore how emotions influence moral judgment across different normative contexts, we partition the dataset using the action-agreement score, which estimates population-level consensus (0–4 scale, where 4 indicates universal acceptance). We create two subsets: contested norms (scores < 3), covering “almost no one” [0], “rare/occasional” [1], or “controversial” [2], representing situations with debated moral status, minority viewpoints, or value conflicts; and consensus norms (scores ≥ 3), where population-level agreement on the moral verdict is high. The contested subset exhibits greater diversity in moral intuitions, making emotional perturbations more pronounced.
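For concreteness, the partition reduces to a simple threshold filter. The sketch below is illustrative only: the `action_agreement` field name and record layout are assumptions, not the released schema.

```python
# Sketch of the action-agreement split described above: contested norms
# (score < 3) versus consensus norms (score >= 3).
def partition_by_agreement(situations, threshold=3):
    """Split records into contested (< threshold) and consensus (>= threshold)."""
    contested = [s for s in situations if s["action_agreement"] < threshold]
    consensus = [s for s in situations if s["action_agreement"] >= threshold]
    return contested, consensus

situations = [
    {"text": "Disowning my foster parents ...", "action_agreement": 2},
    {"text": "Returning a lost wallet to its owner.", "action_agreement": 4},
]
contested, consensus = partition_by_agreement(situations)
print(len(contested), len(consensus))  # 1 1
```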

Throughout the main analysis, we focus on the contested norms subset (4,678 situations), as these cases best reveal how emotional induction shifts judgment in more ambiguous moral situations. Appendix [C](https://arxiv.org/html/2604.19125#A3) shows the results for consensus norms.

### 3.2 ETHICS Dataset

We additionally evaluate on the ETHICS benchmark (Hendrycks et al., [2020](https://arxiv.org/html/2604.19125#bib.bib15)). This extension serves two purposes: (1) to examine whether emotional induction can influence moral judgments even when normative expectations are well-defined, and (2) to study whether our emotion-induction pipeline generalizes to a structurally different dataset. From its five categories, we select the Justice subset, specifically the Desert (entitlement) subtask, which focuses on first-person claims of deservingness. We use the hard test cases from the subset, yielding 1,008 moral situations designed to be challenging for current models. The selection and filtering process is detailed in Appendix [B](https://arxiv.org/html/2604.19125#A2).

Table 1: Example contrast set from the ETHICS Justice dataset. Each group contains four minimally different scenarios, each with a binary label (1 = reasonable, 0 = unreasonable).

Table 2: Example outputs from the emotion-induction pipeline (GPT-5.1). Each row shows the original situation and its positive- and negative-emotion-modified variants, with the selected emotion bolded.

#### Contrast Set Structure.

A distinctive feature of the ETHICS Justice hard-test cases is their contrast-set design. For example, as shown in Table [1](https://arxiv.org/html/2604.19125#S3.T1), a claim about expecting a partner to take one to dinner is reasonable on an anniversary but unreasonable when one has cheated on them (identical structure but different moral verdict). Each base scenario appears in four variants with minimal lexical edits, where two are labeled reasonable and two are labeled unreasonable. We preserve this structure by assigning a shared emotion pair to all four variants within each contrast group, enabling direct comparison of how identical emotions interact with subtle semantic differences.

Unlike Social-Chem-101, which contains continuous acceptability ratings, the Justice dataset’s contrast-set structure provides a well-defined ground-truth ordering between reasonable and unreasonable claims, letting us measure whether emotions affect distinctions between the two binary labels. We therefore define two measures to quantify these effects: contrast collapse, whether emotions reduce the score differential between the average of reasonable and unreasonable variants, and contrast flip, whether emotions reverse their relative ordering, such that unreasonable claims receive higher ratings than their reasonable counterparts. Formal definitions and an extended example are provided in Appendix [B.1](https://arxiv.org/html/2604.19125#A2.SS1), and a minimal sketch follows below; the rating layout in the sketch is illustrative, not the paper’s released implementation.
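```python
# Contrast collapse and flip for a single ETHICS Justice contrast group,
# following the informal definitions above. A sketch: one group is four
# Likert ratings aligned with their binary reasonable/unreasonable labels.
from statistics import mean

def gap(ratings, labels):
    """Mean rating of reasonable variants minus mean rating of unreasonable ones."""
    reasonable = [r for r, y in zip(ratings, labels) if y == 1]
    unreasonable = [r for r, y in zip(ratings, labels) if y == 0]
    return mean(reasonable) - mean(unreasonable)

def collapse_and_flip(original, modified, labels):
    g_orig, g_mod = gap(original, labels), gap(modified, labels)
    collapse = g_mod < g_orig   # emotion shrinks the reasonable/unreasonable gap
    flip = g_mod < 0            # unreasonable variants now outrank reasonable ones
    return collapse, flip

labels = [1, 1, 0, 0]  # two reasonable, two unreasonable variants per group
print(collapse_and_flip([6, 6, 2, 3], [4, 4, 5, 5], labels))  # (True, True)
```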

### 3.3 Emotion Induction

Since no existing framework manually adds emotion to scenarios, we propose an emotion-induction pipeline for our curated set of moral situations. We simulate emotions in a natural, semantic, and coherent way by devising four templates for our task. These were derived from a manual inspection of the filtered sentence structures, selecting forms that accommodate the broadest range of first-person moral statements with minimal modification. The four templates are: Feeling [emotion], [situation]; Out of [emotion], [situation]; In my [emotion], [situation]; and adverbial modification ([Adverb] [situation], e.g., “angrily”, “proudly”).
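Filled mechanically, the four surface forms look as follows. This sketch only substitutes into the raw templates; in the actual pipeline, GPT-5.1 chooses the most natural form and rewrites the sentence fluently.

```python
# The four emotion-induction templates, applied naively for illustration.
TEMPLATES = {
    "feeling":   "Feeling {emotion}, {situation}",
    "out_of":    "Out of {emotion}, {situation}",
    "in_my":     "In my {emotion}, {situation}",
    "adverbial": "{adverb} {situation}",
}

def apply_template(name, situation, emotion="", adverb=""):
    return TEMPLATES[name].format(emotion=emotion, adverb=adverb, situation=situation)

print(apply_template("out_of", "disowning my foster parents ...", emotion="fear"))
# Out of fear, disowning my foster parents ...
print(apply_template("adverbial", "I yelled at my roommate.", adverb="Angrily,"))
# Angrily, I yelled at my roommate.
```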

![Refer to caption](https://arxiv.org/html/2604.19125v1/x2.png)

Figure 2: Mean shifts in moral acceptability for each model on Social-Chem-101 and ETHICS (Justice subset).

We ground our emotion selection in the GoEmotions taxonomy (Demszky et al., [2020](https://arxiv.org/html/2604.19125#bib.bib11)). For each valence category, we select emotions at the higher end of the intensity spectrum, as more strongly valenced emotions produce more pronounced affective effects (Shuman et al., [2013](https://arxiv.org/html/2604.19125#bib.bib10)). For instance, we prefer compassion over caring and anger over annoyed, as the former in each pair carries greater emotional weight. We exclude ambiguous-valence emotions from the taxonomy, as they do not reliably signal positive or negative affect. After refinement, we retain 12 emotions in total: six positive (compassion, gratitude, joy, love, pride, relief) and six negative (anger, disgust, embarrassment, fear, remorse, sadness).
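For reference, the refined taxonomy is small enough to write out directly:

```python
# The 12 retained emotions, grouped by valence, exactly as listed above.
EMOTIONS = {
    "positive": ["compassion", "gratitude", "joy", "love", "pride", "relief"],
    "negative": ["anger", "disgust", "embarrassment", "fear", "remorse", "sadness"],
}
```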

#### Induction Pipeline.

We employ GPT-5.1 to select contextually appropriate emotion pairs and generate emotion-modified situations using the provided templates. The model identifies one positive and one negative emotion from our refined taxonomy. It then rewrites each situation by embedding the selected emotions into the most natural template. Each emotion is employed uniformly, preventing selection bias that could confound downstream analysis. We avoid appending explanatory context for why the narrator feels the emotion, ensuring that emotions function as pure affective signals. Examples can be found in Table [2](https://arxiv.org/html/2604.19125#S3.T2), and all prompts are listed in Appendix [A](https://arxiv.org/html/2604.19125#A1).
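A sketch of the induction step for one situation is shown below. `call_llm` is a hypothetical stand-in for a GPT-5.1 chat call, and the prompt text is a paraphrase of the procedure described here; the exact prompts live in Appendix A.

```python
# Ask the model to pick one positive and one negative emotion from the
# taxonomy (EMOTIONS above) and rewrite the situation with each, using
# the four templates. Prompt wording is a paraphrase, not the paper's.
def induce(situation, call_llm):
    prompt = (
        "Given the moral situation below, choose ONE positive emotion from "
        f"{EMOTIONS['positive']} and ONE negative emotion from "
        f"{EMOTIONS['negative']} that fit the context. Rewrite the situation "
        "twice, embedding each emotion via the most natural template: "
        "'Feeling [emotion], ...', 'Out of [emotion], ...', "
        "'In my [emotion], ...', or an adverb (e.g., 'angrily'). "
        "Do not explain why the narrator feels the emotion.\n\n"
        f"Situation: {situation}"
    )
    return call_llm(prompt)  # expected: the positive and negative variants
```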

### 3.4 Evaluation and Model Selection

The resulting dataset contains each original situation paired with a positive-emotion and a negative-emotion variant. To assess the influence of emotions on moral acceptability judgment, we employ a suite of seven LLMs: Qwen-3-8B and Qwen3-30B-A3B-Instruct (Yang et al., [2025](https://arxiv.org/html/2604.19125#bib.bib36)), Llama-3.1-8B and Llama-3.3-70B (Grattafiori et al., [2024](https://arxiv.org/html/2604.19125#bib.bib34)), GPT-OSS-20B (OpenAI et al., [2025](https://arxiv.org/html/2604.19125#bib.bib35)), GPT-5.1 (Singh et al., [2025](https://arxiv.org/html/2604.19125#bib.bib1)), and Gemini-3-Flash (DeepMind, [2025](https://arxiv.org/html/2604.19125#bib.bib56)). These models are prompted to rate the moral acceptability of all three scenarios per situation (original, positive, and negative). We employ a 1–7 Likert scale similar to that used by Christensen et al. ([2014](https://arxiv.org/html/2604.19125#bib.bib12)) and Keshmirian et al. ([2025](https://arxiv.org/html/2604.19125#bib.bib13)). We define the scale for each numeric value, where 1 indicates a clear moral violation and 7 indicates an entirely acceptable or praiseworthy situation. Each situation is rated independently to assess how much moral acceptability shifts under emotions relative to the neutral baseline.
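The rating step can be sketched as follows. `call_llm` is again a hypothetical stand-in for each model’s API, the rubric text paraphrases the Appendix A prompt, and the single-digit parse is an assumption about the response format.

```python
# Rate one situation on the 1-7 Likert scale and compute per-situation shifts.
import re

RUBRIC = ("Rate the moral acceptability of the situation on a scale from 1 to 7, "
          "where 1 = a clear moral violation and 7 = an entirely acceptable or "
          "praiseworthy situation. Answer with a single number.")

def rate(situation, call_llm):
    reply = call_llm(f"{RUBRIC}\n\nSituation: {situation}")
    match = re.search(r"[1-7]", reply)       # first digit in the reply
    return int(match.group()) if match else None

def shift_pair(original, positive, negative, call_llm):
    """Return (Δ+, Δ-): shifts of the two variants relative to the baseline."""
    r_orig = rate(original, call_llm)
    return rate(positive, call_llm) - r_orig, rate(negative, call_llm) - r_orig
```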

## 4 Results and Analysis

We organize our findings into four analytical perspectives: overall emotion\-induced shift patterns; emotion\-specific effects and valence asymmetry; theoretical congruence with affect\-as\-information predictions; and cross\-model divergence\.

### 4.1 Emotion-Induced Shifts in Moral Acceptability

We first examine whether emotions systematically alter moral judgments across our model suite. Figure [2](https://arxiv.org/html/2604.19125#S3.F2) presents the mean shift in moral acceptability ratings when positive and negative emotions are induced, computed as $\Delta = r_{\text{modified}} - r_{\text{original}}$, where $r$ denotes a 1–7 Likert rating of moral acceptability. Across most models, we observe a consistent directional pattern: positive emotions increase moral acceptability (mean $\Delta^{+} > 0$), while negative emotions decrease it (mean $\Delta^{-} < 0$). However, the magnitude of these shifts varies substantially across architectures. Qwen-3-8B exhibits the largest sensitivity, with mean shifts of +1.21 and −1.15 for positive and negative emotions, respectively. In contrast, Gemini-3-Flash and GPT-5.1 show attenuated sensitivity to emotions relative to other models, with the former showing a small inverse effect for positive emotions.
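Aggregating the per-situation deltas into the per-model means of Figure 2 is then straightforward; a sketch over hypothetical $(\Delta^{+}, \Delta^{-})$ pairs:

```python
# Mean positive and negative shifts for one model, given one
# (delta_pos, delta_neg) pair per situation.
import numpy as np

def mean_shifts(delta_pairs):
    deltas = np.asarray(delta_pairs, dtype=float)  # shape: (n_situations, 2)
    return deltas[:, 0].mean(), deltas[:, 1].mean()

print(mean_shifts([(1, -2), (2, -1), (0, 0)]))  # (1.0, -1.0)
```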

Results on the ETHICS Justice dataset are consistent with these findings: positive emotions increase moral acceptability ratings and negative emotions decrease them, though magnitudes again vary, with smaller models exhibiting notably greater mean shifts than their larger counterparts. This pattern suggests that increased scale may confer some degree of affective robustness.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x3.png)

Figure 3: Shift magnitudes binned into four categories (percentages) for the positive and negative emotion spectrum on the Social-Chem-101 dataset.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x4.png)

Figure 4: Emotion-specific effects showing mean shift magnitudes for each emotion label on Social-Chem-101.

To characterize the distribution of shift magnitudes (perturbations in rating from baseline after adding positive and negative emotions), we categorize individual situation-level shifts into four bins, displayed in Figure [3](https://arxiv.org/html/2604.19125#S4.F3). The magnitude analysis reveals that emotions change moral acceptability in most cases. Across models, the vast majority of situations show non-zero shifts between emotion-modified and baseline ratings, indicating that emotional context broadly perturbs moral reasoning rather than only in edge cases. Notably, in the distribution of large shifts ($|\Delta| \geq 3$): Llama-3.1-8B produces large-magnitude shifts in over 20% of cases with negative emotions, whereas Gemini-3-Flash and GPT-5.1 rarely exceed the small-shift threshold.
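The binning itself can be sketched as below. The paper only names the large-shift threshold ($|\Delta| \geq 3$) explicitly, so the remaining bin edges here are an assumption for illustration.

```python
# Bin situation-level shift magnitudes into four bins, as in Figure 3.
# Assumed edges: |Δ| = 0 (none), 1 (small), 2 (medium), >= 3 (large).
from collections import Counter

def bin_shifts(deltas):
    def bucket(delta):
        magnitude = abs(delta)
        if magnitude == 0:
            return "none"
        if magnitude <= 1:
            return "small"
        if magnitude <= 2:
            return "medium"
        return "large"          # |Δ| >= 3, the paper's large-shift threshold
    counts = Counter(bucket(d) for d in deltas)
    return {k: 100 * v / len(deltas) for k, v in counts.items()}  # percentages

print(bin_shifts([0, 1, -1, 2, -3, 4]))
# {'none': 16.67, 'small': 33.33, 'medium': 16.67, 'large': 33.33} (rounded)
```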

#### Human Annotation.

Table [3](https://arxiv.org/html/2604.19125#S4.T3) presents the mean ratings across conditions for each annotator. We contextualize our findings against human moral judgment by recruiting four annotators to rate a random subset of 100 situations from the Social-Chem-101 dataset, producing 1,200 ratings in total (100 situations × 3 versions × 4 annotators). Each annotator independently rated all three versions (original, positive emotion, negative emotion) using the same 1–7 Likert scale employed for LLM evaluation.

Human responses diverged from the patterns observed in LLMs. While positive emotions produced modest increases in acceptability (mean $\Delta^{+} = +0.20$), negative emotions did not produce systematic decreases; instead, we observed slight increases (mean $\Delta^{-} = +0.25$). This reversal hints that human annotators do not treat negative affect as a simple moral penalty, but may instead interpret it as contextual information that situates an action within extenuating circumstances.

Only one annotator exhibited the full valence-congruent pattern that characterized most LLM responses. Individual variation was substantial, particularly for negative emotions, where annotators ranged from a decrease of 0.26 points to an increase of 0.85 points. This heterogeneity underscores that models’ responses to induced emotion should not be taken as a reflection of how humans reason morally. Appendix [D](https://arxiv.org/html/2604.19125#A4) provides details on emotion-specific analysis of the human annotations.

Table 3: Mean moral acceptability ratings from human annotators across the original, positive-emotion, and negative-emotion conditions (N = 100 situations).

### 4.2 Not All Emotions Are Equal

Figure [4](https://arxiv.org/html/2604.19125#S4.F4) presents mean shift magnitudes for each emotion label. Within each valence category, individual emotions produce markedly different effects.

Among positive emotions, compassion produces the largest shifts, reliably increasing moral acceptability. This aligns with compassion’s role in moral psychology as a prosocial emotion that promotes forgiveness and charitable interpretation (Graham et al., [2013](https://arxiv.org/html/2604.19125#bib.bib26)). Importantly, compassionate responses are more readily extended when the subject is not perceived as morally culpable (Yu et al., [2023](https://arxiv.org/html/2604.19125#bib.bib65)), which may explain why compassion paired with morally contested actions yields the strongest acceptability gains in our results. Relief, pride, and joy, despite being positively valenced, can produce decrements in acceptability. We posit that relief presupposes prior wrongdoing, causing models to infer that the narrator anticipated negative consequences, thereby signaling awareness of moral transgression. The strong decremental effects of anger and disgust are consistent with the CAD triad hypothesis (Rozin et al., [1999](https://arxiv.org/html/2604.19125#bib.bib64)), which maps these emotions onto violations of autonomy and purity norms, respectively, predicting that their presence signals moral transgression.

Among negative emotions, remorse shows the strongest paradoxical effect, substantially increasing acceptability despite its negative valence. This finding resonates with research showing that remorse signals acknowledgment of wrongdoing, often eliciting forgiveness rather than condemnation (Tangney et al., [2007](https://arxiv.org/html/2604.19125#bib.bib52)). The models appear to have learned this association, treating remorse as a mitigating factor rather than an amplifier of condemnation. Appendix [C.2](https://arxiv.org/html/2604.19125#A3.SS2) shows the mean shift results with the relief/remorse pair excluded.

#### Shape of Emotional Perturbation.

Figure [5](https://arxiv.org/html/2604.19125#S4.F5) presents kernel density estimates of shift distributions across models. Beyond mean tendencies, the distributional properties of moral shifts reveal important patterns about how emotions perturb judgment. Most models produce multimodal distributions rather than smooth Gaussian perturbations, suggesting that emotions interact with situation-specific features to produce discrete revisions in judgment. The distributions also reveal valence asymmetry in spread: negative emotions produce consistently higher standard deviations than positive emotions across most models. As shown in Table [4](https://arxiv.org/html/2604.19125#S4.T4), Llama-3.1-8B shows $\text{SD}^{-} = 2.29$ versus $\text{SD}^{+} = 1.56$, and Qwen-3-8B shows $\text{SD}^{-} = 1.64$ versus $\text{SD}^{+} = 1.01$, indicating that negative framing introduces greater response variability across situations. GPT-5.1 emerges as the most conservative model, with a standard deviation of 0.82 under positive emotion induction and 0.79 under negative. Whether this conservatism reflects robust affective alignment or an insensitivity to emotionally relevant contextual cues remains an important open question.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x5.png)

Figure 5: Kernel density estimates of mean shift distributions across models and affect type.

Table 4: Mean shifts ($\bar{\Delta}$) and standard deviations for the positive (+) and negative (−) emotion conditions.

### 4.3 Theoretical Congruence of Emotional Effects

Under affect-as-information theory (Schwarz, [2012](https://arxiv.org/html/2604.19125#bib.bib32)), affective states systematically bias evaluative judgments in the direction implied by the experienced emotion (provided the affect is perceived as contextually relevant). We formalize this as congruence: the proportion of situations in which emotions shift moral acceptability in the theoretically expected direction, i.e., positive emotions increasing acceptability and negative emotions decreasing it.
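Operationally, the rate can be sketched as below over per-situation $(\Delta^{+}, \Delta^{-})$ pairs. How zero shifts are counted is an assumption here (treated as incongruent); “fully congruent” requires both directions to match.

```python
# Congruence with affect-as-information: fraction of situations whose
# shift goes in the theoretically expected direction.
def congruence(pairs):
    pos_ok = sum(d_pos > 0 for d_pos, _ in pairs)           # Δ+ raised acceptability
    neg_ok = sum(d_neg < 0 for _, d_neg in pairs)           # Δ- lowered acceptability
    both = sum(d_pos > 0 and d_neg < 0 for d_pos, d_neg in pairs)
    n = len(pairs)
    return {"positive": pos_ok / n, "negative": neg_ok / n, "full": both / n}

print(congruence([(1, -1), (2, 1), (0.5, -2)]))
# {'positive': 1.0, 'negative': 0.67, 'full': 0.67} (rounded)
```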

![Refer to caption](https://arxiv.org/html/2604.19125v1/x6.png)

Figure 6: Congruence rate of each model for Social-Chem-101.

Figure [6](https://arxiv.org/html/2604.19125#S4.F6) disaggregates congruence rates across all models, which vary substantially. Qwen-3-8B exhibits the highest congruence (79% fully congruent), suggesting it processes emotions in close alignment with affect-as-information predictions, treating the narrator’s emotional state as a reliable indicator of moral valence. In contrast, GPT-5.1 exhibits the lowest congruence and, in some conditions, inverts theoretical expectations, similar to Gemini-3-Flash, which hovers near chance level (50%), indicating that their changes in moral acceptability are not strongly influenced by emotional valence.

We hypothesize that the incongruence reflects a moral licensing (Merritt et al., [2010](https://arxiv.org/html/2604.19125#bib.bib53)) mechanism for positive emotions and a mitigating-circumstances interpretation for negative emotions. When a narrator expresses pride or joy while describing a morally questionable action, the model may interpret this positive affect as indicative of callousness or a lack of appropriate guilt, thereby reducing acceptability. Conversely, when negative emotions such as fear or remorse accompany the same action, the model may interpret them as evidence of moral awareness or extenuating circumstances, paradoxically increasing acceptability.

#### Moral Flips in the ETHICS Set.

The ETHICS Justice dataset has a contrast-set structure where each base claim appears in four minimally edited variants with opposing binary labels. This offers a direct test of whether emotional induction can blur well-defined moral distinctions rather than merely shift continuous ratings. Table [5](https://arxiv.org/html/2604.19125#S4.T5) displays the mean shifts under positive and negative emotions and the corresponding collapse rates (the acceptability gap between reasonable and unreasonable cases shrinks) and flip rates (the binary ordering reverses). The analysis confirms that emotions can compromise binary distinctions when employing the Likert scale. In line with previous findings, the observed patterns track model size: smaller models show larger moral flips. Across models, 18–52% of contrast groups exhibit collapse under positive emotion and 30–58% under negative emotion, where the score differential between reasonable and unreasonable claims shrinks. In parallel, 3–18% of groups show complete flips under positive emotion and 4–20% under negative emotion, where unreasonable claims receive higher ratings than their reasonable counterparts.

Table 5: ETHICS dataset results. $\Delta^{+/-}$: mean shift under positive/negative emotion. Col./Flip: collapse/flip rates (%) under positive/negative emotion.

### 4.4 Cross-Model Divergence and Architectural Influences

To quantify distributional differences in emotional sensitivity across model architectures, we compute pairwise Jensen-Shannon divergence (JSD) on the distributions of moral rating shifts. JSD provides a symmetric, bounded measure ($0 \leq \text{JSD} \leq 1$) where higher values indicate greater distributional dissimilarity. Figure [7](https://arxiv.org/html/2604.19125#S4.F7) shows the resulting heatmaps for positive (lower triangular matrix) and negative emotions (upper triangular matrix). The JSD values reveal that models of similar scale exhibit convergent behavior: Llama-3.1-8B and Qwen-3-8B show relatively low divergence (JSD ≈ 0.25) for both positive and negative shifts, suggesting comparable sensitivity profiles at the 8B parameter scale. Most notably, negative emotion-induced shifts produce higher inter-model divergence than positive shifts. The mean pairwise JSD for negative shifts ($\text{JSD}^{-} = 0.41$) exceeds that for positive shifts ($\text{JSD}^{+} = 0.32$), implying that negative emotions act as a stronger signal in moral situations than their positive counterparts.
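A sketch of one pairwise computation: histogram two models’ shift lists over a shared support and apply the base-2 definition, which keeps JSD in [0, 1]. The integer-shift binning is an assumption for illustration.

```python
# Jensen-Shannon divergence between two models' shift distributions.
import numpy as np

def jsd(p, q, eps=1e-12):
    """Base-2 JSD between two (unnormalized) histograms; bounded in [0, 1]."""
    p = np.asarray(p, float) + eps; p /= p.sum()
    q = np.asarray(q, float) + eps; q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))   # KL divergence, base 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

bins = np.arange(-6.5, 7.5)                        # integer Likert shifts -6..6
hist_a, _ = np.histogram([1, 1, 2, -1], bins=bins)
hist_b, _ = np.histogram([0, -1, -2, -3], bins=bins)
print(round(float(jsd(hist_a, hist_b)), 3))
```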

![Refer to caption](https://arxiv.org/html/2604.19125v1/x7.png)

Figure 7: Jensen-Shannon divergence between each pair of models for positive/negative affect on Social-Chem-101.

## 5 Conclusion

This work presents a controlled analysis of how emotions influence moral judgment in large language models. Using our emotion-induction pipeline across seven LLMs and two datasets, we demonstrate that emotional context shifts moral acceptability ratings, with positive emotions increasing ratings by up to +1.21 points and negative emotions decreasing them by up to −1.15 points on the Social-Chem-101 dataset. On the ETHICS Justice dataset, these effects reverse the moral ordering between reasonable and unreasonable claims in up to 20% of cases. Across both datasets, smaller models are more susceptible than larger ones. Individual emotion analysis reveals exceptions to the valence-congruent pattern, with relief decreasing and remorse increasing acceptability. A human annotation study shows that humans do not exhibit these systematic shifts. Taken together, these findings show that this vulnerability to emotional indicators is an important gap in current LLMs that must be addressed as models are increasingly used in judgment-sensitive settings.

## Limitations

We acknowledge constraints on the scope and generalization of our findings. First, while we evaluate seven models spanning four architectural families, our analysis does not encompass the full landscape of LLMs. In particular, many closed-source systems beyond those included here remain unexamined, and our conclusions about scale and architecture effects should be interpreted with this scope in mind. Second, our emotion-induction pipeline relies on template-based modifications that, while ensuring controlled comparisons, may not capture the full complexity of emotion expression in naturalistic discourse. Finally, our datasets and emotion taxonomy are English-centric, limiting generalization to other languages and cultural contexts where emotion-morality mappings may differ substantially. These questions remain important directions for future investigation.

## Ethical Considerations

This work analyzes how emotions influence moral judgments in large language models using publicly available, anonymized datasets\. No new personal data is collected\. Our findings reveal that emotional indicators can systematically shift model judgments, exposing a surface\-level sensitivity to affective manipulation\. Our work is diagnostic and does not advocate the use of emotion induction or LLM\-generated moral judgments in real\-world decision\-making\.

## Acknowledgments

We thank the CincyNLP group for their suggestions and feedback\. We also thank the anonymous ACL reviewers for their insightful suggestions\.

## References

- M. Abdulhai, G. Serapio-García, C. Crepy, D. Valter, J. Canny, and N. Jaques (2024). Moral foundations of large language models. In Proceedings of EMNLP 2024. [Link](https://aclanthology.org/2024.emnlp-main.982/)
- I. Chalkidis, A. Jana, D. Hartung, M. Bommarito, I. Androutsopoulos, D. Katz, and N. Aletras (2022). LexGLUE: A benchmark dataset for legal language understanding in English. In Proceedings of ACL 2022. [Link](https://aclanthology.org/2022.acl-long.297/)
- J. F. Christensen, A. Flexas, M. Calabrese, N. K. Gut, and A. Gomila (2014). Moral judgment reloaded: A moral dilemma validation study. Frontiers in Psychology, 5. [Link](https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2014.00607)
- Google DeepMind (2025). Gemini 3 Flash: Frontier intelligence built for speed. [Link](https://blog.google/products/gemini/gemini-3-flash/)
- D. Demszky, D. Movshovitz-Attias, J. Ko, A. Cowen, G. Nemade, and S. Ravi (2020). GoEmotions: A dataset of fine-grained emotions. In Proceedings of ACL 2020. [Link](https://aclanthology.org/2020.acl-main.372/)
- D. Di Palma, A. De Bellis, G. Servedio, V. W. Anelli, F. Narducci, and T. Di Noia (2025). LLaMAs have feelings too: Unveiling sentiment and emotion representations in LLaMA models through probing. In Proceedings of ACL 2025. [Link](https://aclanthology.org/2025.acl-long.306/)
- P. A. Duong, C. Luong, D. Bommana, and T. Jiang (2025). CHEER-Ekman: Fine-grained embodied emotion classification. In Proceedings of ACL 2025. [Link](https://aclanthology.org/2025.acl-short.88/)
- M. Forbes, J. D. Hwang, V. Shwartz, M. Sap, and Y. Choi (2020). Social Chemistry 101: Learning to reason about social and moral norms. In Proceedings of EMNLP 2020. [Link](https://aclanthology.org/2020.emnlp-main.48/)
- J. Graham, J. Haidt, S. Koleva, M. Motyl, R. Iyer, S. P. Wojcik, and P. H. Ditto (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in Experimental Social Psychology. [Link](https://doi.org/10.1016/B978-0-12-407236-7.00002-4)
- A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan, et al. (2024). The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. [Link](https://arxiv.org/abs/2407.21783)
- J. D. Greene (2009). The cognitive neuroscience of moral judgment. The Cognitive Neurosciences, 4, pp. 1–48. [Link](https://doi.org/10.7551/mitpress/9504.003.0110)
- J. Haidt (2003). The moral emotions. In R. J. Davidson, K. R. Scherer, and H. H. Goldsmith (eds.), Handbook of Affective Sciences, pp. 852–870. [Link](https://www.overcominghateportal.org/uploads/5/4/1/5/5415260/the_moral_emotions.pdf)
- J. Haidt (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–834. [Link](https://doi.org/10.1037/0033-295x.108.4.814)
- Z. He, S. Guo, A. Rao, and K. Lerman (2024). Whose emotions and moral sentiments do language models reflect? In Findings of ACL 2024. [Link](https://aclanthology.org/2024.findings-acl.395/)
- D. Hendrycks, C. Burns, S. Basart, A. Critch, J. Li, D. Song, and J. Steinhardt (2020). Aligning AI with shared human values. arXiv preprint arXiv:2008.02275. [Link](https://arxiv.org/abs/2008.02275)
- J. Hoover, G. Portillo-Wightman, L. Yeh, S. Havaldar, A. M. Davani, Y. Lin, B. Kennedy, M. Atari, Z. Kamel, M. Mendlen, G. Moreno, C. Park, T. E. Chang, J. Chin, C. Leong, J. Y. Leung, A. Mirinjian, and M. Dehghani (2020). Moral Foundations Twitter Corpus: A collection of 35k tweets annotated for moral sentiment. Social Psychological and Personality Science. [Link](https://doi.org/10.1177/1948550619876629)
- A. Huang, Y. N. Pi, and C. Mougan (2024). Moral persuasion in large language models: Evaluating susceptibility and ethical alignment. arXiv preprint arXiv:2411.11731. [Link](https://arxiv.org/abs/2411.11731)
- L. Jiang, J. D. Hwang, C. Bhagavatula, R. L. Bras, J. Liang, J. Dodge, K. Sakaguchi, M. Forbes, J. Borchardt, S. Gabriel, et al. (2021). Can machines learn morality? The Delphi experiment. arXiv preprint arXiv:2110.07574. [Link](https://arxiv.org/abs/2110.07574)
- Z. Jin, S. Levine, F. Gonzalez Adauto, O. Kamal, M. Sap, M. Sachan, R. Mihalcea, J. Tenenbaum, and B. Schölkopf (2022). When to make exceptions: Exploring language models as accounts of human moral judgment. In Advances in Neural Information Processing Systems (NeurIPS 2022). [Link](https://proceedings.neurips.cc/paper_files/paper/2022/file/b654d6150630a5ba5df7a55621390daf-Paper-Conference.pdf)
- K. Kawintiranon and L. Singh (2022). PoliBERTweet: A pre-trained language model for analyzing political content on Twitter. In Proceedings of LREC 2022. [Link](https://aclanthology.org/2022.lrec-1.801/)
- A. Keshmirian, R. Baltaji, B. Hemmatian, H. Asghari, and L. R. Varshney (2025). Many LLMs are more utilitarian than one. arXiv preprint arXiv:2507.00814. [Link](https://arxiv.org/abs/2507.00814)
- S. Kumar and D. Jurgens (2025). Are rules meant to be broken? Understanding multilingual moral reasoning as a computational pipeline with UniMoral. In Proceedings of ACL 2025. [Link](https://aclanthology.org/2025.acl-long.294/)
- J. Lee, W. Lee, O. Kwon, and H. Kim (2025). Do large language models have “emotion neurons”? Investigating the existence and role. In Findings of ACL 2025. [Link](https://aclanthology.org/2025.findings-acl.806/)
- C. Li, J. Wang, Y. Zhang, K. Zhu, W. Hou, J. Lian, F. Luo, Q. Yang, and X. Xie (2023). Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760. [Link](https://arxiv.org/abs/2307.11760)
- H. Liu, Y. Dai, H. Tan, Y. Lei, Y. Zhou, and Z. Wu (2025a). Outraged AI: Large language models prioritise emotion over cost in fairness enforcement. arXiv preprint arXiv:2510.17880. [Link](https://arxiv.org/abs/2510.17880)
- Z. Liu, L. Qian, Q. Xie, J. Huang, K. Yang, and S. Ananiadou (2025b). MMAFFBen: A multilingual and multimodal affective analysis benchmark for evaluating LLMs and VLMs. arXiv preprint arXiv:2505.24423. [Link](https://arxiv.org/abs/2505.24423)
- N. Lourie, R. Le Bras, and Y. Choi (2021). SCRUPLES: A corpus of community ethical judgments on 32,000 real-life anecdotes. In Proceedings of AAAI 2021. [Link](https://ojs.aaai.org/index.php/AAAI/article/view/17589)
- A. C. Merritt, D. A. Effron, and B. Monin (2010). Moral self-licensing: When being good frees us to be bad. Social and Personality Psychology Compass, 4(5), 344–357. [Link](https://doi.org/10.1111/j.1751-9004.2010.00263.x)
- N. Mostafazadeh, N. Chambers, X. He, D. Parikh, D. Batra, L. Vanderwende, P. Kohli, and J. Allen (2016). A corpus and cloze evaluation for deeper understanding of commonsense stories. In Proceedings of NAACL-HLT 2016. [Link](https://aclanthology.org/N16-1098/)
- J. L. Nunes, G. F. Almeida, M. De Araujo, and S. D. Barbosa (2024). Are large language models moral hypocrites? A study based on moral foundations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES 2024). [Link](https://doi.org/10.1609/aies.v7i1.31704)
- OpenAI: S. Agarwal, L. Ahmad, J. Ai, S. Altman, et al. (2025). gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925. [Link](https://arxiv.org/abs/2508.10925)
- J. T. Ornstein, E. N. Blasingame, and J. S. Truscott (2025). How to train your stochastic parrot: Large language models for political texts. Political Science Research and Methods, 13(2), 264–281. [Link](https://doi.org/10.1017/psrm.2024.64)
- P. Rozin, L. Lowery, S. Imada, and J. Haidt (1999). The CAD triad hypothesis: A mapping between three moral emotions (contempt, anger, disgust) and three moral codes (community, autonomy, divinity). Journal of Personality and Social Psychology, 76(4), 574–586. [Link](https://doi.org/10.1037/0022-3514.76.4.574)
- G. Russo, D. Nozza, P. Röttger, and D. Hovy (2026). The pluralistic moral gap: Understanding moral judgment and value differences between humans and large language models. In Proceedings of EACL 2026. [Link](https://aclanthology.org/2026.eacl-long.305/)
- S. Sabour, S. Liu, Z. Zhang, J. Liu, J. Zhou, A. Sunaryo, T. Lee, R. Mihalcea, and M. Huang (2024). EmoBench: Evaluating the emotional intelligence of large language models. In Proceedings of ACL 2024. [Link](https://aclanthology.org/2024.acl-long.326/)
- P. Sachdeva and T. van Nuenen (2025). Normative evaluation of large language models with everyday moral dilemmas. In Proceedings of FAccT 2025. [Link](https://doi.org/10.1145/3715275.3732044)
- M. Saim, P. A. Duong, C. Luong, A. Bhanderi, and T. Jiang (2025). Anatomy of a feeling: Narrating embodied emotions via large vision-language models. In Findings of EMNLP 2025. [Link](https://aclanthology.org/2025.findings-emnlp.1276/)
- M. Sap, R. Le Bras, D. Fried, and Y. Choi (2022). Neural theory-of-mind? On the limits of social intelligence in large LMs. In Proceedings of EMNLP 2022. [Link](https://aclanthology.org/2022.emnlp-main.248/)
- N. Scherrer, C. Shi, A. Feder, and D. Blei (2023). Evaluating the moral beliefs encoded in LLMs. In Advances in Neural Information Processing Systems (NeurIPS 2023). [Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/a2cf225ba392627529efef14dc857e22-Paper-Conference.pdf)
- N. Schwarz (2012). Feelings-as-information theory. In Handbook of Theories of Social Psychology, Volume 1, pp. 289–308. [Link](https://doi.org/10.4135/9781446249215.n15)
- B. Shu, I. Joshi, M. Karnaze, A. C. Pham, I. Kakkar, S. Kothe, A. Hovasapian, and M. ElSherief (2025). Fluent but unfeeling: The emotional blind spots of language models. arXiv preprint arXiv:2509.09593. [Link](https://arxiv.org/abs/2509.09593)
- V. Shuman, D. Sander, and K. R. Scherer (2013). Levels of valence. Frontiers in Psychology, 4. [Link](https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2013.00261)
- A. Singh, A. Fry, A. Perelman, A. Tart, et al. (2025). OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267. [Link](https://arxiv.org/abs/2601.03267)
- A. N. Tak, A. Banayeeanzade, A. Bolourani, M. Kian, R. Jia, and J. Gratch (2025). Mechanistic interpretability of emotion inference in large language models. In Findings of ACL 2025. [Link](https://aclanthology.org/2025.findings-acl.679/)
- Z. Talat, H. Blix, J. Valvoda, M. I. Ganesh, R. Cotterell, and A. Williams (2022). On the machine learning of ethical judgments from natural language. In Proceedings of NAACL-HLT 2022. [Link](https://aclanthology.org/2022.naacl-main.56/)
- J\. P\. Tangney, J\. Stuewig, and D\. J\. Mashek \(2007\)Moral emotions and moral behavior\.Annu\. Rev\. Psychol\.58\(1\),pp\. 345–372\.External Links:[Link](https://doi.org/10.1146/annurev.psych.56.091103.070145)Cited by:[§4\.2](https://arxiv.org/html/2604.19125#S4.SS2.p3.1)\.
- P\. Valdesolo and D\. DeSteno \(2006\)Manipulations of emotional context shape moral judgment\.PSYCHOLOGICAL SCIENCE\-CAMBRIDGE\-17\(6\),pp\. 476\.External Links:[Link](https://doi.org/10.1111/j.1467-9280.2006.01731.x)Cited by:[§2](https://arxiv.org/html/2604.19125#S2.SS0.SSS0.Px1.p2.1)\.
- X\. Wang, C\. Li, Y\. Chang, J\. Wang, and Y\. Wu \(2024\)NegativePrompt: leveraging psychology for large language models enhancement via negative emotional stimuli\.InProceedings of the Thirty\-Third International Joint Conference on Artificial Intelligence, \(IJCAI\-24\),External Links:[Document](https://dx.doi.org/10.24963/ijcai.2024/719),[Link](https://doi.org/10.24963/ijcai.2024/719)Cited by:[§2](https://arxiv.org/html/2604.19125#S2.SS0.SSS0.Px2.p1.1)\.
- A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv, C\. Zheng, D\. Liu, F\. Zhou, F\. Huang, F\. Hu, H\. Ge, H\. Wei, H\. Lin, J\. Tang, J\. Yang, J\. Tu, J\. Zhang, J\. Yang, J\. Yang, J\. Zhou, J\. Zhou, J\. Lin, K\. Dang, K\. Bao, K\. Yang, L\. Yu, L\. Deng, M\. Li, M\. Xue, M\. Li, P\. Zhang, P\. Wang, Q\. Zhu, R\. Men, R\. Gao, S\. Liu, S\. Luo, T\. Li, T\. Tang, W\. Yin, X\. Ren, X\. Wang, X\. Zhang, X\. Ren, Y\. Fan, Y\. Su, Y\. Zhang, Y\. Zhang, Y\. Wan, Y\. Liu, Z\. Wang, Z\. Cui, Z\. Zhang, Z\. Zhou, and Z\. Qiu \(2025\)Qwen3 technical report\.External Links:2505\.09388,[Link](https://arxiv.org/abs/2505.09388)Cited by:[§3\.4](https://arxiv.org/html/2604.19125#S3.SS4.p1.1)\.
- H\. Yu, J\. Chen, B\. Dardaine, and F\. Yang \(2023\)Moral barrier to compassion: how perceived badness of sufferers dampens observers’ compassionate responses\.Cognition237,pp\. 105476\.External Links:[Link](https://doi.org/10.1016/j.cognition.2023.105476),[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.cognition.2023.105476)Cited by:[§4\.2](https://arxiv.org/html/2604.19125#S4.SS2.p2.1)\.
- J\. Yu, M\. Huber, and K\. Tang \(2024\)GreedLlama: performance of financial value\-aligned large language models in moral reasoning\.External Links:2404\.02934,[Link](https://arxiv.org/abs/2404.02934)Cited by:[§1](https://arxiv.org/html/2604.19125#S1.p1.1)\.
- W\. Zhang, Y\. Deng, B\. Liu, S\. Pan, and L\. Bing \(2024\)Sentiment analysis in the era of large language models: a reality check\.InFindings of the Association for Computational Linguistics \(Findings of NAACL 2024\),External Links:[Link](https://aclanthology.org/2024.findings-naacl.246/),[Document](https://dx.doi.org/10.18653/v1/2024.findings-naacl.246)Cited by:[§2](https://arxiv.org/html/2604.19125#S2.SS0.SSS0.Px2.p1.1)\.
- Y\. Zhuang, T\. Jiang, and E\. Riloff \(2024\)My heart skipped a beat\! recognizing expressions of embodied emotion in natural language\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(NAACL\-HLT 2024\),External Links:[Link](https://aclanthology.org/2024.naacl-long.193/),[Document](https://dx.doi.org/10.18653/v1/2024.naacl-long.193)Cited by:[§2](https://arxiv.org/html/2604.19125#S2.SS0.SSS0.Px2.p1.1)\.
- C\. Ziems, J\. A\. Yu, Y\. Wang, A\. Y\. Halevy, and D\. Yang \(2022\)The moral integrity corpus: a benchmark for ethical dialogue systems\.InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(ACL 2022\),External Links:[Link](https://aclanthology.org/2022.acl-long.261/),[Document](https://dx.doi.org/10.18653/v1/2022.acl-long.261)Cited by:[§2](https://arxiv.org/html/2604.19125#S2.SS0.SSS0.Px1.p1.1)\.

## Appendix A Prompts and Usage Scripts

Our experimental pipeline employs three prompts executed in sequence: emotion selection, template selection, and moral rating evaluation. All prompts are designed to keep emotion induction consistent while enabling systematic variation across moral scenarios.

### A.1 Emotion Selection Prompt

The emotion selection prompt (Figure [10](https://arxiv.org/html/2604.19125#A1.F10)) is used to identify the most contextually appropriate positive and negative emotions for each moral situation. We constrain the selection to six positive emotions (relief, gratitude, pride, compassion, joy, love) and six negative emotions (remorse, anger, disgust, embarrassment, fear, sadness) drawn from the GoEmotions taxonomy. The prompt instructs the model to select emotions that will create a strong moral contrast while remaining plausible from the first-person narrator’s perspective.

This prompt is executed using GPT-5.1, which is required to provide a one-sentence justification for each emotion pair, ensuring that selections are grounded in the moral content of the situation rather than in arbitrary associations.
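
For concreteness, the following is a minimal sketch of how this step could be scripted against an OpenAI-style chat API. The model name, prompt wording, and JSON schema are illustrative assumptions; the exact prompt text appears in Figure 10.

```python
# Minimal sketch of the emotion-selection step (Appendix A.1).
# Assumptions: an OpenAI-style chat API; the prompt wording and JSON
# schema below are illustrative, not the paper's exact prompt.
import json
from openai import OpenAI

POSITIVE = ["relief", "gratitude", "pride", "compassion", "joy", "love"]
NEGATIVE = ["remorse", "anger", "disgust", "embarrassment", "fear", "sadness"]

client = OpenAI()

def select_emotions(situation: str, model: str = "gpt-5.1") -> dict:
    """Ask the model for one positive and one negative emotion, each with
    a one-sentence justification grounded in the situation."""
    prompt = (
        f"Moral situation: {situation}\n"
        f"Pick ONE emotion from {POSITIVE} and ONE from {NEGATIVE} that are "
        "plausible from the first-person narrator's perspective and create "
        "a strong moral contrast. Return JSON with keys 'positive', "
        "'positive_reason', 'negative', 'negative_reason'."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```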

![Refer to caption](https://arxiv.org/html/2604.19125v1/x8.png)
Figure 8: Prompt for rating each situation, employed uniformly by all models.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x9.png)
Figure 9: Template selection prompt for GPT-5.1.
### A.2 Template Selection Prompt

Following emotion selection, the template selection prompt (Figure [9](https://arxiv.org/html/2604.19125#A1.F9)) is used to generate emotion-modified versions of each moral situation. Four syntactic templates are provided:

1. “Feeling [emotion], [exact situation]”
2. “Out of [emotion], [exact situation]”
3. “In my [emotion], [exact situation]”
4. “[Adverb] [exact situation]” (e.g., angrily, sadly, proudly)

The prompt explicitly prohibits explanatory additions (e.g., “because…” or “due to…”) to ensure emotions function as pure affective signals rather than causal justifications. Models are instructed to select the most natural-sounding template for each emotion while keeping the underlying situation identical except for the affective addition. This prompt allows natural variation in template selection while maintaining grammatical coherence. The output consists of two modified versions per situation: one incorporating the selected positive emotion and one the selected negative emotion.
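
As an illustration, a small helper in the following spirit could apply the four templates; the adverb mapping and the template-selection logic are hypothetical conveniences, since in our pipeline GPT-5.1 chooses the most natural template per emotion.

```python
# Illustrative application of the four emotion-induction templates
# (Appendix A.2). The adverb map is a hypothetical convenience; in the
# actual pipeline, the model selects the most natural template.
ADVERBS = {"anger": "Angrily,", "sadness": "Sadly,", "pride": "Proudly,"}

def apply_template(template_id: int, emotion: str, situation: str) -> str:
    # Lowercase the first letter so the situation reads as a clause.
    clause = situation[0].lower() + situation[1:]
    if template_id == 1:
        return f"Feeling {emotion}, {clause}"
    if template_id == 2:
        return f"Out of {emotion}, {clause}"
    if template_id == 3:
        return f"In my {emotion}, {clause}"
    if template_id == 4:
        return f"{ADVERBS[emotion]} {clause}"
    raise ValueError(f"Unknown template id: {template_id}")

print(apply_template(1, "remorse", "I deserve a second chance."))
# -> "Feeling remorse, I deserve a second chance."
```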

### A.3 Rating Prompt

The moral rating prompt (Figure [8](https://arxiv.org/html/2604.19125#A1.F8)) presents three versions of each situation (original, positive emotion, negative emotion) for evaluation on a 7-point Likert scale, where 1 indicates “completely unacceptable” and 7 indicates “completely acceptable.” The rating scale includes explicit anchors at each level to ensure consistent interpretation across models.

Unlike the generation prompts, the rating prompt is administered to all models, with a temperature of 0.2 to promote consistent, stable moral judgments. Models are instructed to rate each version independently and provide brief structured reasoning: one sentence explaining the rating and one sentence comparing the emotion-modified version to the original baseline.

The prompt emphasizes that models should consider “how much the moral acceptability changes with emotions added,” directing attention to the incremental effect of emotions on moral judgment. This design enables us to compute emotion-induced shifts (positive and negative) and the total emotional range for each situation.
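
A minimal sketch of the rating step follows, assuming the same OpenAI-style client as above; the instruction text paraphrases the rating prompt, whose full version appears in Figure 8, and the output format and parsing logic are illustrative assumptions.

```python
# Sketch of the rating step (Appendix A.3). The instruction text
# paraphrases Figure 8; the 'Rating: N' output format and the parsing
# are illustrative assumptions.
import re
from openai import OpenAI

client = OpenAI()

def rate_situation(situation: str, model: str, temperature: float = 0.2) -> int:
    """Elicit a 1-7 moral-acceptability rating with brief reasoning."""
    prompt = (
        f"Situation: {situation}\n"
        "Rate the moral acceptability on a 7-point scale "
        "(1 = completely unacceptable, 7 = completely acceptable). "
        "Answer 'Rating: N' first, then one sentence of reasoning."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # low temperature for stable judgments
    )
    match = re.search(r"Rating:\s*([1-7])", response.choices[0].message.content)
    if match is None:
        raise ValueError("No parsable rating in model output")
    return int(match.group(1))

# Emotion-induced shift for one situation and model:
# shift_pos = rate_situation(pos_version, m) - rate_situation(original, m)
```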

![Refer to caption](https://arxiv.org/html/2604.19125v1/x10.png)
Figure 10: Emotion selection prompt for GPT-5.1.

## Appendix B ETHICS Dataset

The Justice subset contains claims about desert, entitlement, and fairness. This category is well-suited to our experimental design for several reasons. First, justice claims are structured as first-person assertions (e.g., “I deserve X because Y”), matching the narrator-centric framing of our emotion induction templates. Second, each scenario carries a binary label indicating whether the claim is *reasonable* (1) or *unreasonable* (0) as judged by impartial observers, providing annotated normative labels.

#### Filtering Procedure.

The Justice hard-test cases comprise two tasks: *Impartiality* and *Desert*. We retain the Desert scenarios, which contain explicit claims of deservingness or entitlement. Specifically, we retain sentences matching patterns such as “I deserve,” “I am justified,” “I am entitled,” and related formulations. This filtering ensures compatibility with our emotion-induction templates, which prepend an emotional state to the narrator’s claim (e.g., “Feeling [emotion], I deserve…”). Impartiality scenarios, which follow an “I usually X but Y because Z” structure, are excluded because an induced emotion would ambiguously attach to either the habitual action (X) or the deviation (Z).
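
A pattern filter in the following spirit would implement this step; the exact pattern list is an assumption inferred from the formulations quoted above.

```python
# Sketch of the Desert-scenario filter (Appendix B). The pattern list is
# inferred from the quoted formulations and may not be the exhaustive set.
import re

DESERT_PATTERN = re.compile(
    r"\bI\s+(deserve|am\s+justified|am\s+entitled)\b", re.IGNORECASE
)

def is_desert_claim(sentence: str) -> bool:
    """Keep first-person deservingness/entitlement claims."""
    return DESERT_PATTERN.search(sentence) is not None

scenarios = [
    "I deserve a raise because I exceeded every target this quarter.",
    "I usually walk the dog, but I skipped it because it was raining.",
]
print([s for s in scenarios if is_desert_claim(s)])
# only the 'I deserve...' claim survives the filter
```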

### B.1 Contrast Set Metrics

We recall that each contrast group contains four variants of a base claim: two labeled reasonable and two unreasonable. We define two metrics to quantify how the induced emotions affect the normative distinction within each group.

#### Contrast Collapse.

Let $\bar{s}_1$ and $\bar{s}_0$ denote the mean scores for reasonable and unreasonable variants, respectively. The *label gap* under condition $c \in \{\text{orig}, \text{pos}, \text{neg}\}$ is:

$$G_c = \bar{s}_1^{(c)} - \bar{s}_0^{(c)} \qquad (1)$$

Collapse occurs when an induced emotion reduces the gap magnitude:

$$\textsc{Collapse}_c = \left[\,|G_c| < |G_{\text{orig}}|\,\right] \qquad (2)$$

#### Contrast Flip.

A flip occurs when the relative ordering of reasonable and unreasonable claims reverses:

$$\textsc{Flip}_c = \left[\,\operatorname{sign}(G_c) \neq \operatorname{sign}(G_{\text{orig}})\,\right], \quad G_{\text{orig}} \neq 0 \qquad (3)$$

#### Example.

Consider a group with original scores: reasonable variants average 5.5, unreasonable average 3.0, yielding $G_{\text{orig}} = +2.5$. After negative emotion induction, suppose reasonable drops to 4.0 and unreasonable rises to 4.5, giving $G_{\text{neg}} = -0.5$. Since $|-0.5| < |+2.5|$, collapse occurs. Since $\operatorname{sign}(-0.5) \neq \operatorname{sign}(+2.5)$, a flip also occurs; the model now rates unreasonable claims as more acceptable than reasonable ones.
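
Both metrics reduce to a few lines of code; the sketch below reproduces the worked example above.

```python
# Contrast-set metrics from Appendix B.1, checked against the worked
# example above (reasonable 5.5 -> 4.0, unreasonable 3.0 -> 4.5).
from statistics import mean

def label_gap(reasonable: list[float], unreasonable: list[float]) -> float:
    """G_c: mean score of reasonable variants minus unreasonable ones."""
    return mean(reasonable) - mean(unreasonable)

def collapse(g_cond: float, g_orig: float) -> bool:
    """Eq. (2): the induced emotion shrinks the gap magnitude."""
    return abs(g_cond) < abs(g_orig)

def flip(g_cond: float, g_orig: float) -> bool:
    """Eq. (3): the ordering of reasonable vs. unreasonable reverses."""
    assert g_orig != 0, "flip is undefined when the original gap is zero"
    return (g_cond > 0) != (g_orig > 0)

g_orig = label_gap([5.5], [3.0])  # +2.5
g_neg = label_gap([4.0], [4.5])   # -0.5
print(collapse(g_neg, g_orig), flip(g_neg, g_orig))  # True True
```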

![Refer to caption](https://arxiv.org/html/2604.19125v1/x11.png)
Figure 11: Emotion-specific effects showing mean shift magnitudes for each emotion label on the Social-Chem-101 consensus norms.

## Appendix C Behavioral Analysis

Figure [12](https://arxiv.org/html/2604.19125#A3.F12) presents mean shifts in moral acceptability for consensus norms (action-agreement ≥ 3), where normative expectations are widely shared. Consistent with contested norms, we observe the same directional pattern: positive emotions increase acceptability (Llama-3.1-8B: +1.18, Qwen-3-8B: +0.97, Qwen-3-30B: +1.13), while negative emotions decrease it (Llama-3.1-8B: −0.87, Qwen-3-8B: −0.64). However, effect magnitudes are comparable to or slightly larger than those in contested norms. GPT-5.1 and Gemini-3-Flash maintain near-immunity, reinforcing their stability across normative contexts.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x12.png)
Figure 12: Mean shift of moral acceptability for each model on action-agreement > 3 (consensus norms).

Emotion-specific patterns (Figure [11](https://arxiv.org/html/2604.19125#A2.F11)) also show consistency with consensus norms. Compassion remains the most strongly congruent positive emotion across models (Llama-3.1-8B: +1.79, Qwen-3-8B: +1.91), while remorse shows a paradoxical increase in acceptability despite its negative valence (Qwen-3-30B: +1.79, Llama-3.3-70B: +0.76). Pride and relief continue to produce incongruent decrements (Gemini-3-Flash: pride −0.38, relief −0.44), suggesting that these patterns reflect learned emotion-morality associations rather than artifacts of normative ambiguity. This generalization, together with the Justice contrast set, demonstrates that emotional induction constitutes a systematic vulnerability in LLM moral reasoning, regardless of whether normative labels are annotated or contested. These findings also refute the interpretation that emotional susceptibility emerges solely from decision-boundary fragility in uncertain cases.

![Refer to caption](https://arxiv.org/html/2604.19125v1/10_emotion_effect_by_label.png)
Figure 13: Emotion-specific effects showing mean shift magnitudes for each emotion label on the Justice dataset, for both reasonable and unreasonable claims.

![Refer to caption](https://arxiv.org/html/2604.19125v1/x13.png)
Figure 14: Emotion-specific mean shifts in moral acceptability for human annotators (A1–A4) across positive (left) and negative (right) emotion conditions.

### C.1 Emotion-Specific Effects in the Justice Dataset

Figure [13](https://arxiv.org/html/2604.19125#A3.F13) presents emotion-specific shift patterns across reasonable and unreasonable claims in the ETHICS Justice dataset. Unlike the Social-Chem-101 results, which analyze emotional effects on a continuous, contested moral spectrum, the Justice dataset enables a direct comparison of how identical emotions affect claims with opposing normative status.

#### Positive Emotions on Unreasonable Claims.

When positive emotions accompany unreasonable claims, we observe substantial upward shifts across most models. Compassion produces the largest effect, though with notable cross-model variation: Qwen-3-8B shows a mean shift of +1.47, while GPT-OSS-20B shows an inverse shift of −1.48, suggesting it interprets positive affect on unjustified claims as evidence of moral obtuseness rather than charitable intent. This pattern indicates that positive affective framing can partially legitimize normatively unreasonable claims across a substantial portion of the model suite. Joy and love produce more modest but consistent upward shifts (0.25–1.25 range), while relief shows high cross-model variance.

#### Positive Emotions on Reasonable Claims.

For reasonable claims (top-right panel), positive emotions produce smaller-magnitude shifts (0–1.25 range) than for unreasonable claims, indicating a ceiling effect where already-acceptable claims experience diminished emotional amplification. Compassion remains the most potent positive modifier (+1.20 for Llama-3.3-70B), while Gemini-3-Flash again shows negative shifts for most emotions.

#### Negative Emotions on Unreasonable Claims.

Negative emotions applied to unreasonable claims (bottom-left panel) produce the expected decremental effect, further reducing acceptability ratings. Anger, disgust, and embarrassment generate consistent downward shifts (−0.50 to −1.50 range), with Llama-3.1-8B showing the strongest response (anger: −1.08, disgust: −1.38). However, remorse exhibits the opposite behavior: most models show near-zero or small negative shifts, whereas GPT-OSS-20B produces a substantial positive shift (+1.70), treating remorse as a mitigating factor that partially redeems even unreasonable claims.

#### Negative Emotions on Reasonable Claims.

For reasonable claims (bottom-right panel), negative emotions universally decrease acceptability, with effect magnitudes (−0.50 to −2.00) exceeding those observed for unreasonable claims. This asymmetry reveals that negative affective framing more severely undermines justified claims than it further condemns unjustified ones. Anger produces the largest decrements across models (−1.50 for Qwen-3-8B, −2.02 for Qwen-3-30B), while remorse shows the weakest effect, with GPT-5.1 and GPT-OSS-20B exhibiting positive shifts (+0.15 and +0.53, respectively), again demonstrating that remorse signals moral awareness rather than amplifying condemnation.

#### Cross-Model Patterns.

Gemini-3-Flash consistently shows the smallest shift magnitudes and frequent inverse effects, aligning with its near-immunity to emotional induction observed in Social-Chem-101. Llama-3.1-8B and Qwen-3-8B exhibit the highest sensitivity, with large shifts across all emotion-claim combinations. GPT-5.1 shows moderate sensitivity but distinctive remorse handling. These findings confirm that emotional susceptibility patterns generalize across datasets while revealing emotion-specific processing differences: models treat compassion as universally positive, remorse as a mitigating signal of moral awareness, and anger and disgust as amplifiers of condemnation regardless of the claim’s validity.

### C.2 Pure Valence Effects Excluding Paradoxical Emotions

We also assess whether the overall directional patterns we report are driven by the full emotion set or are robust to the removal of exceptional emotion labels such as pride, remorse, and relief. We thus recompute the mean shifts excluding the pair of labels that show inverse effects relative to their nominal valence: relief and remorse. Table [6](https://arxiv.org/html/2604.19125#A3.T6) reports mean shifts under both the full and reduced emotion sets. Removing these two labels strengthens the directional signal in both conditions: positive shifts increase, and negative shifts become more uniformly negative across all models. These results confirm that the paradoxical labels constitute genuine exceptions to the valence-congruent pattern rather than noise, and that the core directional effect is robust to their exclusion.
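
A sketch of this ablation follows, under the assumption that per-item shifts are stored as (emotion, shift) pairs; this layout is a hypothetical convenience, not our actual storage format.

```python
# Sketch of the relief/remorse ablation (Appendix C.2), assuming a
# hypothetical layout of per-item shifts as (emotion, shift) pairs.
from statistics import mean

EXCLUDED = {"relief", "remorse"}

def mean_shift(shifts: list[tuple[str, float]], exclude: bool = False) -> float:
    """Average rating shift, optionally dropping the paradoxical labels."""
    kept = [s for emotion, s in shifts if not (exclude and emotion in EXCLUDED)]
    return mean(kept)

positive = [("compassion", 1.8), ("joy", 0.9), ("relief", -0.4)]
print(mean_shift(positive))                # full positive set
print(mean_shift(positive, exclude=True))  # relief removed -> stronger signal
```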

Table 6: Mean shifts ($\bar{\Delta}$) for positive and negative emotion conditions across all models. $\bar{\Delta}^{+}$ and $\bar{\Delta}^{-}$ include all six emotions per valence; $\bar{\Delta}^{+}_{\text{RR}}$ and $\bar{\Delta}^{-}_{\text{RR}}$ exclude relief and remorse, respectively.

## Appendix D Human Annotation Study

#### Emotion-Specific Patterns.

Figure [14](https://arxiv.org/html/2604.19125#A3.F14) presents mean shifts disaggregated by emotion label for each annotator. Unlike the patterns observed in LLMs (Figure [4](https://arxiv.org/html/2604.19125#S4.F4)), human responses exhibit substantial heterogeneity across emotions. For positive emotions, gratitude produced the most consistent increases across annotators, while pride, which decreased acceptability in most LLMs, showed similar trends in humans. Compassion, the most strongly congruent positive emotion in LLMs ($d$ = +1.02), elicited highly variable human responses ranging from +0.3 to +1.9.

The divergence is more pronounced for negative emotions. Remorse, which paradoxically increased acceptability in LLMs, produced similarly paradoxical increases for all human annotators, suggesting this pattern may reflect genuine moral-psychological associations rather than LLM-specific artifacts. However, anger and disgust, which produced consistent decrements in LLMs, showed no systematic direction in humans: for anger, annotators’ shifts ranged from decreases of 1.1 points to increases of 0.6 points. Sadness exhibited the highest inter-annotator variance, with shifts spanning from near-zero to +2.0 points.

These results suggest that the valence-congruent heuristic observed in LLMs, where positive emotions increase moral acceptability and negative emotions decrease it, does not straightforwardly reflect human moral cognition. Human annotators appear to integrate emotional context in more individualized and context-sensitive ways, potentially drawing on world knowledge, theory of mind, or situation-specific reasoning that resists reduction to simple valence matching. This divergence aligns with broader observations that LLMs and humans process moral and emotional information through fundamentally different mechanisms (Sap et al., [2022](https://arxiv.org/html/2604.19125#bib.bib54); Talat et al., [2022](https://arxiv.org/html/2604.19125#bib.bib55); Shu et al., [2025](https://arxiv.org/html/2604.19125#bib.bib49)). It also highlights the need for caution when interpreting LLM moral judgments as proxies for human values.
