Toxicity in Twitch Chats: An LLM-Based Analysis Across Gaming Communities
Summary
This paper uses a pre-trained LLM with zero-shot classification to analyze approximately 20 million Twitch chat messages across seven game genres, finding that 2.4% of messages are toxic, with MOBA games having the highest rate (3.2%) and sports games the lowest (2%). The study also identifies significant differences in toxicity distributions across individual games within the same genre.
# Toxicity in Twitch Chats: An LLM-Based Analysis Across Gaming Communities

Source: [https://arxiv.org/html/2605.24000](https://arxiv.org/html/2605.24000)

###### Abstract

Toxicity in online gaming communities remains a persistent challenge, manifesting across genres, platforms, and player interactions. While much research is focused on in-game toxicity, less is known about how toxic behavior varies between gaming communities on streaming platforms. To address this shortcoming, we analyze approximately 20 million chat messages from 4,452 streams, spanning seven game genres on Twitch. We categorize messages according to Twitch’s toxicity taxonomy with a pre-trained Large Language Model using zero-shot classification. The taxonomy comprises four categories and eight subclasses, including harassment, discrimination, sexual content, and profanity. Our approach achieves an F1 score of 94.5% on the TextDetox dataset and demonstrates human-model agreement comparable to inter-human agreement. Our analysis reveals that 2.4% of all messages are classified as toxic, with notable differences across genres: streams of MOBA games exhibit the highest relative rate of toxicity (3.2%), and sports games show the lowest rate (2%). Furthermore, results indicate that individual games differ significantly in their toxicity distributions, even within genres, suggesting the existence of game-specific community norms and mechanics that shape toxic behavior beyond genre-level effects. These findings offer empirical insights into genre- and game-specific toxicity patterns on Twitch and can inform more targeted moderation strategies for gaming communities.

979-8-3315-9476-3/26/$31.00 ©2026 IEEE

## I Introduction

Online gaming spaces have become important social environments where players and viewers interact, form communities, and establish shared norms.
However, these environments are often shaped by harmful behavior, including harassment, abusive language, and discrimination. Such behavior can negatively affect player well-being, discourage participation, and undermine the long-term health of gaming communities [[12](https://arxiv.org/html/2605.24000#bib.bib24), [14](https://arxiv.org/html/2605.24000#bib.bib23), [11](https://arxiv.org/html/2605.24000#bib.bib27), [24](https://arxiv.org/html/2605.24000#bib.bib38)]. Although widely recognized by developers, researchers, and players, the problem remains persistent and difficult to address [[27](https://arxiv.org/html/2605.24000#bib.bib17)]. One reason is that harmful behavior takes many forms, and perceptions of toxicity are subjective and context-dependent [[13](https://arxiv.org/html/2605.24000#bib.bib31)]. Most current interventions respond only after harmful interactions have already occurred [[27](https://arxiv.org/html/2605.24000#bib.bib17)]. Yet such experiences can cause players to withdraw from games or gaming communities entirely [[14](https://arxiv.org/html/2605.24000#bib.bib23), [11](https://arxiv.org/html/2605.24000#bib.bib27)], and may even discourage new players from engaging with games in the first place [[25](https://arxiv.org/html/2605.24000#bib.bib16)].

Game-external communities play an important role in gaming by providing social spaces that motivate players and can positively influence their mental well-being [[24](https://arxiv.org/html/2605.24000#bib.bib38)]. Twitch ([https://www.twitch.tv/](https://www.twitch.tv/)) is a popular streaming platform where players broadcast gameplay to live audiences, creating shared social environments in which viewers interact through real-time chat. While this fosters community engagement, it can also enable toxic exchanges. Prior work has identified Twitch as a platform in which toxic communication is prevalent [[16](https://arxiv.org/html/2605.24000#bib.bib39)].
Unlike in-game environments, where harmful interactions are often tied to player performance, toxic behavior on Twitch emerges more strongly from the social dynamics of chat. Although Twitch offers automatic moderation tools, moderation practices largely remain the responsibility of streamers. To support moderation and toxicity prevention, it is important to understand how toxic behavior manifests across different genres, individual games, and Twitch communities. Identifying such patterns can provide insights into how toxicity varies across environments and help inform more targeted moderation strategies.

In this work, we analyze toxic behavior in Twitch text chats across different gaming communities, examining how chat interactions vary between game genres and how these differences appear in patterns of toxicity. We compile a corpus of approximately 20 million chat messages of the two most popular games across seven selected Twitch categories and classify them into Twitch-defined toxicity types using zero-shot classification with a pre-trained Large Language Model (LLM). This work provides a reproducible pipeline for toxicity detection and classification of text messages, along with an analysis of toxicity in Twitch chats across multiple popular genres and streamers. Our findings offer insights into harmful dynamics within Twitch communities and may support improved toxicity detection and prevention systems.

The remainder of this paper is structured as follows: Section [II](https://arxiv.org/html/2605.24000#S2) reviews related work on toxicity detection, including studies using LLMs and research on Twitch. Section [III](https://arxiv.org/html/2605.24000#S3) describes the methodology, and Section [IV](https://arxiv.org/html/2605.24000#S4) presents the results. Section [V](https://arxiv.org/html/2605.24000#S5) discusses the findings and limitations, followed by the conclusion in Section [VII](https://arxiv.org/html/2605.24000#S7).

## II Background and Related Work

### II-A Defining Toxicity

Prior work highlights that toxicity, hate speech, and abusive behavior are inherently context-dependent and partly subjective, which leads to varying definitions and annotation schemes across studies [[6](https://arxiv.org/html/2605.24000#bib.bib30), [11](https://arxiv.org/html/2605.24000#bib.bib27), [16](https://arxiv.org/html/2605.24000#bib.bib39)]. Twitch itself defines four categories of inappropriate or harmful messages with eight subclasses in their automatic moderation (cf. Table [I](https://arxiv.org/html/2605.24000#S2.T1); see [https://help.twitch.tv/s/article/how-to-use-automod](https://help.twitch.tv/s/article/how-to-use-automod)). As this work focuses on data from Twitch, we use this categorization and definition for our evaluation.

TABLE I: Categories and Subclasses of Inappropriate or Harmful Messages as given by Twitch

### II-B Toxicity Detection with Large Language Models

As widely accessible tools for data processing, LLMs have been applied to a variety of text-based datasets, including those used for toxicity detection [[3](https://arxiv.org/html/2605.24000#bib.bib18), [8](https://arxiv.org/html/2605.24000#bib.bib8), [9](https://arxiv.org/html/2605.24000#bib.bib7)]. Some studies aim to improve toxicity detection through approaches such as knowledge graphs [[31](https://arxiv.org/html/2605.24000#bib.bib12)], distilled models [[30](https://arxiv.org/html/2605.24000#bib.bib11)], or models specifically fine-tuned for toxicity-related tasks [[9](https://arxiv.org/html/2605.24000#bib.bib7)]. Other work has evaluated the accuracy of LLMs in detecting toxic content [[22](https://arxiv.org/html/2605.24000#bib.bib10), [21](https://arxiv.org/html/2605.24000#bib.bib13)]. In addition, LLMs have been proposed as automated moderation tools in gaming contexts [[29](https://arxiv.org/html/2605.24000#bib.bib9)].
Overall, prior research suggests that LLMs achieve promising performance in toxicity detection [[18](https://arxiv.org/html/2605.24000#bib.bib3), [26](https://arxiv.org/html/2605.24000#bib.bib4)].

### II-C Toxicity Detection and Moderation on Twitch

Twitch tackles the problem of toxicity with two methods: automatic (AI-based) and manual moderation. Gandolfi and Ferdig [[16](https://arxiv.org/html/2605.24000#bib.bib39)] emphasize the role of Twitch as a bearer of toxicity with few opportunities for contradiction. They furthermore analyze data from Dota 2 streams on Twitch. Kim et al. [[20](https://arxiv.org/html/2605.24000#bib.bib26)] show that Twitch users often bypass automated moderation by replacing text with emotes, which humans can interpret but systems struggle to detect. They compiled a dataset of Twitch emotes and developed a visual classifier, finding that approximately 3.82% of Twitch chat messages are toxic. Huth et al. [[19](https://arxiv.org/html/2605.24000#bib.bib25)] present a real-time toxicity detection plugin for Twitch using the Google Perspective API ([https://www.perspectiveapi.com/](https://www.perspectiveapi.com/)). They highlight moderation challenges, particularly for smaller streamers. Although designed for Twitch chat, their pipeline was evaluated using the Jigsaw Toxic Comment Classification dataset ([https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)). Dreier and Pirker [[10](https://arxiv.org/html/2605.24000#bib.bib15)] analyze toxicity across 36 Twitch channels and 100,000 messages, considering stream type, streamer gender, community size, and game genre. They find higher toxicity in multiplayer streams than in single-player ones and slightly more in shooter games.
Smaller streams often lack moderation resources, while larger audiences are associated with more hate messages toward both viewers and streamers.

## III Methodology

### III-A Research Questions

This paper analyzes toxic behavior in Twitch chat with a focus on community-level differences. We aim to understand how different communities behave while watching popular streams and to identify potential differences between them. To address our research objective using appropriate statistical measures and ensure transparency, we apply the Goal–Question–Metric (GQM) as proposed by Wohlin et al. [[28](https://arxiv.org/html/2605.24000#bib.bib22)] and formulate the following research questions:

1. RQ1: What forms of toxic behavior are most commonly exhibited by viewers on Twitch?
2. RQ2: How does viewer toxicity on Twitch vary between games?
3. RQ3: How does viewer toxicity on Twitch vary across genres?

Figure 1: Our pipeline for toxicity detection follows four sequential steps: data crawling, data preprocessing, LLM-based labeling, and data analysis.

To address these questions, we use three sets of metrics:

1. M1.1: Ratio of toxic messages, overall and with regard to toxicity categories and subclasses.
2. M1.2: Frequency of co-occurrences of toxicity subclasses.
3. M2.1: Ratio of toxic messages in games, overall and within toxicity categories and subclasses.
4. M2.2: Comparison of distributions of toxicity subclasses between games.
5. M3.1: Ratio of toxic messages in genres, overall and within toxicity categories and subclasses.
6. M3.2: Comparison of distributions of toxicity subclasses between genres.

### III-B Toxicity Category Selection

For the classification of toxic chat messages, we adopt the toxicity taxonomy used by Twitch (cf. Table [I](https://arxiv.org/html/2605.24000#S2.T1)).
We use this taxonomy for two reasons: (1) there is no single universally accepted categorization of toxicity in the literature, and (2) all messages analyzed in this work originate from Twitch chat, so we rely on the platform-specific moderation framework that governs this communication space. By using Twitch’s moderation taxonomy, we ensure that our classification scheme is closely aligned with the context in which the messages were produced.

### III-C Twitch Game Categories and Genre Selection

To create our dataset, we selected two games from each of several Twitch categories with high viewer representation on the platform. Table [II](https://arxiv.org/html/2605.24000#S3.T2) shows the selected categories and corresponding games based on their popularity on Twitch as of 14 November 2025. Because genre boundaries in games are often blurred, we use Twitch’s category system as a practical basis for selection. At the same time, Twitch categories do not always group games with comparable gameplay or genre characteristics, so games covered by the same category label can vary substantially. We therefore adjust the given categorization so that each category contains only games with comparable dynamics. Specifically, we separate *Strategy* into *Multiplayer (MP) Strategy* and *Single-Player (SP) Strategy*, with the same reasoning as the existing separation of *Multiplayer Shooters* and *Single-Player Shooters*: although both fall under a broader genre umbrella, they differ in gameplay structure, pacing, and player interaction, all of which may also shape chat behavior. All selected games contribute to the analyses for RQ1 and RQ2, but only a subset of categories is used for the genre-focused analysis in RQ3. As the strategy-related categories and the MMO category are less internally homogeneous, we exclude them from the genre-focused comparison.
For the remainder of this paper, the term *genre* is only used when referring to these four selected categories, while *game category* denotes Twitch game categories more generally.

TABLE II: Our selection of the two most popular games per category on Twitch.

### III-D Data Set Generation and Preprocessing

Using data from SullyGnome ([https://sullygnome.com/](https://sullygnome.com/)), a website that displays Twitch statistics and analysis, we selected the 50 most-viewed streamers per game. For each of these streamers, we downloaded the chats of their ten most recent Videos-on-Demand (VODs) using the TwitchDownloaderCLI (provided by [https://github.com/lay295/TwitchDownloader](https://github.com/lay295/TwitchDownloader)). The resulting dataset comprises 4,452 streams with 20,212,682 messages and 29,001 hours of stream time, recorded between 19 June 2024 and 14 November 2025. Labeling all messages required approximately 306 hours of compute time on four A6000 GPUs.

Of the approximately 20 million chat messages, 34.7% consist of a single word. To reduce the computational effort of classifying each message, we pre-label objectively non-toxic messages such as "hi", "yes", or "gg" from the set of the 50 most frequent messages. A full list of pre-labeled messages is available in our supplementary material [[15](https://arxiv.org/html/2605.24000#bib.bib41)]. Twitch allows streamers to manually set up bots that automatically give predefined information to viewers. We pre-label messages from known bots, such as "Nightbot" or "StreamElements", accordingly.

TABLE III: Exemplary results for each toxicity category (see Table [I](https://arxiv.org/html/2605.24000#S2.T1)). Two messages are shown per category for clarity.

### III-E Zero-Shot Classification: Toxic vs. Non-toxic

To automatically label whether a chat message is toxic, we use a zero-shot classification approach with an LLM and an instruction prompt. Because the interpretation of toxicity often depends on conversational context, we provide the model with a temporal context window of the preceding ten seconds of chat messages to support more informed judgments. After preliminary experiments, we selected Phi4 [[1](https://arxiv.org/html/2605.24000#bib.bib1)], a 14B-parameter model, for this task. Phi4 is suitable for toxicity detection because, unlike models such as Llama3 [[17](https://arxiv.org/html/2605.24000#bib.bib2)], it is less optimized to refuse or filter abusive content. Its 16k token context window enables nuanced language understanding while maintaining relatively fast inference times. Larger models may offer slight accuracy improvements but require substantially greater computational resources.

In the first step, we prompt the LLM as a binary toxicity classifier using the label definitions from Table [I](https://arxiv.org/html/2605.24000#S2.T1). In the second step, messages classified as toxic are labeled using a modified prompt based on the same toxicity definitions; here, the model assigns each message to a specific toxicity class. We provide both prompts in the supplementary material [[15](https://arxiv.org/html/2605.24000#bib.bib41)].

## IV Results

Of the 20,212,682 messages in our dataset, 14.4% were pre-labeled as not toxic by default (cf. Section [III-D](https://arxiv.org/html/2605.24000#S3.SS4)), and 2.4% of all messages were labeled as toxic by our system, which is broadly in line with previous findings such as the 3.82% reported by Kim et al. [[20](https://arxiv.org/html/2605.24000#bib.bib26)]. Table [III](https://arxiv.org/html/2605.24000#S3.T3) presents exemplary messages classified as toxic, with two examples shown per toxicity category and subclass.
During the binary labeling process, the LLM produced outputs that did not conform to the binary yes/no label definition for a very small proportion of messages (0.06%). Because this proportion is so small, these cases are excluded from subsequent analyses.

Before analyzing and discussing the results in more detail, we evaluate the labeling quality through a small-scale agreement study and a comparison with an existing toxicity dataset containing known labels.

### IV-A Human-Model Agreement Evaluation

Human-model agreement was tested on 100 messages (50 toxic, 50 non-toxic), each shown with its ten seconds of preceding chat context. Three game researchers (two male, one female) independently labeled each message as toxic or non-toxic under blinded conditions. Agreement was evaluated using Cohen’s Kappa [[7](https://arxiv.org/html/2605.24000#bib.bib5)] (-1 = disagreement, 1 = perfect agreement). The model reached an average Kappa of 0.53 across the three raters (0.44, 0.46, 0.70), which indicates moderate agreement [[23](https://arxiv.org/html/2605.24000#bib.bib6)]. The human raters reached a similar average of 0.55 among themselves (0.48, 0.51, 0.66), suggesting comparable human–model and inter-human agreement. This indicates that the labeling is sufficiently reliable for further analysis.

Human–model agreement for the eight toxicity subclasses was evaluated using the same sample of toxic messages. Human raters assigned subclass labels, resulting in an average human–model Cohen’s Kappa of 0.42 (0.24, 0.33, 0.70) and an average inter-human agreement of 0.32 (0.18, 0.26, 0.52). Although lower, these values still indicate fair to moderate agreement. A large proportion of labeling disagreements (30.6% of all disagreements) occurred when the LLM assigned *bullying* while human raters assigned *profanity*. Another frequent mismatch was between *aggression* assigned by the LLM and *bullying* assigned by human raters (16.1%).
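
The agreement statistic used in this subsection can be illustrated with a minimal sketch. The function below computes Cohen's Kappa for one pair of raters from observed and chance agreement; the toy label lists are hypothetical and not taken from the study, and the function name `cohens_kappa` is our own. Only the three pairwise values averaged at the end are the ones reported above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (NOT study data): 1 = toxic, 0 = non-toxic.
model_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(model_labels, rater_labels)

# Averaging the three pairwise human-model values reported in the text.
reported_pairwise = [0.44, 0.46, 0.70]
mean_kappa = round(sum(reported_pairwise) / len(reported_pairwise), 2)
```

In the toy example the two raters agree on 8 of 10 items while chance agreement is 0.5, so Kappa is 0.6; averaging the three reported pairwise values gives 0.53, matching the figure stated above.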

### IV-B Comparison to a fine-tuned BERT-based approach

We further compare our zero-shot toxicity detection approach with the fine-tuned RoBERTa model proposed by Dementieva et al. [[9](https://arxiv.org/html/2605.24000#bib.bib7)]. Applying our method to the English subset of the TextDetox dataset (5,000 labeled messages; see [https://huggingface.co/textdetox/xlmr-large-toxicity-classifier-v2](https://huggingface.co/textdetox/xlmr-large-toxicity-classifier-v2)) yields an F1 score of 94.5%, slightly exceeding the 92.3% reported by Dementieva et al. [[9](https://arxiv.org/html/2605.24000#bib.bib7)]. Although the dataset contains general language without conversational context, this result indicates that the approach generalizes well beyond the Twitch domain.

### IV-C RQ1 Results: Commonly exhibited toxic behavior on Twitch

Overall, 2.39% (472,891) of messages were labeled as toxic, of which 95.8% were assigned both a primary and a secondary label. Most toxic messages were labeled as *harassment* (primary label/secondary label: 75.6% / 12.7%), followed by *discrimination* (12.0% / 9.5%), *profanity* (10.0% / 53.0%), and *sexual content* (2.4% / 4.2%). The most common subclasses within toxic messages were *bullying* (primary label/secondary label: 61.0% / 11.0%) and *aggression* (14.6% / 1.8%), followed by *profanity* (10.0% / 53.0%). These are followed by discrimination based on *race, ethnicity, or religion* (5.3% / 1.9%), *sexuality or gender* (3.8% / 2.5%), *sexual content* (2.4% / 4.2%), *misogyny* (2.0% / 3.3%), and discrimination based on *disability* (0.9% / 1.9%). The most frequent co-occurrences between primary and secondary labels are *harassment* with *profanity* (54.0% of toxic messages), followed by *discrimination* with *harassment* (10.8%) and *harassment* with *sexual content* (3.0%).
The most common co-occurrences for subclasses were *bullying* with *profanity* (43.0%), *aggression* with *profanity* (10.9%), and discrimination based on *race, ethnicity, or religion* with *bullying* (3.2%). Aggregating subclasses into the broader categories shown in Table [I](https://arxiv.org/html/2605.24000#S2.T1) reveals that 85.6% of toxic messages contain *harassment*, 63.0% *profanity*, 18.0% *discrimination*, and 6.6% *sexual content*.

Streams were further grouped into high- and low-toxicity groups based on their mean toxicity rate. After testing for variance differences with Levene’s test and applying ANOVA or Welch’s t-test where appropriate, significant differences emerged for three out of seven subclasses: *sexual content* ($F=44.12$, $p<0.001$), discrimination based on *sexuality or gender* ($t=14.56$, $p<0.001$), and *aggression* ($t=18.66$, $p<0.001$), with only the last occurring more frequently in low-toxicity streams.

### IV-D RQ2 Results: Variations of viewer toxicity between games

The most represented game was Counter-Strike 2, accounting for 28% of all messages in the dataset, followed by League of Legends (15.5%), Valorant (12.9%), and Path of Exile (11.5%). Minecraft contributed the fewest messages (0.07%). The relative amount of toxic messages per game is displayed in Table [IV](https://arxiv.org/html/2605.24000#S4.T4), with Red Dead Redemption II being the most toxic (4.0%) and Minecraft being the least toxic (0.7%). The distribution of toxicity categories as primary label per game is displayed in Figure [2](https://arxiv.org/html/2605.24000#S4.F2). Since *harassment* and *discrimination* include multiple subclasses, we introduced a binary indicator marking whether any subclass of these categories appeared in a message.
Across all games, *harassment* was by far the most common primary toxicity category (67.8%-89%). *Discrimination* ranked second (5.0%-16.4%), with *sexual content* (2.0%-4.2%) ranking last, except in EA Sports FC 26, where *profanity* ranked second (10.0%) and *discrimination* third (9.0%). For secondary labels, the order was consistent across all games: *profanity* (56.6%-79%) was by far the most prevalent, followed by *harassment* (10.5%-19.5%), *discrimination* (8.1%-15.2%), and *sexual content* (2.3%-9.6%). At the subclass level, *bullying* (9.3%-16.9%), *aggression* (1.2%-2.9%), and *profanity* (4.0%-11.6%) were the most common primary labels. The least frequent subclass was typically discrimination based on *disability* (0.4%-1.2%), except in Minecraft (1%) and League of Legends (1.8%), where discrimination based on *sexuality or gender* and *misogyny*, respectively, occurred least often. For secondary labels, *profanity* (56.6%-79.1%) was again most prevalent, followed by *bullying* (9.3%-16.9%).

We compared per-stream distributions of toxicity subclasses in primary labels using PERMANOVA [[5](https://arxiv.org/html/2605.24000#bib.bib36)]. Pairwise comparisons showed significant differences between most games, except for Counter-Strike 2 and Valorant, and Plants vs. Zombies and Trackmania. For the latter pair, PERMDISP [[4](https://arxiv.org/html/2605.24000#bib.bib37)] indicated unequal dispersion, suggesting the PERMANOVA result may reflect differences in variability rather than distribution centers.

Figure 2: Distribution of toxicity subclasses per game.

TABLE IV: Relative amount of toxic chat messages per game.

### IV-E RQ3 Results: Variations of viewer toxicity between genres

Multiplayer Shooter streams accounted for the largest share of messages (40.9%), followed by MOBA (20.4%), Sports Games (12.9%), and Single-Player Shooter (1.5%), while 24.3% of messages came from games without an assigned genre.
Toxicity rates varied by genre, with the highest proportion in MOBA streams (3.2%), followed by Single-Player Shooter (3.1%), and lower rates in Multiplayer Shooter (2.1%) and Sports Games (2.0%). Table [V](https://arxiv.org/html/2605.24000#S4.T5) shows the prevalence of primary and secondary toxicity labels among previously identified toxic messages by genre. Across all four genres, *harassment* is the most frequent primary label category (74.0%-77.6%). *Profanity*, in contrast, is comparatively rare as a primary label (8.8%-10.2%) but is the most frequent secondary label across all genres (50.5%-55.0%). At the subclass level, *bullying* is the most common primary harassment label in all genres (39.9%-63.7%), while *aggression* is especially pronounced in Single-Player Shooter streams (34.2%). Within discrimination-related categories, Single-Player Shooter also shows the highest prevalence of *misogyny* in both the primary (4.4%) and secondary labels (4.3%). Overall, MOBA and Multiplayer Shooter exhibit very similar distributions across most primary and secondary categories, whereas Single-Player Shooter differs more clearly through its elevated shares of *aggression*, *misogyny*, and *sexual content*.

We compared per-stream distributions of toxicity subclasses in primary labels using PERMANOVA [[5](https://arxiv.org/html/2605.24000#bib.bib36)]. Pairwise comparisons showed significant differences between all genre pairs. However, PERMDISP [[4](https://arxiv.org/html/2605.24000#bib.bib37)] revealed unequal dispersion across all pairs, suggesting the PERMANOVA results may reflect differences in variability rather than distribution centers.

TABLE V: Primary and secondary toxicity label prevalence (%) among identified toxic messages by genre. Overall rows report the combined prevalence for each top-level category. Bold indicates the highest prevalence per primary and secondary level within each row.
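
To make the pairwise PERMANOVA comparisons more concrete, the following is a minimal one-way sketch in plain Python. It is an illustration under stated assumptions rather than the analysis code used in this work: the text does not name the distance measure, so Euclidean distance between per-stream proportion vectors is assumed; the eight toy streams, the group names, and the helper names `pseudo_f` and `permanova` are hypothetical; and the PERMDISP dispersion check is not included.

```python
import math
import random


def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared pairwise distances divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels, keep distances fixed
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-stream subclass proportions (NOT study data), two groups.
streams = [
    [0.70, 0.20, 0.10], [0.68, 0.22, 0.10], [0.72, 0.18, 0.10], [0.69, 0.21, 0.10],
    [0.40, 0.40, 0.20], [0.42, 0.38, 0.20], [0.38, 0.42, 0.20], [0.41, 0.39, 0.20],
]
groups = ["game_a"] * 4 + ["game_b"] * 4
# Assumption: Euclidean distance between proportion vectors.
dist = [[math.dist(u, v) for v in streams] for u in streams]
f_obs, p_value = permanova(dist, groups)
```

Because a significant pseudo-F can stem from unequal spread as well as from shifted centers, a dispersion test should accompany it, as done with PERMDISP above. If a library implementation is preferred, scikit-bio offers `permanova` and `permdisp` in `skbio.stats.distance`; whether this work used them is not stated.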

## V Discussion

Our results show that the applied pipeline can successfully distinguish between toxic and non-toxic messages. At the same time, the analyzed data only includes messages that remained visible in chat and were not removed by automatic or manual moderation. The observed prevalence of toxicity therefore already points to limitations of the existing moderation system, as harmful content is still present despite these filtering mechanisms. Our proposed method achieves a level of agreement with human labeling that is comparable to inter-human agreement, suggesting that, although toxicity definitions are subjective, its reliability is similar to that of human labelers. Since chat messages are expected to be moderated, the comparatively high prevalence of toxicity in the dataset suggests a need for improved moderation mechanisms.

Twitch’s taxonomy is useful, but some subclasses are difficult to distinguish in practice. In particular, the distinction between harassment and profanity is often ambiguous, which is reflected in our agreement study. This suggests that the observed disagreement may not solely result from model error, but may instead point to limitations of the taxonomy itself. Clearer category definitions could reduce annotation ambiguity and improve classification consistency, likely offering more practical value than subclass distinctions that cannot be applied reliably.

### V-A RQ1: Commonly exhibited toxic behavior on Twitch

Overall, the results indicate that while only a small share of messages on Twitch are toxic, most of them consist of insults and antagonizing language rather than explicit discriminatory or sexual content. Harassment in the form of bullying or aggression makes up well over half of the analyzed toxic messages (75.6% of primary labels), often accompanied by profanity, which commonly serves to intensify the tone of a message. This is in line with prior research [[2](https://arxiv.org/html/2605.24000#bib.bib40)].
However, discriminatory messages still account for over 10% of toxic messages, indicating that discrimination and hate speech remain persistent issues in online gaming communities. Although such messages occur less frequently, they are often perceived as particularly severe by players and can lead to disengagement from gaming spaces as well as negative impacts on mental well-being [[14](https://arxiv.org/html/2605.24000#bib.bib23), [11](https://arxiv.org/html/2605.24000#bib.bib27)]. Since game-external communities serve as important spaces for player interaction and support, addressing toxicity on widely used platforms such as Twitch remains an important challenge [[24](https://arxiv.org/html/2605.24000#bib.bib38)].

Finding 1: Harassment is the most prevalent form of toxicity in Twitch messages and frequently co-occurs with profanity. Profanity most often appears as a secondary label, indicating that it is commonly used to intensify otherwise toxic language.

### V-B RQ2: Variations of viewer toxicity between games

The analysis of toxicity ratios in specific games shows that games differ not only in their overall toxicity rates but also in the composition of toxic behavior. This means that toxicity is not simply a platform-wide phenomenon but is shaped by the communities surrounding individual games. A central result of our analysis is that almost all pairwise comparisons between games reveal significant differences in toxicity subclass distributions. This points toward the existence of game-specific community norms. Even when games are grouped under the same broader category, their chats do not necessarily develop the same toxicity profile. Most game pairs within a broader category remain distinct, whereas Counter-Strike 2 and Valorant appear similar.
This similarity is plausible because both games share closely related competitive structures, audience expectations, and viewing practices, which may encourage similar forms of chat interaction. The observed variation between games suggests that game mechanics, streamers, their communities, and the governing moderation process likely interact in shaping toxic behavior. Competitive games may foster more direct confrontation and blame, while games with different pacing, audience cultures, or forms of viewer engagement may produce different patterns of toxicity [[10](https://arxiv.org/html/2605.24000#bib.bib15)].

For moderation, this implies that game- or community-specific adaptation may be more effective than one-size-fits-all approaches. If different games attract different mixtures of harassment, discriminatory language, sexualized remarks, and profanity, then moderation systems and community guidelines should be calibrated to the risks that are most characteristic of each game community.

Finding 2: Harassment is the most prevalent form of toxicity across all games, often accompanied by profanity. While overall toxicity rates remain low in every game (0.7% to 4.0%), the distribution of toxicity categories differs significantly between most games, with Valorant and Counter-Strike 2 as a notable exception.

### V-C RQ3: Variations of viewer toxicity between genres

Our results indicate that genre-related differences affect not only the occurrence of toxicity, but also which forms of toxicity are most prominent within each genre. The ratios of toxic messages in genres show meaningful differences in overall toxicity rates, with MOBA streams displaying the highest relative toxicity and Sports Games the lowest among the analyzed genres. Comparing the distributions of toxicity subclasses shows that MOBA and Multiplayer Shooter chats appear particularly similar, whereas Single-Player Shooter stands out through comparatively higher shares of *aggression*, *misogyny*, and *sexual content*.
This suggests that genre \(or the specific set of games selected per genre\) can shape the prominence of certain toxic behaviors, but not necessarily the overall distribution of toxicity\. At the same time, the statistical results should be interpreted with caution\. Although PERMANOVA indicates significant pairwise differences between all genre pairs, the corresponding PERMDISP results show unequal dispersion\. This means that the observed genre differences may partly reflect different levels of within\-genre variability rather than a clean separation between genre means\. Overall, this fits the broader pattern of the results: genre matters, but genre alone is not sufficient to explain the prevalence of toxicity categories\. Considerable variation remains within most genres, although Multiplayer Shooters appear to be a partial exception, with comparatively similar profiles across games\. This is consistent with the stronger role of individual games and communities observed in RQ2\. Taken together, the results suggest that genre is a useful but limited analytical lens\. It captures broad tendencies in toxicity prevalence and emphasis, but it does not fully account for the more fine\-grained variation that emerges at the level of individual games and their surrounding communities\.

Finding 3: Genre influences both the overall level and the relative emphasis of toxic behavior on Twitch\. Toxicity across genres shares a common underlying structure centered on harassment and profanity, while much of the finer\-grained variation appears to be driven by individual games and their communities rather than genre alone\.

## VI Limitations

We discuss limitations of our work in alignment with the guidelines of Wohlin et al\. \[[28](https://arxiv.org/html/2605.24000#bib.bib22)\]\.

#### Construct Validity

Toxicity is inherently subjective and context\-dependent, and interpretations may vary across individuals and communities\.
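Because interpretations differ between raters, agreement between human and model labels is usually reported with a chance\-corrected statistic\. The following minimal sketch assumes Cohen’s kappa \[7\] as that statistic, which the citations of \[7\] and \[23\] in Section IV\-A suggest but this section does not state\. The function name `cohens_kappa` and the eight example labels are hypothetical and are not data from the study\.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of nominal labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight chat messages (illustration only).
human = ["toxic", "non-toxic", "non-toxic", "toxic",
         "non-toxic", "non-toxic", "toxic", "non-toxic"]
model = ["toxic", "non-toxic", "toxic", "toxic",
         "non-toxic", "non-toxic", "non-toxic", "non-toxic"]

kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

On these made\-up labels the two raters agree on 6 of 8 items, yet kappa is only about 0\.47, because a large part of that raw agreement is already expected by chance given how often each label is used\.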
Our labeling follows Twitch’s moderation taxonomy, though messages may still be interpreted differently depending on context or community norms\. Human–model agreement is moderate, so some false positives and false negatives remain; these are rare, however, and unlikely to affect aggregate patterns in the large\-scale dataset\.

#### Internal Validity

The dataset primarily contains messages from popular streams, which may introduce selection bias\. Communication dynamics and moderation practices can differ between large and smaller communities \[[10](https://arxiv.org/html/2605.24000#bib.bib15)\]\. In addition, Twitch employs automated moderation tools that filter messages belonging to the categories described in Section [III\-B](https://arxiv.org/html/2605.24000#S3.SS2), meaning the dataset includes only messages that were not previously removed by the platform\. Manual moderation by streamers and their moderation teams may further influence which messages remain visible in chat\. As a result, the dataset may underestimate the absolute prevalence of toxic behavior, and observed differences in toxicity levels may partly reflect moderation practices rather than community behavior\. However, the large scale of the dataset, spanning hundreds of streams and communities, reduces the likelihood that these factors systematically bias the overall results and still enables meaningful comparisons of relative toxicity patterns across games and genres\. To reduce confounding effects from differing gameplay mechanics and community structures, genre comparisons are limited to manually selected, structurally similar games \(e\.g\., Valorant and Counter\-Strike 2\), so that comparisons remain meaningful\. Finally, non\-English messages may be harder for the model to interpret and could not be evaluated by the human raters, potentially affecting reliability in multilingual contexts\.
However, due to our focus on English\-speaking streamers, the vast majority of messages in the dataset are written in English, which limits the impact of this issue on the overall analysis\.

#### Conclusion Validity

Toxicity labeling was performed using a language model rather than a task\-specific classifier, which may introduce classification inconsistencies\. To ensure consistency, we provide the model validation in Section [IV\-A](https://arxiv.org/html/2605.24000#S4.SS1)\.

#### External Validity

Our findings are based on Twitch chat data and may therefore not fully generalize to other gaming communities or online platforms\. Communication on Twitch differs from many social media platforms due to its fast\-paced chat and features such as emotes\. However, as one of the largest gaming\-related social platforms, it provides a valuable environment for studying toxicity in large gaming communities\. Our comparable results on the TextDetox dataset \(footnote 8: [https://huggingface\.co/textdetox/xlmr\-large\-toxicity\-classifier\-v2](https://huggingface.co/textdetox/xlmr-large-toxicity-classifier-v2); Last accessed:\) further suggest a degree of generalizability\.

## VII Conclusion

This work examines toxicity in Twitch chats across games and genres using a large\-scale dataset of more than 20 million messages and a zero\-shot LLM\-based classification pipeline\. The proposed pipeline reliably distinguishes toxic from non\-toxic messages, produces classifications comparable to human judgments, and performs well against the benchmark dataset, where it achieved strong results in comparison to the fine\-tuned baseline model\. However, our findings also indicate that some limitations stem not only from the model, but from ambiguities in Twitch’s underlying toxicity taxonomy itself\. Our results show that harassment is the dominant form of toxicity on Twitch and frequently co\-occurs with profanity, while discriminatory and sexualized content occur less often but remain relevant\.
We also find that toxicity varies not only in its overall prevalence, but also in its composition across games and genres\. In particular, most games exhibit distinct toxicity profiles, suggesting that toxic behavior is shaped by game\-specific contexts and community norms beyond broader genre\-level effects\. Future work could refine the toxicity taxonomy, incorporate Twitch\-specific features such as emotes, and apply user\-based clustering to identify more detailed toxicity patterns across communities\. Examining toxicity at the level of individual streams could provide a more fine\-grained understanding of Twitch communities on their most fundamental level, improving the robustness of the classification approach\.

## Acknowledgements

We are grateful for the support of Bodo Rosenhahn, which made the preparation and dissemination of this work possible\. Further, we disclose the use of LLMs for improving readability\.

## References

- \[1\] M\. Abdin, J\. Aneja, H\. Behl, S\. Bubeck, R\. Eldan, and S\. Gunasekar et al\. \(2024\-12\) Phi\-4 Technical Report\. arXiv\.org\.
- \[2\] J\. C\. Aguerri, M\. Santisteban, and F\. Miró\-Llinares \(2023\) The enemy hates best? toxicity in league of legends and its content moderation implications\. European Journal on Criminal Policy and Research 29\(3\), pp\. 437–456\.
- \[3\] A\. Albladi, M\. Islam, A\. Das, M\. Bigonah, Z\. Zhang, F\. Jamshidi, M\. Rahgouy, N\. Raychawdhary, D\. Marghitu, and C\. Seals \(2025\) Hate speech detection using large language models: a comprehensive review\. IEEE Access 13, pp\. 20871–20892\. External Links: [Document](https://dx.doi.org/10.1109/ACCESS.2025.3532397)\.
- \[4\] M\. J\. Anderson \(2006\) Distance\-based tests for homogeneity of multivariate dispersions\. Biometrics 62\(1\), pp\. 245–253\.
- \[5\] M\. J\. Anderson \(2014\) Permutational multivariate analysis of variance \(permanova\)\. Wiley statsref: statistics reference online, pp\. 1–15\.
- \[6\] B\. Barbarestani, I\. Maks, and P\. T\.J\.M\. Vossen \(2024\-05\) Content moderation in online platforms: a study of annotation methods for inappropriate language\. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC\-COLING\-2024, pp\. 96–104\.
- \[7\] J\. Cohen \(1960\) A coefficient of agreement for nominal scales\. Educational and Psychological Measurement 20\(1\), pp\. 37–46\. External Links: [Document](https://dx.doi.org/10.1177/001316446002000104)\.
- \[8\] D\. Dementieva, D\. Moskovskiy, N\. Babakov, A\. A\. Ayele, N\. Rizwan, F\. Schneider, X\. Wang, S\. M\. Yimam, D\. Ustalov, E\. Stakovskii, et al\. \(2024\) Overview of the multilingual text detoxification task at pan 2024\. In CLEF \(Working Notes\), pp\. 2432–2461\.
- \[9\] D\. Dementieva, V\. Protasov, N\. Babakov, N\. Rizwan, I\. Alimova, and C\. Brune et al\. \(2025\-09\) Overview of the multilingual text detoxification task at pan 2025\. In Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings\.
- \[10\] L\. Dreier and J\. Pirker \(2023\) Toxicity in twitch live stream chats: towards understanding the impact of gender, size of community and game genre\. In 2023 IEEE Conference on Games \(CoG\), pp\. 1–4\.
- \[11\] J\. Fox and W\. Y\. Tang \(2017\) Women’s experiences with general and sexual harassment in online video games: Rumination, organizational responsiveness, withdrawal, and coping strategies\. New media & society 19\(8\), pp\. 1290–1307\.
- \[12\] J\. Frommel, D\. Johnson, and R\. L\. Mandryk \(2023\) How perceived toxicity of gaming communities is associated with social capital, satisfaction of relatedness, and loneliness\. Computers in Human Behavior Reports 10\. External Links: ISSN 2451\-9588, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.chbr.2023.100302)\.
- \[13\] J\. Frommel, R\. L\. Mandryk, and M\. Klarkowski \(2022\) Challenges to combating toxicity and harassment in multiplayer games: involving the hci games research community\. In Extended Abstracts of the 2022 Annual Symposium on Computer\-Human Interaction in Play\.
- \[14\] R\. Fuchs, J\. Droste, and A\. Dockhorn \(2025\) How do players perceive gender discrimination? on the differences of harassment in online games\. In 2025 IEEE Conference on Games \(CoG\), pp\. 1–8\.
- \[15\] Fuchs, Rupp, Bertram, Eckert, and Dockhorn \(2026\) Supplementary Material\. Available at: [https://github\.com/ronjafuchs/twitch\_toxicity\_analysis](https://github.com/ronjafuchs/twitch_toxicity_analysis)\.
- \[16\] E\. Gandolfi and R\. E\. Ferdig \(2022\) Sharing dark sides on game service platforms: disruptive behaviors and toxicity in dota2 through a platform lens\. Convergence 28\(2\), pp\. 468–487\.
- \[17\] A\. Grattafiori, A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, and A\. Al\-Dahle et al\. \(2024\-07\) The Llama 3 Herd of Models\. arXiv\.org\.
- \[18\] S\. Gretz, A\. Halfon, I\. Shnayderman, O\. Toledo\-Ronen, A\. Spector, and L\. Dankin et al\. \(2023\) Zero\-shot Topical Text Classification with LLMs \- an Experimental Study\. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp\. 9647–9676\. External Links: [Document](https://dx.doi.org/10.18653/v1/2023.findings-emnlp.647)\.
- \[19\] J\. Huth, C\. Eichhorn, D\. A\. Plecher, and J\. Pirker \(2025\) Exploring the potential of an ai\-based twitch moderation and toxicity detection bot\. In 2025 IEEE Conference on Games \(CoG\), pp\. 1–4\.
- \[20\] J\. Kim, D\. Y\. Wohn, and M\. Cha \(2022\) Understanding and identifying the use of emotes in toxic chat on twitch\. Online Social Networks and Media 27, pp\. 100180\.
- \[21\] H\. Koh, D\. Kim, M\. Lee, and K\. Jung \(2024\) Can llms recognize toxicity? a structured investigation framework and toxicity metric\. In Findings of the Association for Computational Linguistics: EMNLP 2024, pp\. 6092–6114\.
- \[22\] U\. Kruschwitz and M\. Schmidhuber \(2024\) Llm\-based synthetic datasets: applications and limitations in toxicity detection\. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC\-COLING\-2024, pp\. 37–51\.
- \[23\] J\. R\. Landis and G\. G\. Koch \(1977\) The measurement of observer agreement for categorical data\. Biometrics 33\(1\), pp\. 159–174\.
- \[24\] L\. E\. Ma, C\. Dickson\-Deane, W\. Raffe, A\. R\. Murphy, and J\. Garcia \(2024\) Gaming for equity: the power of diversity within gender and race in gamers\. In 2024 IEEE Conference on Games \(CoG\)\.
- \[25\] M\. Märtens, S\. Shen, A\. Iosup, and F\. Kuipers \(2015\) Toxicity detection in multiplayer online games\. In 2015 International Workshop on Network and Systems Support for Games \(NetGames\), pp\. 1–6\.
- \[26\] J\. Savelka and K\. D\. Ashley \(2023\-11\) The unreasonable effectiveness of large language models in zero\-shot semantic annotation of legal texts\. Frontiers in Artificial Intelligence 6\. External Links: ISSN 2624\-8212, [Document](https://dx.doi.org/10.3389/frai.2023.1279794)\.
- \[27\] M\. Wijkstra, K\. Rogers, R\. L\. Mandryk, R\. C\. Veltkamp, and J\. Frommel \(2023\) Help, my game is toxic\! first insights from a systematic literature review on intervention systems for toxic behaviors in online video games\. In Companion Proceedings of the Annual Symposium on Computer\-Human Interaction in Play, pp\. 3–9\.
- \[28\] C\. Wohlin et al\. \(2012\) Experimentation in software engineering\. Vol\. 236, Springer\.
- \[29\] Z\. Yang, D\. Tullo, and R\. Rabbany \(2025\) Unified game moderation: soft\-prompting and llm\-assisted label transfer for resource\-efficient toxicity detection\. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 2, pp\. 5161–5170\.
- \[30\] J\. Zhang, Q\. Wu, Y\. Xu, C\. Cao, Z\. Du, and K\. Psounis \(2024\) Efficient toxic content detection by bootstrapping and distilling large language models\. In Proceedings of the AAAI conference on artificial intelligence, Vol\. 38, pp\. 21779–21787\.
- \[31\] Y\. Zhao, J\. Zhu, C\. Xu, Y\. Liu, and X\. Li \(2025\) Enhancing llm\-based hatred and toxicity detection with meta\-toxic knowledge graph\. In Findings of the Association for Computational Linguistics: ACL 2025, pp\. 24747–24760\.