EquiSumm : A Gender Bias-Aware Framework for Inclusive Tweet Summarization

arXiv cs.CL Papers

Summary

Proposes EquiSumm, a gender bias-aware framework for inclusive tweet summarization that ensures representation of opinions from different gender groups, addressing demographic fairness in automated summarization.

arXiv:2605.23412v1 Announce Type: new

# EquiSumm : A Gender Bias-Aware Framework for Inclusive Tweet Summarization
Source: [https://arxiv.org/html/2605.23412](https://arxiv.org/html/2605.23412)
ABV IIITM Gwalior, India

{imt_2022034, imt_2022054, imt_2022097, imt_2022100, roshni}@iiitm.ac.in

###### Abstract

While social media platforms such as Twitter provide a medium for large-scale opinion sharing during news events, it is infeasible for individuals or media agencies to manually process the vast volume of content and identify key viewpoints. To address this, several automatic summarization techniques have been proposed that condense large collections of tweets into concise and informative summaries, giving a holistic overview of the key aspects and major opinions shared on social media about a news event. However, these approaches do not explicitly consider demographic representation, such as gender, which can lead to biased summaries. In this paper, we propose EquiSumm, which considers the gender aspect of the shared opinions when generating a summary. Our experimental analysis on two major datasets indicates its effectiveness with respect to existing approaches.

## 1 Introduction

Although social media platforms such as Twitter have become a main source of information for a large fraction of users, the continuous flow of content makes it overwhelming to identify and understand the different aspects of an event. Automated summarization approaches can therefore provide an overview of user opinions on a news event through a small number of tweets [[3](https://arxiv.org/html/2605.23412#bib.bib3)]. A comprehensive summary of user opinions helps users understand the larger debate and gives representation to the different viewpoints [[8](https://arxiv.org/html/2605.23412#bib.bib8)], supporting tasks such as position detection [[4](https://arxiv.org/html/2605.23412#bib.bib4)], popularity identification, and holistic understanding of the event [[13](https://arxiv.org/html/2605.23412#bib.bib13),[11](https://arxiv.org/html/2605.23412#bib.bib11),[9](https://arxiv.org/html/2605.23412#bib.bib9)]. Tweet summarization faces several challenges, such as understanding user viewpoints despite the vocabulary gap [[2](https://arxiv.org/html/2605.23412#bib.bib2)]. While news-event-based tweet summarization focuses on the relevance and coverage of summary tweets with respect to the news event [[5](https://arxiv.org/html/2605.23412#bib.bib5)], disaster tweet summarization approaches mainly aim to identify the different subcategories to ensure faster disaster response [[7](https://arxiv.org/html/2605.23412#bib.bib7)]. However, none of these approaches focuses on social-bias-aware summarization, i.e., the integration and representation of different groups of individuals.

Existing summarization systems are widely used across news media platforms and social media applications, where they determine what content is highlighted for users and how it is presented. However, these approaches might not ensure the representation of all demographic groups. This becomes especially relevant when a news event relates directly to demographic groups, as in the MeToo movement (https://en.wikipedia.org/wiki/MeToo_movement). An appropriate summary should therefore ensure both representation of the opinions and fair representation of both genders.

Table 1: Tweets with respect to the MeToo Dataset

## 2 Proposed Methodology

The proposed framework comprises two phases: classification of each tweet into a gender group, followed by selection of representative tweets from each group. We discuss each phase in detail next.

### Phase I (Gender Classification)

This is the first step in the summarization method. The main objective of this phase is to identify the aspects of the event that each gender group discusses, which in turn ensures representation of every gender in the summary. Phase I classifies a tweet into one of the gender categories ($GC$), namely male ($M$), female ($F$), neutral ($N$), or both ($B$), on the basis of the topic and information discussed in the tweet text. Table [1](https://arxiv.org/html/2605.23412#S1.T1) provides four examples that highlight the different aspects used to identify the gender that a tweet implicitly discusses. The proposed approach does not consider the gender of the user who posted the tweet; rather, it automatically infers the gender that the tweet refers to or discusses. Categorizing tweets by their implicit gender references therefore helps ensure that opinions and information are represented across genders.

For Phase I, we propose a clustering-based technique to identify the gender category of a given tweet. We initially assign a tweet to a gender by matching its constituent words against an existing gender ontology, relying on the male- and female-associated word lists provided by the MIND dataset [[1](https://arxiv.org/html/2605.23412#bib.bib1)]. We use spaCy NER (https://spacy.io/) to detect gendered mentions in tweets, such as names typically associated with men or women, or gendered pronouns. Combining the gender-specific keywords drawn from the ontology with spaCy's NER yields tweets classified into gender categories, which also captures gender-specific names along with contextual information. If the tweet contains words from both the male and female lists, it is considered Both. If the number of words from the male list and the female list is equal or zero, the tweet is categorized as Neutral. From this initial segregation, we group the tweets classified with a high confidence score into gender-based clusters and compute the corresponding centroid of each cluster, which is the average of the SBERT [[12](https://arxiv.org/html/2605.23412#bib.bib12)] vectors of all tweets in that group, as shown in Equation [1](https://arxiv.org/html/2605.23412#S2.E1):

$$\vec{c}_{\text{gender}} = \frac{1}{N_{g}} \sum_{i=1}^{N_{g}} \vec{x}_{i} \tag{1}$$

where $\vec{x}_{i}$ is the SBERT vector of tweet $i$ labeled with that gender, and $N_{g}$ is the number of tweets in that gender group. Given an unclassified tweet, or a tweet classified with low confidence, we compare its vector representation to each gender cluster centroid using cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity). The tweet is then assigned to the gender group whose centroid it is closest to in terms of cosine similarity. In this way, tweets are classified into gender categories such as male, female, and both. Our initial results indicate that this gender classification is effective on both datasets, as shown in Figures [1(a)](https://arxiv.org/html/2605.23412#S2.F1.sf1) and [1(b)](https://arxiv.org/html/2605.23412#S2.F1.sf2), respectively.

![MeToo Dataset](https://arxiv.org/html/2605.23412v1/x1.png) (a) MeToo Dataset

![Legalisation of Abortion Dataset](https://arxiv.org/html/2605.23412v1/x2.png) (b) Legalisation of Abortion Dataset

Figure 1: Percentage of tweets across gender categories obtained using spaCy NER and clustering.
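The Phase I steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `MALE_WORDS` and `FEMALE_WORDS` sets are hypothetical stand-ins for the published word lists, the spaCy NER step and the confidence score are omitted, the ordering of the Both and Neutral rules is one possible reading of the text, and the embedding vectors are assumed to come from an SBERT model and are passed in as plain lists of floats.

```python
import math
import re

# Hypothetical mini-lexicons; the paper relies on a published word list instead.
MALE_WORDS = {"he", "him", "his", "man", "men", "boy", "father"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "girl", "mother"}


def seed_label(tweet):
    """Lexicon pass: return 'male', 'female', 'both', or 'neutral'."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    n_male = sum(tok in MALE_WORDS for tok in tokens)
    n_female = sum(tok in FEMALE_WORDS for tok in tokens)
    if n_male and n_female:
        return "both"  # assumption: any overlap of the two lists counts as both
    if n_male:
        return "male"
    if n_female:
        return "female"
    return "neutral"  # no gendered term found


def centroid(vectors):
    """Equation 1: component-wise mean of the vectors in one gender group."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(vector, centroids):
    """Return the group whose centroid is most cosine-similar to `vector`."""
    return max(centroids, key=lambda group: cosine(vector, centroids[group]))


# Toy 2-dimensional vectors standing in for SBERT embeddings.
seed_groups = {
    "male": [[1.0, 0.0], [0.8, 0.2]],
    "female": [[0.0, 1.0], [0.2, 0.8]],
}
centroids = {group: centroid(vs) for group, vs in seed_groups.items()}
print(seed_label("She told her story"))
print(assign([0.1, 0.9], centroids))
```

In this sketch only the tweets that receive a confident seed label would contribute to the centroids, and every remaining tweet would be routed through `assign`.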
### Phase II (Representative Tweets Selection)

In the second phase, the goal is to identify the most informative and representative tweets from each gender\-associated cluster so that the final summary reflects both the key discussion themes and balanced gender perspectives\. To achieve this, we first construct a tweet similarity graph for each gender group\. In this graph, each tweet is represented as a node, and an edge is introduced between two nodes if the cosine similarity between their SBERT\-based vector embeddings exceeds a manually chosen threshold\. Based on feedback from human annotators and empirical inspection of semantic coherence, we set this threshold to 0\.40\. This ensures that edges reflect meaningful semantic similarity rather than superficial lexical overlap\.

Once the similarity graph is constructed, we apply the LexRank centrality algorithm to identify the most central tweets within each gender cluster. LexRank ranks tweets based on their connectivity and importance within the similarity network, allowing us to select the tweets that best represent the overall viewpoint distribution within that gender group. For each gender category, we extract the top $K$ highest-ranked tweets, where $K$ is fixed to maintain consistency in the length of the generated summaries across gender groups. This prevents the dominant gender group in the dataset from disproportionately influencing the final summary.

Finally, the selected representative tweets from each gender category are concatenated to form the overall event summary\. By ensuring equal contribution from each gender\-associated cluster, the resulting summary not only captures the key themes and sentiments discussed in the event but also ensures balanced representation of gender perspectives\.
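To make Phase II concrete, the following Python sketch builds the thresholded similarity graph, scores nodes with a PageRank-style power iteration in the spirit of LexRank, and concatenates the top $K$ tweets of each group. It is an illustration under assumptions rather than the authors' code: the function names, the damping value of 0.85, and the fixed iteration count are hypothetical choices, and the vectors are assumed to be precomputed SBERT embeddings given as lists of floats. Only the 0.40 threshold is taken from the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexrank_scores(vectors, threshold=0.40, damping=0.85, iters=100):
    """Centrality scores on the unweighted, thresholded similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge when cosine similarity exceeds the threshold; every node also
    # links to itself, so each row has degree >= 1.
    adj = [
        [1.0 if i == j or cosine(vectors[i], vectors[j]) > threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j] for j in range(n))
            for i in range(n)
        ]
    return scores


def top_k(tweets, vectors, k, threshold=0.40):
    """Return the k tweets with the highest centrality score."""
    scores = lexrank_scores(vectors, threshold)
    ranked = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [tweets[i] for i in ranked[:k]]


def build_summary(groups, k):
    """`groups` maps a gender label to (tweets, vectors); take k from each."""
    summary = []
    for label in groups:
        tweets, vectors = groups[label]
        summary.extend(top_k(tweets, vectors, k))
    return summary
```

Because `build_summary` takes the same `k` from every group, a group with many more tweets cannot contribute more summary tweets than a smaller one, which is the balancing effect described above.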

## 3 Experimental Discussions and Results

Comparison with Baselines. We compare EquiSumm against widely used extractive summarization baselines to assess both summary quality and fairness. The first baseline, LexRank [[6](https://arxiv.org/html/2605.23412#bib.bib6)], is a graph-based centrality method that selects sentences based on their importance within a similarity graph. The second baseline, Latent Semantic Analysis (LSA) [[10](https://arxiv.org/html/2605.23412#bib.bib10)], performs dimensionality reduction to uncover latent conceptual topics and selects representative sentences accordingly. The third baseline, Community Detection (Louvain) + LexRank, first clusters tweets into semantically coherent communities and then applies LexRank within each cluster to extract representative sentences. This method allows topic-wise coverage but does not explicitly ensure demographic representation. These baselines help us evaluate how conventional summarizers behave in the presence of gender-skewed discourse.

Dataset Details. We conduct experiments on two publicly available social discussion datasets where gender-based perspectives are strongly articulated. The first dataset concerns the global #MeToo Movement and contains 485 tweets reflecting personal experiences, opinions, and reactions to harassment-related narratives. The second dataset captures the debate over the Legalization of Abortion in the United States and consists of 934 tweets discussing rights, ethics, policy decisions, and personal viewpoints. Both datasets naturally contain diverse emotional tones, argument styles, and implicit references to gender, making them suitable for evaluating fairness in summarization.

Inclusion Bias Score. Since traditional summarization evaluation metrics such as ROUGE require reference summaries and do not account for demographic fairness, we adopt the Inclusion Bias Score (IBS) to measure the extent of gender representation balance in the generated summaries. IBS measures how frequently male-associated terms appear relative to female-associated terms in the final summary. A score close to 0 indicates balanced representation, while a positive score reflects bias toward female-associated content and a negative score indicates bias toward male-associated content. The metric is computed as follows:

$$\text{IBS} = \frac{\sum \text{freq}(f)}{\sum \text{freq}(m) + \sum \text{freq}(f)} - \frac{\sum \text{freq}(m)}{\sum \text{freq}(m) + \sum \text{freq}(f)}$$

where $\text{freq}(m)$ and $\text{freq}(f)$ denote the normalized frequencies of male-associated and female-associated terms, respectively, in the generated summary. This provides a quantitative measure to examine whether the summarization method amplifies or reduces existing bias within the dataset.
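As a worked illustration of the formula, the sketch below computes IBS for a list of summary tweets. It rests on a few assumptions that are not in the paper: the function name and the example word sets are hypothetical, raw term counts are used in place of normalized frequencies (a shared normalizer would cancel in the ratio), and a summary with no gendered terms is scored as 0.0.

```python
import re


def inclusion_bias_score(summary_tweets, male_words, female_words):
    """IBS = F / (M + F) - M / (M + F), using raw term counts M and F."""
    male = female = 0
    for tweet in summary_tweets:
        for tok in re.findall(r"[a-z']+", tweet.lower()):
            male += tok in male_words
            female += tok in female_words
    total = male + female
    if total == 0:
        return 0.0  # assumption: no gendered terms is treated as balanced
    return female / total - male / total


# Hypothetical word sets and summary, for illustration only.
male_words = {"he", "him", "his"}
female_words = {"she", "her", "hers"}
summary = ["She said her view was ignored", "He agreed"]
print(inclusion_bias_score(summary, male_words, female_words))
```

Here the summary contains two female-associated terms and one male-associated term, so the score is 2/3 - 1/3 = 1/3, a positive value indicating a lean toward female-associated content.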

Table 2: Gender Representation Metrics – MeToo Dataset

## 4 Conclusions and Future Works

In this work, we propose a gender-aware summarization algorithm, EquiSumm, that ensures representation of gender-based opinions irrespective of user identity. This is especially relevant today, when summarization algorithms are expected to ensure fairness alongside the long-standing objectives of summarization. We empirically evaluate the proposed model on two gender-specific datasets to understand its effectiveness and compare it with existing summarization approaches. Our preliminary results indicate high effectiveness of EquiSumm. However, this is a preliminary work, which we intend to extend so that it can capture and represent non‑binary or intersectional identities and cover more real-life datasets.

## References

- [1] Gender bias detection in summarization. https://github.com/samruddhikurhe/Gender-Bias (2024), accessed: 2025-07-28
- [2] Bansal, D., Saini, N., Saha, S.: Dcbrts: a classification-summarization approach for evolving tweet streams in multiobjective optimization framework. IEEE Access 9, 148325–148338 (2021)
- [3] Chakraborty, R., Bhavsar, M., Dandapat, S.K., Chandra, J.: Tweet summarization of news articles: An objective ordering-based perspective. IEEE Transactions on Computational Social Systems 6(4), 761–777 (2019)
- [4] Chakraborty, R., Bhavsar, M., Dandapat, S.K., Chandra, J.: Detecting stance in tweets: A signed network based approach. arXiv preprint arXiv:2201.07472 (2022)
- [5] Chakraborty, R., Chakraborty, N.: Twminer: Mining relevant tweets of news articles. In: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). pp. 1–3. IEEE (2023)
- [6] Erkan, G., Radev, D.R.: Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22, 457–479 (2004)
- [7] Garg, P.K., Chakraborty, R., Dandapat, S.K.: Atsumm: Auxiliary information enhanced approach for abstractive disaster tweet summarization with sparse training data. Knowledge-Based Systems 311, 112969 (2025)
- [8] Garg, P.K., Chakraborty, R., Dandapat, S.K.: Portrait: a hybrid approach to create extractive ground-truth summary for disaster event. ACM Transactions on the Web 19(1), 1–36 (2025)
- [9] Kumar, R., Sinha, R., Saha, S., Jatowt, A.: Extracting the full story: a multimodal approach and dataset to crisis summarization in tweets. IEEE Transactions on Computational Social Systems (2024)
- [10] Landauer, T.K., Dumais, S.T.: A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104(2), 211 (1997)
- [11] Mahindrakar, S.M., Mondal, T., Dhakne, A., Arosh, S., Bhattacharya, I.: Performance analysis of tweet summarization techniques considering crisis dynamics. In: Proceedings of the 25th International Conference on Distributed Computing and Networking. pp. 418–423 (2024)
- [12] Reimers, N., Gurevych, I.: Sentence-bert: Sentence embeddings using siamese bert-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). p. 3982. Association for Computational Linguistics (2019)
- [13] Zhu, M., Zeng, K., Wang, M., Xiao, K., Hou, L., Huang, H., Li, J.: Eventsum: A large-scale event-centric summarization dataset for chinese multi-news documents. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 26138–26147 (2025)
