Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus
Summary
This paper presents the Arabic Women and Society Corpus, a ten-year collection of over 250,000 Arabic Facebook posts related to women's empowerment and social wellbeing, with engagement metrics for analyzing gender discourse and sentiment.
Cached at: 05/22/26, 08:45 AM
# Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

Source: [https://arxiv.org/abs/2605.22204](https://arxiv.org/abs/2605.22204) [View PDF](https://arxiv.org/pdf/2605.22204)

> Abstract: This paper presents the Arabic Women and Society Corpus, a ten year collection of 252,487 public Arabic Facebook posts related to women's empowerment and social wellbeing. The corpus was collected from 51,660 pages across 77 countries between 2013 and 2024, resulting in more than 267 million user interactions. Each post includes engagement metrics such as shares, comments, and emotional reactions, providing a unique view of audience sentiment and social attention. The data were processed using an automated pipeline with language identification, normalization, and metadata cleaning to ensure reliability and reproducibility. The corpus enables large scale analysis of gender discourse, social reform, and emotional engagement across Arabic dialects. It supports research in Arabic natural language processing, computational social science, and digital communication studies. The dataset and accompanying documentation will be released under request for research use.

## Submission history

From: Wajdi Zaghouani [[view email](https://arxiv.org/show-email/4fdede53/2605.22204)]

**[v1]** Thu, 21 May 2026 09:10:09 UTC (427 KB)
Similar Articles
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination
ArabDiscrim is a decade-long lexical resource and corpus of 293K Arabic Facebook posts about racism and discrimination, with engagement signals, morphological regex families, and discrimination axes, supporting fairness-oriented Arabic NLP research.
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse
Introduces Cohesion-6K, a dataset of 6,000 Arabic Facebook posts about the Israeli Occupation of Palestine, annotated manually and with ChatGPT assistance, with categories spanning conflict to cohesion. Analysis shows conflict-oriented posts receive 2-4x more engagement than resolution-oriented ones.
AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
This paper introduces AraHopeCorpus, the first annotated dataset of hope speech in Arabic social media, collected from YouTube comments about the war on Gaza. It provides a detailed annotation framework and analysis, showing that hopeful language dominates crisis discourse.
BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon
This paper introduces BOUTEF, a large-scale multilingual corpus for studying fake news in Algeria and Tunisia, covering Arabic dialects, Arabizi, French, English, and code-switching. It includes empirical analysis of linguistic strategies and engagement dynamics.
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets
This paper presents a framework for Arabic financial sentiment analysis using LLMs, tailored for the Saudi market, integrating news and social media data to capture investor sentiment.