Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

arXiv cs.CL 05/22/26, 04:00 AM Papers

corpus arabic-nlp social-media women-empowerment audience-engagement facebook computational-social-science

Summary

This paper presents the Arabic Women and Society Corpus, a ten-year collection of over 250,000 Arabic Facebook posts related to women's empowerment and social wellbeing, with engagement metrics for analyzing gender discourse and sentiment.

arXiv:2605.22204v1 (Announce Type: new)

Abstract: This paper presents the Arabic Women and Society Corpus, a ten-year collection of 252,487 public Arabic Facebook posts related to women's empowerment and social wellbeing. The corpus was collected from 51,660 pages across 77 countries between 2013 and 2024, and the posts together received more than 267 million user interactions. Each post includes engagement metrics such as shares, comments, and emotional reactions, providing a unique view of audience sentiment and social attention. The data were processed with an automated pipeline that performs language identification, normalization, and metadata cleaning to ensure reliability and reproducibility. The corpus enables large-scale analysis of gender discourse, social reform, and emotional engagement across Arabic dialects. It supports research in Arabic natural language processing, computational social science, and digital communication studies. The dataset and accompanying documentation will be released on request for research use.
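The abstract names three preprocessing steps (language identification, normalization, and metadata cleaning) plus per-post engagement metrics, but does not detail them. The following is a hypothetical Python sketch of what such a pipeline could look like, not the authors' implementation. Assumptions: the Arabic-script ratio filter is a crude stand-in for a real language-identification model; the normalization rules (stripping diacritics and tatweel, unifying alef variants, mapping alef maqsura to yaa) are common Arabic NLP choices the paper does not confirm; the record fields `post_id`, `text`, `shares`, `comments`, and `reactions`, the helper names, and the "total engagement = shares + comments + all reactions" definition are all illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical preprocessing sketch; NOT the corpus authors' pipeline.

# Arabic diacritics (tashkeel) U+064B..U+0652 plus superscript alef U+0670.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida, used to stretch words visually
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below


def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters in the basic Arabic block (U+0600..U+06FF).

    Assumed stand-in for a real language-identification model.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for c in letters if "\u0600" <= c <= "\u06FF")
    return arabic / len(letters)


def normalize_arabic(text: str) -> str:
    """Apply common (assumed) Arabic normalization rules."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> yaa
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


@dataclass
class Post:
    post_id: str
    text: str
    shares: int = 0
    comments: int = 0
    reactions: dict[str, int] = field(default_factory=dict)  # e.g. {"like": 10}

    def total_engagement(self) -> int:
        """Assumed definition: shares + comments + all reaction counts."""
        return self.shares + self.comments + sum(self.reactions.values())


def _count(value) -> int:
    """Coerce a possibly missing or negative count to a non-negative int."""
    return max(0, int(value or 0))


def clean_corpus(raw: list[dict], min_arabic_ratio: float = 0.5) -> list[Post]:
    """Drop missing/duplicate IDs and mostly non-Arabic posts; normalize the rest."""
    seen: set = set()
    posts: list[Post] = []
    for rec in raw:
        pid = rec.get("post_id")
        if not pid or pid in seen:
            continue  # metadata cleaning: first occurrence of each ID wins
        seen.add(pid)
        text = rec.get("text") or ""
        if arabic_ratio(text) < min_arabic_ratio:
            continue  # language filter
        reactions = {k: _count(v) for k, v in (rec.get("reactions") or {}).items()}
        posts.append(
            Post(
                post_id=str(pid),
                text=normalize_arabic(text),
                shares=_count(rec.get("shares")),
                comments=_count(rec.get("comments")),
                reactions=reactions,
            )
        )
    return posts


if __name__ == "__main__":
    raw = [
        {"post_id": "1", "text": "تَمْكِينُ الْمَرْأَةِ",
         "shares": 3, "comments": 2, "reactions": {"like": 10, "love": 4}},
        {"post_id": "1", "text": "duplicate ID", "shares": 1},
        {"post_id": "2", "text": "Women's empowerment", "shares": 5},
        {"post_id": None, "text": "مرحبا"},
    ]
    for post in clean_corpus(raw):
        print(post.post_id, post.text, post.total_engagement())
```

In this toy run only post `1` survives (the repeated ID, the English-only post, and the record without an ID are dropped), and its total engagement is 3 + 2 + 14 = 19. Collapsing alef variants and alef maqsura trades some orthographic detail for better matching across dialectal and informal spellings; a real pipeline would likely tune these rules and replace the script-ratio heuristic with a trained language identifier.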

Source: [https://arxiv.org/abs/2605.22204](https://arxiv.org/abs/2605.22204)
[View PDF](https://arxiv.org/pdf/2605.22204)

## Submission history

From: Wajdi Zaghouani

**[v1]** Thu, 21 May 2026 09:10:09 UTC (427 KB)

Similar Articles

ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets