AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

arXiv cs.CL 05/25/26, 04:00 AM Papers

arabic hope-speech social-media crisis-discourse dataset annotation nlp

Summary

This paper introduces AraHopeCorpus, presented as the first annotated dataset of Arabic hope speech, built from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. It provides a detailed three-category annotation framework (hope speech, no hope speech, neutral or unclear) and an analysis showing that hopeful language accounts for more than 64 percent of the comments.

arXiv:2605.23325v1 Announce Type: new

Abstract: Social media has become a crucial arena for shaping public narratives during armed conflicts, providing space for both harmful and constructive communication. While hate speech and misinformation have been widely studied, expressions that promote resilience, solidarity, and optimism remain underexplored, particularly in Arabic contexts. This paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech collected from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. Using a detailed annotation framework, comments were classified into three categories: hope speech, no hope speech, and neutral or unclear discourse. The dataset shows that hopeful language dominates, accounting for more than sixty-four percent of all comments. These expressions of hope appear mainly as religious encouragement, collective solidarity, and optimism for endurance and justice. No hope speech, representing about thirteen percent, reflects despair and disillusionment, while the rest of the comments contain neutral or mixed content. Inter-annotator agreement reached a substantial level (Cohen's kappa = 0.71), though dialectal variation, sarcasm, and implicit meaning posed annotation challenges. A comparative analysis between human annotators and ChatGPT revealed that large language models can support annotation but remain limited in handling dialectal and culturally embedded expressions. AraHopeCorpus will be released for research purposes under an open and non-commercial license. It provides a valuable resource for studying constructive digital discourse, enabling further research on hope speech detection, crisis communication, and resilience in Arabic social media.
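For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of the standard two-annotator Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The abstract does not describe how the authors computed their value, so this is only the textbook definition. The label names and the ten toy annotations are hypothetical and are not taken from AraHopeCorpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over all labels that either annotator used.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in set(counts_a) | set(counts_b)) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations (NOT data from the corpus).
annotator_1 = ["hope"] * 6 + ["no_hope"] * 2 + ["neutral"] * 2
annotator_2 = ["hope"] * 5 + ["neutral", "no_hope", "neutral", "neutral", "neutral"]

toy_kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {toy_kappa:.3f}")  # p_o = 0.8, p_e = 0.4, so kappa is about 0.667
```

In this toy case the annotators agree on 8 of 10 items, yet kappa is only about 0.67 because a large share of that agreement would be expected by chance when one label dominates, which is the situation the abstract describes for hope speech.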

Cached at: 05/25/26, 09:00 AM

# AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
Source: [https://arxiv.org/abs/2605.23325](https://arxiv.org/abs/2605.23325)
[View PDF](https://arxiv.org/pdf/2605.23325)

## Submission history

From: Wajdi Zaghouani ([view email](https://arxiv.org/show-email/05a31949/2605.23325))

**[v1]** Fri, 22 May 2026 07:39:21 UTC (426 KB)

Similar Articles

Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

Linear Semantic Segmentation for Low-Resource Spoken Dialects
