ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication
Summary
A large-scale dataset of 299,329 public Facebook posts about climate change, with metadata and analysis of themes and engagement, aimed at supporting research on climate discourse.
Source: [https://arxiv.org/abs/2605.23326](https://arxiv.org/abs/2605.23326)

[View PDF](https://arxiv.org/pdf/2605.23326)

> Abstract: We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic period. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.

## Submission history

From: Wajdi Zaghouani [[view email](https://arxiv.org/show-email/fa9b3349/2605.23326)]

**[v1]** Fri, 22 May 2026 07:41:47 UTC (438 KB)
Similar Articles
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse
Introduces Cohesion-6K, a dataset of 6,000 Arabic Facebook posts about the Israeli Occupation of Palestine, annotated through manual and ChatGPT-assisted labeling into categories ranging from conflict to cohesion. Analysis shows conflict-oriented posts receive 2-4x more engagement than resolution-oriented ones.
Assessing socio-economic climate impacts from text data
This paper reviews recent advances in using natural language processing and large language models to extract socio-economic impact data from text sources for climate hazards, identifies key challenges, and provides recommendations for robust dataset construction.
SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future
This paper introduces SynopticBench, a dataset of 1.3M+ weather forecast discussions paired with meteorological images, and SPACE, a novel evaluation framework for assessing VLM-generated weather forecasts.
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination
ArabDiscrim is a decade-long lexical resource and corpus of 293K Arabic Facebook posts about racism and discrimination, with engagement signals, morphological regex families, and discrimination axes, supporting fairness-oriented Arabic NLP research.
RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting
Introduces RESCAST-100K, a large-scale benchmark dataset for cross-domain residential load and indoor temperature forecasting, featuring simulated and real data to evaluate transfer learning, domain adaptation, and zero-shot generalization.