ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication

arXiv cs.CL Papers

Summary

A large-scale dataset of 299,329 public Facebook posts about climate change (May 2020 to May 2024, collected via CrowdTangle), with 41 metadata features per post and an analysis of themes and engagement, aimed at supporting research on climate discourse.

arXiv:2605.23326v1 | Announce Type: new

Abstract: We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic period. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.
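The engagement and temporal findings described in the abstract rest on aggregations over post-level metadata such as post format, interaction counts, and timestamps. The following is a minimal sketch, assuming each post is available as a Python dict with the hypothetical keys `post_type`, `total_interactions`, and `timestamp` (ISO 8601 strings); the actual ClimateChat-300K column names, file format, and the authors' own analysis pipeline are not given here and may differ. It ranks post formats by mean interactions and counts posts per calendar month, the kind of summary one might use to look for format effects and event-driven spikes.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_interactions_by_type(posts):
    """Average the interaction count for each post format.

    The keys 'post_type' and 'total_interactions' are hypothetical
    stand-ins for whatever the released dataset actually calls these fields.
    """
    buckets = defaultdict(list)
    for post in posts:
        buckets[post["post_type"]].append(post["total_interactions"])
    # Rank formats from most to least engaging on average.
    return sorted(
        ((ptype, mean(counts)) for ptype, counts in buckets.items()),
        key=lambda item: item[1],
        reverse=True,
    )


def posts_per_month(posts):
    """Count posts per calendar month (YYYY-MM), using the hypothetical 'timestamp' key."""
    counts = defaultdict(int)
    for post in posts:
        month = datetime.fromisoformat(post["timestamp"]).strftime("%Y-%m")
        counts[month] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy records invented purely for illustration; not taken from the dataset.
    sample = [
        {"post_type": "Photo", "total_interactions": 120, "timestamp": "2021-11-01T10:00:00"},
        {"post_type": "Link", "total_interactions": 15, "timestamp": "2021-11-02T12:30:00"},
        {"post_type": "Photo", "total_interactions": 80, "timestamp": "2021-12-05T08:15:00"},
        {"post_type": "Status", "total_interactions": 5, "timestamp": "2021-12-06T18:45:00"},
    ]
    print(mean_interactions_by_type(sample))
    print(posts_per_month(sample))
```

A plain mean is used for simplicity; because interaction counts on social media are typically heavy-tailed, a median or a log transform may be a more robust choice for real analyses.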

Cached at: 05/25/26, 09:01 AM

Source: [https://arxiv.org/abs/2605.23326](https://arxiv.org/abs/2605.23326)
[View PDF](https://arxiv.org/pdf/2605.23326)


## Submission history

From: Wajdi Zaghouani **[v1]** Fri, 22 May 2026 07:41:47 UTC (438 KB)

Similar Articles

Assessing socio-economic climate impacts from text data

arXiv cs.CL

This paper reviews recent advances in using natural language processing and large language models to extract socio-economic impact data from text sources for climate hazards, identifies key challenges, and provides recommendations for robust dataset construction.