ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication

arXiv cs.CL 05/25/26, 04:00 AM Papers

climate-change facebook dataset social-media natural-language-processing topic-modeling sentiment-analysis

Summary

A large-scale dataset of 299,329 public Facebook posts about climate change, with metadata and analysis of themes and engagement, aimed at supporting research on climate discourse.

arXiv:2605.23326v1 Announce Type: new Abstract: We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic period. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.

Cached at: 05/25/26, 09:01 AM

# ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication
Source: [https://arxiv.org/abs/2605.23326](https://arxiv.org/abs/2605.23326)
[View PDF](https://arxiv.org/pdf/2605.23326)

> Abstract: We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic period. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.
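
The abstract's engagement finding (that post format relates to interaction levels) corresponds to a simple grouped comparison. The following is a minimal sketch in Python, not the authors' pipeline: the field names `post_type` and `interactions` and the toy records are hypothetical, since the abstract does not list the 41 metadata features or give per-format figures.

```python
from collections import defaultdict
from statistics import mean

# Toy records with hypothetical field names; the real dataset's
# column names and values may differ.
posts = [
    {"post_type": "photo", "interactions": 120},
    {"post_type": "photo", "interactions": 80},
    {"post_type": "link", "interactions": 30},
    {"post_type": "status", "interactions": 10},
    {"post_type": "link", "interactions": 50},
]


def mean_interactions_by(records, key):
    """Group records by `key` and return the mean interaction count per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["interactions"])
    return {name: mean(values) for name, values in groups.items()}


by_type = mean_interactions_by(posts, "post_type")

# Print groups from highest to lowest mean interaction count.
for name, avg in sorted(by_type.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {avg:.1f}")
```

The same helper could group by any other categorical field (for example a page category), assuming such a field exists under some name in the released data.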

## Submission history

From: Wajdi Zaghouani \[[view email](https://arxiv.org/show-email/fa9b3349/2605.23326)\] **\[v1\]** Fri, 22 May 2026 07:41:47 UTC (438 KB)

Similar Articles

Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

Team MKC at CLPsych 2026: Capturing and Characterizing Mental Health Changes through Social Media Timeline Dynamics

SocialPersona: Benchmarking Personalized Profiling and Response with Multimodal Social-Media Context

Assessing socio-economic climate impacts from text data

A Context-Aware Dataset for Stance Detection in Bioethical Controversies on Reddit
