The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters
Summary
This paper compares top-down and bottom-up approaches for collecting text-based data about disasters from news articles, using German news about landslides as a case study.
# The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters

Source: [https://arxiv.org/html/2607.00849](https://arxiv.org/html/2607.00849)

Brielen Madureira¹,², Andreas Niekler¹,³, Mariana Madruga de Brito²,¹

¹ LeipzigLab – Climate Discourse, Leipzig University, Germany
² Helmholtz Centre for Environmental Research, Germany
³ Computational Humanities, Leipzig University, Germany

Correspondence: brielen.madureira@uni-leizig.de

###### Abstract

News articles are an important source of information on disaster impacts and adaptation. A key methodological challenge in socio-environmental studies is how to select a representative data sample. Two approaches are common: querying news databases top-down with the aid of an existing disaster inventory, or using NLP methods to cluster news texts bottom-up based on temporal and spatial features. Using a dataset of German news about landslides worldwide, we compare these approaches and discuss variations in event coverage. Such research design decisions can influence the resulting news sample, affecting its use in studies of inequality in media coverage, disaster monitoring and inventory enrichment.
## 1 Introduction

Understanding how humans experience and respond to disasters calls for interdisciplinary collaboration between environmental and (computational) social science (Meehl et al., [2000](https://arxiv.org/html/2607.00849#bib.bib18); Albeverio et al., [2006](https://arxiv.org/html/2607.00849#bib.bib20); McPhillips et al., [2018](https://arxiv.org/html/2607.00849#bib.bib19); de Brito and Sodoge, [2023](https://arxiv.org/html/2607.00849#bib.bib23)). Fundamental questions arise at this intersection: the consequences of climate hazards for society, the relation between adaptation measures and risk reduction, and inequalities in exposure and impact across societal groups. The resulting findings can ultimately inform the allocation of disaster-relief funds (Chapman et al., [2022](https://arxiv.org/html/2607.00849#bib.bib21)) and democratic decision making (Soroka et al., [2012](https://arxiv.org/html/2607.00849#bib.bib22)).

Synthesizing such knowledge requires gathering information that originates from various sources, from sensor-based measurements to digital documents. When text data is involved, the Natural Language Processing (NLP) field joins that interdisciplinary table with methods for, e.g., classification, information extraction, geoparsing and modelling complex social constructs in language use (de Brito et al., [2026](https://arxiv.org/html/2607.00849#bib.bib15)).
Figure 1: Illustrative comparison of two approaches to detect references to disasters in the news.

A fundamental challenge is identifying the time, location and impact of disasters (e.g. landslides, wildfires and floods). Numerous studies (see Section [2](https://arxiv.org/html/2607.00849#S2)) use global disaster inventories like EM-DAT (Delforge et al., [2025](https://arxiv.org/html/2607.00849#bib.bib1)), which is based on high-quality, manually curated data but is also incomplete and unavoidably biased. As a consequence, when the scientific community relies too heavily on a single source as ground truth, the derived collective knowledge may overfit traits of the database rather than the actual phenomena.

To counteract this problem, news databases can provide additional event information via two approaches (Figure [1](https://arxiv.org/html/2607.00849#S1.F1)): *top-down* procedures use known events in external disaster inventories to *query* news databases for targeted content (e.g. Cai et al., [2025](https://arxiv.org/html/2607.00849#bib.bib6)), whereas *bottom-up* procedures identify, geolocate and cluster news into segmented events, which can then be *aligned* to or validated against external inventories (e.g. Valkenborg et al., [2026](https://arxiv.org/html/2607.00849#bib.bib2)). But neither is infallible: while the former overlooks events not recorded in inventories, the latter ignores events that were not deemed newsworthy by the media represented in the news database.

This paper looks more closely into this matter. We compare news events identified via top-down and bottom-up approaches in a dataset of German news about landslides and discuss their advantages and shortcomings. Such methodological insights, grounded in empirical observations, can strengthen NLP-supported socio-environmental studies.
## 2 Related Work

The International Disaster Database (EM-DAT, Delforge et al., [2025](https://arxiv.org/html/2607.00849#bib.bib1)) is a widely used global inventory of 27k+ events, often regarded as the ground truth. However, its coverage is constrained by its inclusion criteria and the difficulty of gathering detailed information for *all* countries. Even the recorded events have missing data (Jones et al., [2022](https://arxiv.org/html/2607.00849#bib.bib8)). Despite such limitations, hundreds of empirical studies rely on it (Jones et al., [2023](https://arxiv.org/html/2607.00849#bib.bib7)).

Concomitantly, news articles have long been seen as a source of information about disasters that serves to create or enrich inventories (Guzzetti et al., [1994](https://arxiv.org/html/2607.00849#bib.bib9); Llasat et al., [2009](https://arxiv.org/html/2607.00849#bib.bib10); Taylor et al., [2015](https://arxiv.org/html/2607.00849#bib.bib3); Alencar et al., [2024](https://arxiv.org/html/2607.00849#bib.bib5); Sodoge et al., [2024](https://arxiv.org/html/2607.00849#bib.bib11); Avcıoğlu et al., [2025](https://arxiv.org/html/2607.00849#bib.bib4), *inter alia*), monitor hazard events in near real-time (Tanev et al., [2008](https://arxiv.org/html/2607.00849#bib.bib24)) and study media attention to them (Yan and Bissell, [2015](https://arxiv.org/html/2607.00849#bib.bib17); Kong and Purves, [2026](https://arxiv.org/html/2607.00849#bib.bib16)). Disaster inventories and impact information derived from large-scale textual data require careful validation (de Brito et al., [2026](https://arxiv.org/html/2607.00849#bib.bib15)). For example, observations can be aligned with EM-DAT entries for calibration (Li et al., [2025](https://arxiv.org/html/2607.00849#bib.bib13); Dahr et al., [2026](https://arxiv.org/html/2607.00849#bib.bib12); Valkenborg et al., [2026](https://arxiv.org/html/2607.00849#bib.bib2)).
## 3 Data and Event Matching

We used a dataset comprising almost 55k news articles in German about landslides worldwide, constructed by Madureira et al. ([2026](https://arxiv.org/html/2607.00849#bib.bib14)). That study queried the *wiso-net* news database (in the period from 2000 to 2024) using landslide-related keywords, identified relevant articles and geoparsed them at the country level with the aid of Large Language Models (LLMs). Then, *news events* were identified in a bottom-up approach. A news event was defined as a sequence of news articles referring to landslides in the same country, starting on the first day with at least one observation and ending right before at least $\theta = 5$ consecutive days¹ without any coverage occurred.

¹ This parameter can vary, but we kept it for consistency.

Our analysis was based on matching events between the bottom-up news events from that study and the top-down events in EM-DAT. For that, we extracted a list of 2,014 landslide events (of type main or associated) for 138 countries from EM-DAT, whose entries contain the event's onset date, country and location, among other information. A visualisation of how bottom-up events and top-down EM-DAT entries are temporally dispersed is presented in Figures [5](https://arxiv.org/html/2607.00849#A1.F5) and [6](https://arxiv.org/html/2607.00849#A1.F6) (Appendix).

#### Bottom-up approach

Using time series segmentation upon the geolocated news articles, Madureira et al. ([2026](https://arxiv.org/html/2607.00849#bib.bib14)) identified 4,567 news events for 152 countries (see that publication for methodological details). For each news event, the provided data file contained metadata such as its initial date, duration in days and the associated news articles.
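The news-event definition above (an event starts on the first day with coverage and ends right before a run of at least $\theta$ days without coverage) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the segmentation code of Madureira et al. (2026): the function name `segment_news_events` and the input format (a list of publication dates for a single country) are hypothetical.

```python
from datetime import date

def segment_news_events(article_dates, theta=5):
    """Group publication dates of one country into news events.

    Hypothetical helper: an event starts on the first day with coverage and
    ends right before at least `theta` consecutive days without coverage.
    Returns a list of (first_day, last_day) tuples.
    """
    days = sorted(set(article_dates))
    events = []
    if not days:
        return events
    start = prev = days[0]
    for day in days[1:]:
        silent_days = (day - prev).days - 1  # days without any article in between
        if silent_days >= theta:
            events.append((start, prev))  # close the current event
            start = day                   # and open a new one
        prev = day
    events.append((start, prev))
    return events

# Toy dates, invented for illustration only.
toy_dates = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 10)]
print(segment_news_events(toy_dates))
```

With the toy dates, the six silent days between 3 and 10 January exceed $\theta = 5$, so the sketch returns two events. Articles would first have to be grouped by country, since the definition applies per country.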
To perform event matching, we considered a news event to be *temporally aligned* with an EM-DAT entry if the entry's onset date is near the beginning of the news event (from $\Delta_b$ days before to $\Delta_a$ days after its initial date), as in Figure [2](https://arxiv.org/html/2607.00849#S3.F2) (top).

Figure 2: Illustration of the event matching procedures.

#### Top-down approach

For each EM-DAT entry, event matching was performed by *querying* the news database around the entry's onset date (i.e. from $\Delta_b$ days before to $\Delta_a$ days after it) for news about the country. If at least one news article was retrieved, we considered that the EM-DAT event *temporally coincided* with a news event, as in Figure [2](https://arxiv.org/html/2607.00849#S3.F2). Instead of setting a fixed retrieval period, we allowed for news events with varying durations following the same rationale used in the bottom-up approach: the news event started on the first day with a news article in $[\Delta_b, \Delta_a]$ and ended before $\theta$ days without any observed news for the country.

To select the $\Delta$ values for this study, we regarded the onset date in EM-DAT as correct and acknowledged that some events may take a few days to appear in international news. Therefore, for the top-down approach, we set $\Delta_b = 1$ and $\Delta_a = 5$, meaning that the queried news must start within 5 days after the known onset date. The 1 day before should account for potential errors and early warnings. Conversely, for the bottom-up approach, we set $\Delta_b = 5$ and $\Delta_a = 1$, meaning that, if an event is already in the news, its onset date cannot be later than the first news (1-day tolerance), but could have been some days earlier. Future studies can investigate the impact of using other parameters.
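The two matching rules can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names `is_aligned` and `query_news_event` are hypothetical, and both assume the inputs were already filtered to a single country.

```python
from datetime import date, timedelta

def is_aligned(event_start, onset, delta_b=5, delta_a=1):
    """Bottom-up alignment (hypothetical helper): does the EM-DAT onset fall
    within [event_start - delta_b, event_start + delta_a]?"""
    lo = event_start - timedelta(days=delta_b)
    hi = event_start + timedelta(days=delta_a)
    return lo <= onset <= hi

def query_news_event(onset, article_dates, delta_b=1, delta_a=5, theta=5):
    """Top-down querying (hypothetical helper): return (first_day, last_day)
    of the news event found around `onset`, or None for an empty query."""
    lo = onset - timedelta(days=delta_b)
    hi = onset + timedelta(days=delta_a)
    days = sorted(set(article_dates))
    in_window = [d for d in days if lo <= d <= hi]
    if not in_window:
        return None
    start = end = in_window[0]  # first day with an article inside the window
    for d in days:
        if d <= end:
            continue
        if (d - end).days - 1 >= theta:  # theta or more silent days: event is over
            break
        end = d
    return (start, end)

# Toy data, invented for illustration only.
articles = [date(2024, 1, 8), date(2024, 1, 12), date(2024, 1, 14), date(2024, 1, 25)]
print(is_aligned(date(2024, 1, 12), onset=date(2024, 1, 10)))
print(query_news_event(date(2024, 1, 10), articles))
```

In the toy data, the article from 8 January lies outside the query window, so the retrieved event starts on 12 January even though coverage began earlier. This is the kind of case in which a query captures a news event midway through.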
## 4 Analysis

We first compared the outcomes of the two event matching strategies by quantifying (i) how many bottom-up news events temporally *aligned* with an EM-DAT entry and (ii) how many EM-DAT entries temporally matched news events through *querying*. As shown in Table [1](https://arxiv.org/html/2607.00849#S4.T1), the bottom-up approach identified more than twice as many news events as the number of entries in EM-DAT. However, less than 17% of the news events temporally aligned with an EM-DAT entry. In contrast, the top-down queries retrieved temporally coinciding news events for about 42% of the EM-DAT entries.

The alignment and querying procedures are not bijective. On the one hand, EM-DAT entries with distinct onset dates can end up querying the same news event on different days throughout it (especially news events of long duration). On the other hand, bottom-up news events can be aligned with multiple EM-DAT entries whose onset dates are close in time. Such partial or multiple matches between EM-DAT entries and news events would require further disambiguation steps.

Figure [3](https://arxiv.org/html/2607.00849#S4.F3) depicts the confusion matrices with the overlap between aligned and queried events in each type of event source. The 851 successful queries covered 779 unique news events. A total of 89 queries matched news events midway through, and 60 news events were queried by more than one EM-DAT entry. Such cases require post-processing decisions on whether two distinct news topics were inappropriately merged in the bottom-up approach (e.g. because they overlapped in time), or whether the event matching was simply spurious in terms of content. Even in the top-down approach, such reasoning would require bottom-up information about when underlying news events begin and end. Figure [3](https://arxiv.org/html/2607.00849#S4.F3) also shows that 737 aligned bottom-up events covered 762 EM-DAT entries (one of them twice).
A total of 26 news events had multiple alignments with EM-DAT entries, and only one news event was aligned but not detected via querying (because the query captured an immediately preceding news event). Again, post-processing decisions would be needed to determine which EM-DAT entry properly reflects the content of the news event.

| Approach | Events | Total | Temporally matched |
| --- | --- | --- | --- |
| Bottom-up | news events | 4,567 | aligned to EM-DAT: 737 (16.1%) |
| Top-down | EM-DAT events | 2,014 | queried in the news: 851 (42.2%) |

Table 1: Number of bottom-up news events and top-down EM-DAT entries with the portion of temporally matched events.

Figure 3: Confusion matrices with the number of aligned and/or queried events in both sources.

The core finding is that the two approaches coincided in temporally matching 762 EM-DAT entries to 736 news events. For 57.7% of the EM-DAT events the queries were empty, meaning that there was no relevant news near their onset dates. Similarly, 83.9% of the bottom-up news events did not align with any EM-DAT entry. These results indicate that, on the one hand, EM-DAT contains entries that could not be detected in the German news, and, on the other hand, that the media may form news events that are not recorded in EM-DAT. Had only top-down queries been used, many news events would have been missed, while a bottom-up approach alone would have ignored the more than half of the known landslides worldwide that did not appear in the German media (as represented in this data sample).

Although alignment can be used to calibrate bottom-up news events, in practice the decision about which news sample to use lies between the *queried* EM-DAT entries and *all* bottom-up news events. Country-level analyses would be impacted by this research design choice. Figure [4](https://arxiv.org/html/2607.00849#S4.F4) shows how many events were detected for observed countries in each approach.
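Because the matching in this section is many-to-many, counts such as "unique news events covered" and "news events matched by more than one EM-DAT entry" are derived from the list of matched pairs. A minimal sketch of that bookkeeping follows; the function name `matching_diagnostics`, the pair format and the toy identifiers are assumptions made for illustration and do not come from the study's data.

```python
from collections import Counter

def matching_diagnostics(pairs):
    """Summarise a many-to-many matching.

    `pairs` is an iterable of (emdat_entry_id, news_event_id) tuples
    (hypothetical format). Duplicated pairs are counted once.
    """
    pairs = set(pairs)
    entries_per_event = Counter(event for _, event in pairs)
    events_per_entry = Counter(entry for entry, _ in pairs)
    return {
        "matched_entries": len(events_per_entry),
        "unique_events": len(entries_per_event),
        "events_with_multiple_entries": sum(n > 1 for n in entries_per_event.values()),
        "entries_with_multiple_events": sum(n > 1 for n in events_per_entry.values()),
    }

# Toy identifiers, invented for illustration only.
toy_pairs = [("E1", "N1"), ("E2", "N1"), ("E3", "N2")]
print(matching_diagnostics(toy_pairs))
```

In the toy example, three EM-DAT entries map onto two news events, one of which is shared by two entries. Such shared events are the cases that would call for the disambiguation steps discussed above.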
The maps in Figures [7](https://arxiv.org/html/2607.00849#A1.F7), [8](https://arxiv.org/html/2607.00849#A1.F8) and [9](https://arxiv.org/html/2607.00849#A1.F9) (Appendix) illustrate the spatial coverage and gaps of each approach. The bottom-up approach generally detected more news events per country than the successful queries, with noticeable variations in the resulting sample distribution. For instance, 54.3% of the bottom-up news events are assigned to the Global South, compared to 81.4% of the queried news events; 47.7% of the bottom-up news events refer to high income countries *versus* 20.6% of the queried news events; and 35.8% of the former are in Europe in contrast to 9.7% of the latter (see detailed Tables in the Appendix).

Figure 4: Number of detected news events by country.

## 5 Discussion

Numerical results alone may give the impression that bottom-up news events are more advantageous, as they exceed the number of queried news events and can also aid near real-time monitoring. But caution is warranted: news events do not always reflect new real-world events relevant to disaster inventories. False positives stemming from errors in the NLP pipeline add noise to the sample and can form spurious news events. Moreover, many news events bring the *topic* of landslides in a country to the public's attention without reporting about recent or ongoing events. Based on an initial manual verification of 150+ news events, we identified a few common types of bottom-up news events:

- *concrete in progress*: texts that refer to ongoing or recent specific landslides in a country;
- *concrete but past*: texts that refer to landslides that happened in the near or distant past in a country (e.g. aftermath of an event after some time, memories of previous disasters, judicial decisions about past events, pre-historic events that shaped landscapes etc.);
- *indefinite*: texts about landslides in a country with no reference to specific events having taken place (e.g. studies, aggregated outcome during a period, warnings, references to risk, preventive measures, hypothetical events, the phenomenon as a whole, underspecified or implicit sentences etc.);
- *vague*: texts that refer to landslides in fiction works or just as rhetorical examples;
- *fully false positives*: misclassified texts (e.g. figurative uses of landslide terms, wrong or unclear geolocation).

News events detected bottom-up can also mix texts of different types that coincide in time or repeat the same text in distinct events when the news article is republished after many days.

Both approaches have advantages and disadvantages and should be regarded as complementary. The bottom-up approach may detect events not present in EM-DAT, but it also captures news on the general *topic* of landslides in a country and is prone to noise due to the impossibility of perfect classification and geolocation. It is suitable for studies of media attention but requires refinement to serve as a source of information to create inventories. The top-down approach is more controllable but requires disambiguation of news events captured midway through and remains constrained to a subsample of known events. It may be more appropriate for enriching existing disaster databases with impact data, but it introduces the inventory's biases into subsequent conclusions about media coverage. Both approaches share the limitation that events not covered by the media represented in the sample are left out, making a top-down reference important for quantifying the lack of media coverage.

The biases in media coverage and EM-DAT inclusion are arguably not random.
EM-DAT focuses on severe events, whereas the media may devote more attention to events in more populous or economically relevant regions, or to events that attract greater public interest. Further studies can unravel the differences in which happenings get to be considered "events" in each source.

## 6 Conclusion

Despite their potential for broad temporal and geographic coverage of disaster events, news-based datasets are shaped by how the media selects, frames and reports disasters. Moreover, articles often refer to past or hypothetical events, and unstructured information is sometimes left underspecified. Large-scale datasets can conceal shortcomings that affect derived conclusions. Automated information extraction from texts does not eliminate all false positives, thereby hindering temporal and spatial precision. Regardless of whether a sample of disaster news is constructed via a bottom-up or top-down approach, measures to ensure data quality and to understand which events were (or were not) captured are imperative. Diagnostic analyses can expose problems that otherwise go unnoticed in the big data paradigm. By discussing observed differences between the two approaches, this work contributes to best practices in text-based climate impact research, supporting authors in making informed methodological choices.

## References

- Albeverio et al. (2006) Extreme events in nature and society. Springer Science & Business Media. [Link](https://link.springer.com/book/10.1007/3-540-28611-X)
- P. H. L. Alencar, J. Sodoge, E. Nora Paton, and M. Madruga De Brito (2024) Flash droughts and their impacts – using newspaper articles to assess the perceived consequences of rapidly emerging droughts. Environmental Research Letters 19(7), pp. 074048. ISSN 1748-9326, [Link](https://iopscience.iop.org/article/10.1088/1748-9326/ad58fa), [Document](https://dx.doi.org/10.1088/1748-9326/ad58fa)
- A. Avcıoğlu, O. Demir, and T. Görüm (2025) An automated approach for developing geohazard inventories using news: integrating natural language processing (NLP), machine learning, and mapping. Natural Hazards and Earth System Sciences 25(7), pp. 2421–2435. ISSN 1684-9981, [Link](http://dx.doi.org/10.5194/nhess-25-2421-2025), [Document](https://dx.doi.org/10.5194/nhess-25-2421-2025)
- E. Cai, X. Chen, R. G. Keeney, E. Zuckerman, B. O'Connor, and P. A. Grabowicz (2025) Identifying and investigating global news coverage of critical events such as disasters and terrorist attacks. Proceedings of the International AAAI Conference on Web and Social Media 19(1), pp. 307–323. [Link](https://ojs.aaai.org/index.php/ICWSM/article/view/35818), [Document](https://dx.doi.org/10.1609/icwsm.v19i1.35818)
- C. M. Chapman, M. J. Hornsey, K. S. Fielding, and R. Gulliver (2022) International media coverage promotes donations to a climate disaster. Disasters 47(3), pp. 725–744. ISSN 1467-7717, [Link](http://dx.doi.org/10.1111/disa.12557), [Document](https://dx.doi.org/10.1111/disa.12557)
- I. Dahr, A. Cabre, I. Marinov, and E. Wibbels (2026) Climate change and migration in Central America: evidence from new environmental event data. Note: Kleinman Center for Energy Policy at the University of Pennsylvania. [Link](https://kleinmanenergy.upenn.edu/wp-content/uploads/2026/04/KCEP-Digest-92-Climate-Change-and-Migration.pdf)
- M. M. de Brito, B. Madureira, T. M. N. Carvalho, D. Delforge, A. Jézéquel, M. Kurfalı, N. Li, G. Messori, J. Nivre, B. Pernici, N. Speybroeck, S. Terzi, W. Thiery, B. Valkenborg, J. Wang, S. Zahra, J. Zscheischler, and J. Sodoge (2026) Assessing socio-economic climate impacts from text data. arXiv 2605.20793, [Link](https://arxiv.org/abs/2605.20793)
- M. M. de Brito and J. Sodoge (2023) Computational Social Sciences in der Umweltsoziologie. In Handbuch Umweltsoziologie, pp. 1–15.
- D. Delforge, V. Wathelet, R. Below, C. L. Sofia, M. Tonnelier, J. A. F. van Loenhout, and N. Speybroeck (2025) EM-DAT: the emergency events database. International Journal of Disaster Risk Reduction 124, pp. 105509. ISSN 2212-4209, [Link](http://dx.doi.org/10.1016/j.ijdrr.2025.105509), [Document](https://dx.doi.org/10.1016/j.ijdrr.2025.105509)
- F. Guzzetti, M. Cardinali, and P. Reichenbach (1994) The AVI project: a bibliographical and archive inventory of landslides and floods in Italy. Environmental Management 18(4), pp. 623–633. ISSN 1432-1009, [Link](http://dx.doi.org/10.1007/BF02400865), [Document](https://dx.doi.org/10.1007/bf02400865)
- R. L. Jones, D. Guha-Sapir, and S. Tubeuf (2022) Human and economic impacts of natural disasters: can we trust the global data? Scientific Data 9(1). ISSN 2052-4463, [Link](http://dx.doi.org/10.1038/s41597-022-01667-x), [Document](https://dx.doi.org/10.1038/s41597-022-01667-x)
- R. L. Jones, A. Kharb, and S. Tubeuf (2023) The untold story of missing data in disaster research: a systematic review of the empirical literature utilising the emergency events database (EM-DAT). Environmental Research Letters 18(10), pp. 103006. ISSN 1748-9326, [Link](http://dx.doi.org/10.1088/1748-9326/acfd42), [Document](https://dx.doi.org/10.1088/1748-9326/acfd42)
- I. Kong and R. S. Purves (2026) Analyzing geographic bias of newspaper articles reporting global climate disasters. Annals of the American Association of Geographers 116(2), pp. 270–288. [Document](https://dx.doi.org/10.1080/24694452.2025.2564220), [Link](https://doi.org/10.1080/24694452.2025.2564220)
- N. Li, W. Thiery, S. Zahra, M. Madruga de Brito, K. Worou, M. Kurfalı, S. Lampe, P. Muñoz, C. Flynn, C. Trigoso, J. Nivre, J. Zscheischler, and G. Messori (2025) Wikimpacts 1.0: a new global climate impact database based on automated information extraction from Wikipedia. [Link](http://dx.doi.org/10.5194/egusphere-2025-4891), [Document](https://dx.doi.org/10.5194/egusphere-2025-4891)
- M. C. Llasat, M. Llasat-Botija, and L. López (2009) A press database on natural risks and its application in the study of floods in northeastern Spain. Natural Hazards and Earth System Sciences 9(6), pp. 2049–2061. ISSN 1684-9981, [Link](http://dx.doi.org/10.5194/nhess-9-2049-2009), [Document](https://dx.doi.org/10.5194/nhess-9-2049-2009)
- B. Madureira, A. Niekler, M. Keuschnigg, and M. M. de Brito (2026) How loud rumbles hit newsstands: a data analysis of coverage and spatial bias in German news about landslides around the world. arXiv 2605.18105, [Link](https://arxiv.org/abs/2605.18105)
- L. E. McPhillips, H. Chang, M. V. Chester, Y. Depietri, E. Friedman, N. B. Grimm, J. S. Kominoski, T. McPhearson, P. Méndez-Lázaro, E. J. Rosi, and J. Shafiei Shiva (2018) Defining extreme events: a cross-disciplinary review. Earth's Future 6(3), pp. 441–455. [Document](https://dx.doi.org/https%3A//doi.org/10.1002/2017EF000686), [Link](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1002/2017EF000686)
- G. A. Meehl, T. Karl, D. R. Easterling, S. Changnon, R. Pielke, D. Changnon, J. Evans, P. Ya. Groisman, T. R. Knutson, K. E. Kunkel, L. O. Mearns, C. Parmesan, R. Pulwarty, T. Root, R. T. Sylves, P. Whetton, and F. Zwiers (2000) An introduction to trends in extreme weather and climate events: observations, socioeconomic impacts, terrestrial ecological impacts, and model projections. Bulletin of the American Meteorological Society 81(3), pp. 413–416. [Document](https://dx.doi.org/10.1175/1520-0477%282000%29081%3C0413%3AAITTIE%3E2.3.CO%3B2), [Link](https://journals.ametsoc.org/view/journals/bams/81/3/1520-0477_2000_081_0413_aittie_2_3_co_2.xml)
- J. Sodoge, C. Kuhlicke, M. D. Mahecha, and M. M. de Brito (2024) Text mining uncovers the unique dynamics of socio-economic impacts of the 2018–2022 multi-year drought in Germany. Natural Hazards and Earth System Sciences 24(5), pp. 1757–1777. ISSN 1684-9981, [Link](http://dx.doi.org/10.5194/nhess-24-1757-2024), [Document](https://dx.doi.org/10.5194/nhess-24-1757-2024)
- S. Soroka, S. Farnsworth, A. Lawlor, and L. Young (2012) Mass media and policy-making. In Routledge handbook of public policy, pp. 204–214. [Link](https://www.taylorfrancis.com/chapters/edit/10.4324/9780203097571-20/mass-media-policy-making-stuart-soroka-stephen-farnsworth-andrea-lawlor-lori-young)
- H. Tanev, J. Piskorski, and M. Atkinson (2008) Real-time news event extraction for global crisis monitoring. In Natural Language and Information Systems, pp. 207–218. ISBN 9783540698586, ISSN 1611-3349, [Link](http://dx.doi.org/10.1007/978-3-540-69858-6_21), [Document](https://dx.doi.org/10.1007/978-3-540-69858-6%5F21)
- F. E. Taylor, B. D. Malamud, K. Freeborough, and D. Demeritt (2015) Enriching Great Britain's national landslide database by searching newspaper archives. Geomorphology 249, pp. 52–68. ISSN 0169-555X, [Link](http://dx.doi.org/10.1016/j.geomorph.2015.05.019), [Document](https://dx.doi.org/10.1016/j.geomorph.2015.05.019)
- B. Valkenborg, O. Dewitte, and B. Smets (2026) Unravelling information on impactful geo-hydrological hazard events with HazMiner, a multilingual text mining method developed through a global scale coverage application. EGUsphere 2026, pp. 1–45. [Link](https://egusphere.copernicus.org/preprints/2026/egusphere-2026-722/), [Document](https://dx.doi.org/10.5194/egusphere-2026-722)
- Y. Yan and K. Bissell (2015) The sky is falling: predictors of news coverage of natural disasters worldwide. Communication Research 45(6), pp. 862–886. ISSN 1552-3810, [Link](http://dx.doi.org/10.1177/0093650215573861), [Document](https://dx.doi.org/10.1177/0093650215573861)

## Appendix A Appendix

### Limitations

This analysis faces some limitations. First, it relies on country-level event matching, which may merge distinct landslides occurring close in time within the same country. Important parameters ($\theta$, $\Delta_a$ and $\Delta_b$) were set to fixed values. Further sensitivity analyses should examine how other values would affect the results and whether different countries or types of events require different values (e.g. events in a neighbouring country may hit the headlines faster than events in geographically and culturally distant countries). Depending on the use case, further investigation is also needed to refine the bottom-up approach so as to capture only concrete and current real-world events.

Finally, both the alignment and the querying event matching strategies were limited to the *temporal* aspect and the country. Detailed error analyses regarding *content* matching should still be conducted to ensure that the content of the news event actually refers to the EM-DAT entry. While the specific location toponyms in EM-DAT could help further refine the queries, news articles do not always mention them, typos and name variations are frequent, and specific places may remain unreported when hazards affect a broad region.

### Details and Additional Results

Table [2](https://arxiv.org/html/2607.00849#A1.T2) shows the distribution of identified and queried news events by country category, mentioned in Section [4](https://arxiv.org/html/2607.00849#S4). Table 3 lists the number of (aligned) news events and (queried) EM-DAT entries by country.
Table 2: Distribution of bottom-up news events and top-down queried news events by country category.

Figures [5](https://arxiv.org/html/2607.00849#A1.F5) and [6](https://arxiv.org/html/2607.00849#A1.F6) plot the onset date of EM-DAT events and the first active day of news events along the time period (x axis) for the 20 countries with the most observed news events. Figure [5](https://arxiv.org/html/2607.00849#A1.F5) is a broad overview of the whole period, whereas Figure [6](https://arxiv.org/html/2607.00849#A1.F6) zooms in on the year 2024 for a more detailed inspection. These illustrations show the three typical (mis)matching cases: isolated EM-DAT events that (most likely) had no media coverage in German newspapers; isolated news events that do not seem to have been recorded in EM-DAT; and the matches, i.e. EM-DAT and news events that occur on neighbouring days, subject to alignment and querying.

Figures [7](https://arxiv.org/html/2607.00849#A1.F7), [8](https://arxiv.org/html/2607.00849#A1.F8) and [9](https://arxiv.org/html/2607.00849#A1.F9) are meant for comparing the differences in spatial distribution of each source: EM-DAT events, EM-DAT events for which news articles could be queried, and news events. This evidence indicates that the chosen method can yield samples with considerable differences in coverage that would influence subsequent analyses.

Figure 5: Broad overview of the temporal dispersion of the onset days of EM-DAT entries (green circles) and initial days of news events (purple diamonds) for the 20 countries with most news events. The x-axis lists all days in the period from Jan 1, 2000 to Dec 31, 2024.

Figure 6: Broad overview of the temporal dispersion of the onset days of EM-DAT entries (green circles) and initial days of news events (purple diamonds) for the 20 countries with most news events. The x-axis lists all days in the period from Jan 1, 2024 to Dec 31, 2024.
Figure 7: Spatial distribution of EM-DAT entries referring to landslides. Germany (in black) was not analysed.

Figure 8: Spatial distribution of EM-DAT entries referring to landslides that could be queried (top-down) in the news database. Germany (in black) was not analysed.

Figure 9: Spatial distribution of (bottom-up) news events referring to landslides. Germany (in black) was not analysed.

Bottom Up: news events, aligned | Top Down: EM-DAT, queried

ABW1000 AFG388539 AGO1020 ALB2040 ARG10151 ARM0010 ASM0010 AUS37141 AUT237475 AZE0020 BDI7282 BEL6020 BFA2000 BGD428238 BGR5030 BHS1000 BIH175105 BOL223213 BRA138407743 BRB0010 BTN5121 CAF1000 CAN27111 CHE306686 CHL456146 CHN21780170114 CIV2191 CMR18484 COD368268 COG1040 COL100236525 COM1020 CPV0010 CRI163183 CUB6272 CYM2000 CZE74030 DJI0020 DMA7222 DNK11000 DOM176166 DZA4141 ECU445215 EGY9111 ERI1000 ESP116343 ETH13393 FJI4181 FRA120111611 FSM1000 GBR67111 GEO102122 GHA4010 GIN1010 GLP0010 GMB1000 GRC617117 GRD0020 GRL8000 GTM55153815 GUF0010 GUM0010 HKG3000 HND266146 HRV9010 HTI486248 HUN9000 IDN1365416071 IND173398743 IRL2000 IRN10292 IRQ2010 ISL4000 ISR6000 ITA314274030 JAM10464 JOR1000 JPN114356241 KAZ3111 KEN183173 KGZ83144 KHM4010 KIR1000 KOR166167 LAO4040 LBN1000 LBR1000 LBY1111 LCA0020 LIE2000 LKA33103311 LSO2000 LUX15010 LVA1000 MAC0010 MAR9242 MDA2000 MDG161111 MEX61185328 MKD1151 MLT1000 MMR287227 MNE4000 MNG1020 MOZ6121 MRT1000 MTQ1020 MUS1000 MWI5131 MYS173133 NER0010 NGA2030 NIC38292 NLD6000 NOR80222 NPL142184525 NZL468108 OMN0020 PAK89194621 PAN162172 PER103134213 PHL1536213875 PNG205225 POL23000 PRI2050 PRK13070 PRT30575 PRY3020 REU0020 ROU20292 RUS532112 RWA113183 SAU2010 SCG0010 SLB3252 SLE18262 SLV3281710 SOM3010 SPI0010 SRB11161 STP0010 SUR1000 SVK4010 SVN13242 SWE8000 SYC2010 SYR2111 TCD0010 THA182172 TJK62302 TLS1121 TON1000 TTO1141 TUN4000 TUR436146 TWN47121813 TZA15373 UGA369229 UKR4000 URY3010 USA191163817 UZB0020 VAT1000 VCT2030 VEN334105 VIR0010 VNM54205621 VUT5161 WSM3000 YEM8363 ZAF28242 ZMB2010 ZWE4010

Table 3: Number of (aligned and total) news events and (queried and total) EM-DAT entries by country.
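As a rough illustration of the three (mis)matching cases discussed for Figures 5 and 6, the following Python sketch pairs EM-DAT onset dates with the first active day of news events at country level. It is not the implementation used in this study: the function name `match_events`, the window sizes `days_before` and `days_after` (assumed here to play a role similar to $\Delta_b$ and $\Delta_a$), and the toy records are all hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def match_events(emdat, news, days_before=2, days_after=7):
    """Pair EM-DAT onsets with news-event start days, per country.

    emdat, news: lists of (country_code, date) tuples.
    days_before, days_after: hypothetical window around each onset date;
    the values are illustrative, not those used in the study.
    Returns (matches, isolated_emdat, isolated_news).
    """
    matches = []
    isolated_emdat = []
    matched_news_idx = set()

    for country, onset in emdat:
        window_start = onset - timedelta(days=days_before)
        window_end = onset + timedelta(days=days_after)
        # Indices of news events in the same country inside the window
        hits = [
            i
            for i, (news_country, first_day) in enumerate(news)
            if news_country == country and window_start <= first_day <= window_end
        ]
        if hits:
            matches.append(((country, onset), [news[i] for i in hits]))
            matched_news_idx.update(hits)
        else:
            # EM-DAT entry without any news event nearby
            isolated_emdat.append((country, onset))

    # News events that were not paired with any EM-DAT entry
    isolated_news = [n for i, n in enumerate(news) if i not in matched_news_idx]
    return matches, isolated_emdat, isolated_news


# Invented toy records (not taken from EM-DAT or the news corpus)
toy_emdat = [("NPL", date(2024, 7, 12)), ("PER", date(2024, 3, 1))]
toy_news = [
    ("NPL", date(2024, 7, 13)),
    ("NPL", date(2024, 7, 15)),
    ("CHE", date(2024, 6, 30)),
]

matches, isolated_emdat, isolated_news = match_events(toy_emdat, toy_news)
```

In this toy input, both news events for `NPL` fall within the window of the single EM-DAT entry, which shows how country-level matching may merge distinct landslides that occur close in time. The `PER` entry and the `CHE` news event correspond to the two isolated cases.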