Tag
This paper introduces a linked multimodal dataset of official speeches from the Russian government, including text, images, metadata, and topic annotations, designed to support social science research and LLM applications in political domains.
The paper introduces an LLM-based topic modeling method and evaluation framework that simultaneously achieves interpretability, topic specificity, and polarity stance consistency. Using large-scale Japanese corporate review data, it demonstrates superior explanatory power for external outcomes such as employee morale.
Researchers from EPFL and Idiap apply NLP methods (topic modeling, sentiment analysis, readability scoring) to more than 2,000 hyper-local news articles to assess how well local French-language media serve migrant communities. The study combines focus groups with computational text analysis to identify gaps between local news content and migrant readers' needs.
CobwebTM is a low-parameter lifelong hierarchical topic modeling approach that adapts the Cobweb algorithm to continuous document embeddings, enabling unsupervised topic discovery and dynamic hierarchical organization without predefining topic counts. The method combines incremental symbolic concept formation with pretrained representations to achieve strong topic coherence while avoiding catastrophic forgetting.