A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses
Summary
This paper compares Structural Topic Models (STM) and BERTopic for analyzing short, open-ended survey responses, finding that BERTopic with contextual augmentation yields better topic coherence and interpretability, while STM offers stronger support for inferential covariate analysis.
# A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses

Source: [https://arxiv.org/abs/2605.23093](https://arxiv.org/abs/2605.23093) [View PDF](https://arxiv.org/pdf/2605.23093)

> Abstract: Topic modeling in applied psychology increasingly spans two methodological traditions: probabilistic bag-of-words models and newer embedding-based approaches. Yet many evaluations of these methods rely on longer and cleaner benchmark corpora, leaving less guidance for short, open-ended survey responses. This paper compares Structural Topic Models (STM), a probabilistic topic model, and BERTopic, an embedding-based model, for analyzing open-ended survey responses. We evaluated three STM conditions and five BERTopic conditions, varying typographical correction, stemming, embedding choice, and contextual augmentation, a strategy we introduced to provide additional semantic context for very short responses. Results indicate that BERTopic consistently produced higher topic coherence than STM, with contextual augmentation yielding the strongest performance gains. In contrast, higher-dimensional embeddings alone did not improve coherence and were associated with greater data loss. Qualitative evaluation showed that BERTopic generated more interpretable and stable topics, while STM topics were often broader and more mixed. However, STM provides stronger support for inferential covariate analysis, whereas BERTopic covariate comparisons are primarily descriptive. These findings suggest that STM and BERTopic offer complementary strengths. We conclude with practical guidance for selecting and combining topic modeling approaches in applied social science research.

## Submission history

From: Yan Jiang

**[v1]** Thu, 21 May 2026 23:00:40 UTC (1,332 KB)
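
The abstract compares models by topic coherence but does not say which coherence measure was used. As a rough illustration only, the sketch below computes one common choice, mean normalized pointwise mutual information (NPMI) over a topic's word pairs, from document-level co-occurrence. The function name `npmi_coherence`, the whitespace tokenization, and the sample responses are assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of a topic (illustrative sketch).

    Probabilities are estimated as the share of documents containing a
    word (or both words of a pair). Tokenization is a plain lowercase
    whitespace split, which is an assumption of this example.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        raise ValueError("documents must not be empty")
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # pair never co-occurs: NPMI lower bound
        elif p12 == 1:
            scores.append(1.0)   # pair occurs in every document: upper bound
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical short survey responses, invented for illustration.
responses = [
    "long hours and heavy workload",
    "heavy workload too many hours",
    "workload and hours are too much",
    "my manager is supportive",
    "supportive manager and good team",
    "good team and friendly manager",
]

focused_topic = npmi_coherence(["workload", "hours"], responses)
mixed_topic = npmi_coherence(["workload", "manager"], responses)
```

With these toy responses, a topic whose top words always appear together scores higher than one that mixes words from unrelated responses, which is the intuition behind reading higher coherence as more interpretable topics. Very short responses give few co-occurrences per document, so estimates like this can be noisy on survey data.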
Similar Articles
A comparative study of transformer-based embeddings for topic coherence
This paper systematically compares the impact of model size on topic quality using seven transformer-based language models in a BERTopic pipeline, finding that model size has negligible effect on topic coherence, suggesting smaller models can perform comparably to larger ones.
Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
The paper introduces an LLM-based topic modeling method and evaluation framework that simultaneously achieves interpretability, topic specificity, and polarity stance consistency, demonstrating superior explanatory power for external outcomes like employee morale using large-scale Japanese corporate review data.
Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models
This paper compares the geometric structures induced by deep learning vector embeddings (CamemBERT) and lexical co-occurrence graph models on the French 'Great National Debate' corpus, finding similar local topology but distinct global organization, highlighting complementarity between the two approaches.
Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles
This paper investigates whether topic sentiment causally affects perceived political ideology in news articles, comparing human annotations from AllSides with those from LLMs including GPT-4o-mini and Llama-3.3-70B. It finds that fine-tuned GPT-4o-mini exhibits a spurious sentiment-ideology coupling not present in human judgments, highlighting risks of using LLM annotations as proxies in causal analyses.
A Comparative Study of Language Models for Khmer Retrieval-Augmented Question Answering
This paper presents a comparative evaluation of embedding models and generator backends for Khmer-language retrieval-augmented question answering in the telecom domain, finding that BGE-M3 performs best for retrieval while generator strengths vary across metrics.