Tag
This paper develops a codebook for self-stigma among people who use drugs and analyzes 72,115 Reddit posts to examine prevalence, co-occurrence, and temporal patterns of cognitive, affective, and behavioral stigma indicators, finding that self-stigma is expressed as an integrated phenomenon with behavioral indicators often preceding core indicators.
This paper presents the largest computational analysis of Canadian news coverage of police-involved deaths over 25 years, introducing a novel model (PerspectiveGap) that quantifies how strongly the perspectives of state bureaucrats dominate civilian voices in media narratives.
This paper introduces conditional hypothesis generation, a framework that incorporates researcher-specified covariates to steer LLM-based text analysis toward meaningful subgroup differences while addressing confounds such as stratum imbalance and sign reversal.
This paper discusses the need for epistemically grounded and responsible multilingual LLMs for applications in computational social science and the humanities.
This paper proposes a label-light measurement diagnostic for evaluating whether popular text analysis methods (dictionaries, topic models, embeddings, LLMs) capture substantive stance or merely symbolic rhetoric when measuring entrepreneurial discourse. It uses a corpus of 80 speeches from Chinese state-owned enterprises (SOEs) and a natural experiment built on pairs of speeches from the same company delivered by different speakers. The authors find that zero-shot LLMs show higher sensitivity, but a substantial portion of this effect may reflect speaker idiolect rather than substantive stance.
This paper presents the Arabic Women and Society Corpus, a ten-year collection of over 250,000 Arabic Facebook posts on women's empowerment and social wellbeing, released with engagement metrics to support analysis of gender discourse and sentiment.
This paper critiques the 'Proxy Presumption' in NLP, the practice of equating geometric properties of embeddings with social constructs without justification. It introduces two methods, the Construct Validity Protocol and Counterfactual Neutralization, for rigorously validating social measures derived from semantic embeddings.