This paper proposes a multidimensional text analysis approach that combines Japanese NLP metrics with statistical methods to evaluate changes in risk disclosure quality, and applies it to Japan's 2019 corporate disclosure reforms. The analysis of 19,770 firm-year observations reveals complex shifts, such as increased disclosure volume accompanied by decreased readability.
This paper applies computational stylometry to English translations of the Pali Canon, examining vocabulary differences across the Sutta, Vinaya, and Abhidhamma divisions.
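The summary does not say which stylometric measures the study uses. One common way to surface vocabulary differences between two divisions is to rank words by a smoothed log-ratio of their relative frequencies. The sketch below illustrates that technique under this assumption and is not the paper's method. `distinctive_words`, the Unicode-letter tokenizer, and the smoothing constant `alpha` are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased runs of Unicode letters (hypothetical, deliberately simple tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def distinctive_words(target: str, reference: str, top_n: int = 5,
                      alpha: float = 0.5) -> list[str]:
    """Rank words by smoothed log2 ratio of relative frequency in target vs. reference."""
    t, r = Counter(tokenize(target)), Counter(tokenize(reference))
    vocab = set(t) | set(r)
    n_t, n_r, v = sum(t.values()), sum(r.values()), len(vocab)

    def score(word: str) -> float:
        # Additive smoothing keeps the ratio finite for words absent from one side.
        p_t = (t[word] + alpha) / (n_t + alpha * v)
        p_r = (r[word] + alpha) / (n_r + alpha * v)
        return math.log2(p_t / p_r)

    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(vocab, key=lambda w: (-score(w), w))[:top_n]
```

Run once per pair of divisions (for example, Vinaya text against Sutta text). Positive scores mark words that are overrepresented in the target division.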
This paper introduces the Persuasion Index (PI), a theory-guided framework with 15 dimensions and 55 sub-features for analyzing persuasive rhetorical cues in text. The framework is modular and open-source. Evaluated on four datasets, it provides interpretable feature spaces for persuasion-related outcomes.
This paper develops a geometric framework to measure semantic content of texts using sentence embeddings, proposing a three-coordinate semantic profile (novelty, breadth, integration) and a scalar trade-off triangle, validated across synthetic categories and novels.
This paper introduces PEEL (Protocols for Epistemically Engaged Literacy in AI), a framework combining deterministic text analysis via Voyant Tools with LLM interpretation via Claude, grounded in Peircean semiotics, to expose systematic distortions in AI-generated research summaries and promote epistemic accountability.
This paper introduces conditional hypothesis generation, a framework that incorporates researcher-specified covariates to steer LLM-based text analysis toward discovering meaningful subgroup differences while addressing confounds like stratum imbalance and sign reversal.
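Sign reversal is the Simpson's-paradox pattern: a difference between two groups points one way in the pooled data and the opposite way within every stratum of a covariate. It typically arises when the groups are unevenly spread across strata (stratum imbalance). The minimal sketch below checks for that pattern. The counts, stratum labels, and function names are hypothetical, and the code is not the paper's hypothesis-generation framework.

```python
# Hypothetical toy counts: (texts exhibiting a candidate feature, total texts),
# keyed by (group, covariate stratum). Group A is concentrated in stratum_2 and
# group B in stratum_1, a deliberate stratum imbalance.
COUNTS: dict[tuple[str, str], tuple[int, int]] = {
    ("A", "stratum_1"): (81, 87),
    ("A", "stratum_2"): (192, 263),
    ("B", "stratum_1"): (234, 270),
    ("B", "stratum_2"): (55, 80),
}


def pooled_diff(counts) -> float:
    """Rate(A) - Rate(B), ignoring the covariate."""
    hits = {"A": 0, "B": 0}
    totals = {"A": 0, "B": 0}
    for (group, _stratum), (h, n) in counts.items():
        hits[group] += h
        totals[group] += n
    return hits["A"] / totals["A"] - hits["B"] / totals["B"]


def stratified_diffs(counts) -> dict[str, float]:
    """Rate(A) - Rate(B) within each covariate stratum."""
    strata = sorted({stratum for (_group, stratum) in counts})
    return {
        s: counts[("A", s)][0] / counts[("A", s)][1]
        - counts[("B", s)][0] / counts[("B", s)][1]
        for s in strata
    }


def has_sign_reversal(counts) -> bool:
    """True if every within-stratum difference has the opposite sign to the pooled one."""
    pooled = pooled_diff(counts)
    within = stratified_diffs(counts).values()
    return pooled != 0 and all(d != 0 and (d > 0) != (pooled > 0) for d in within)
```

In these toy counts, group A has the higher rate in both strata but the lower pooled rate. Most of A's texts fall in stratum_2, where rates are low for both groups. Conditioning on the covariate exposes this.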
This paper proposes a label-light measurement diagnostic for testing whether popular text analysis methods (dictionaries, topic models, embeddings, and LLMs) capture substantive stance or merely symbolic rhetoric when measuring entrepreneurial discourse. It uses a corpus of 80 speeches from Chinese state-owned enterprises (SOEs) and a natural experiment built on pairs of speeches from the same company delivered by different speakers. The authors find that zero-shot LLMs show higher sensitivity, but a significant portion of the measured effect may reflect speaker idiolect rather than substantive stance.
Granuscore is a reference-free measure of granularity for text analysis and question answering (QA). It uses hierarchical embedding spaces to distinguish fine-grained from coarse language, and applying it reveals consistent differences in model behavior across QA benchmarks.
This paper reveals the existence of hidden human-like spans in machine-generated texts and proposes a model-agnostic stacked enhancement framework that improves existing detectors by reducing the influence of these spans.
This paper proposes a framework for parallel chunk-level processing of long documents with LLMs to reduce cumulative bias and improve evidence traceability, achieving significant reductions in omission errors and unsupported claims.
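The summary does not describe the framework's internals. The sketch below assumes a simple design. The document is split into overlapping, fixed-size chunks, and each chunk records its character offsets. The chunks are processed independently in parallel, and every result is mapped back to absolute offsets so each finding can be traced to its source text. `process_chunk` is a hypothetical stand-in for a per-chunk LLM call (here a plain substring search), and the chunk size and overlap values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int
    start: int  # character offset of the chunk in the source document
    end: int    # exclusive end offset
    text: str


def make_chunks(doc: str, size: int, overlap: int) -> list[Chunk]:
    """Split doc into fixed-size chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    chunks: list[Chunk] = []
    start = 0
    while start < len(doc):
        end = min(start + size, len(doc))
        chunks.append(Chunk(len(chunks), start, end, doc[start:end]))
        if end == len(doc):
            break
        start += size - overlap
    return chunks


def process_chunk(chunk: Chunk, keyword: str) -> list[tuple[int, int]]:
    """Hypothetical stand-in for a per-chunk LLM call.

    Returns every occurrence of `keyword` as an absolute (start, end) span,
    so each finding points back into the original document.
    """
    spans = []
    pos = chunk.text.find(keyword)
    while pos != -1:
        spans.append((chunk.start + pos, chunk.start + pos + len(keyword)))
        pos = chunk.text.find(keyword, pos + 1)
    return spans


def process_document(doc: str, keyword: str, size: int = 200, overlap: int = 20,
                     workers: int = 4) -> list[tuple[int, int]]:
    chunks = make_chunks(doc, size, overlap)
    # Chunks are processed independently: no chunk sees another chunk's output.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(lambda c: process_chunk(c, keyword), chunks))
    # Spans found twice inside overlapping regions are merged by de-duplication.
    return sorted({span for spans in per_chunk for span in spans})
```

Each chunk is handled without seeing earlier outputs, so a mistake in one chunk cannot accumulate into later ones. If the overlap is at least as long as the longest span that must not be cut, every such span lies wholly inside some chunk.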
This paper introduces a new embedding model designed to capture preferential similarity rather than just semantic similarity, improving preference prediction for collective decision-making systems.
Researchers use four-state Markov chains to model vowel/consonant patterns in Pushkin’s Evgenij Onegin and its Italian translation, revealing structural asymmetries and narrative-linked phonological cues.
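One natural reading of "four-state" is a second-order vowel/consonant chain whose states are the overlapping letter pairs VV, VC, CV, and CC. The sketch below assumes that reading and is illustrative only. It estimates the transition matrix by counting pair-to-pair transitions and normalizing each row. The vowel set is a parameter: for Italian, something like `"aeiouàèéìíòóù"`; for Russian, `"аеёиоуыэюя"`. Treating every other letter as a consonant, including the Russian hard and soft signs, is a simplification.

```python
from collections import Counter
from itertools import product

# The four pair states of a second-order V/C chain: VV, VC, CV, CC.
STATES = ["".join(p) for p in product("VC", repeat=2)]


def vc_sequence(text: str, vowels: str) -> str:
    """Map letters to 'V' or 'C', dropping spaces, punctuation, and digits."""
    return "".join("V" if ch in vowels else "C" for ch in text.lower() if ch.isalpha())


def transition_matrix(text: str, vowels: str = "aeiou") -> dict[str, dict[str, float]]:
    """Row-normalized transition probabilities between overlapping letter-pair states."""
    seq = vc_sequence(text, vowels)
    states = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(zip(states, states[1:]))
    matrix = {}
    for s in STATES:
        row_total = sum(counts[(s, t)] for t in STATES)
        # Rows for states never observed are left as all zeros.
        matrix[s] = {t: (counts[(s, t)] / row_total if row_total else 0.0) for t in STATES}
    return matrix
```

Consecutive states share one letter, so each state allows only two successors (VC can be followed only by CV or CC). Half of the 16 matrix entries are therefore structurally zero. Comparing the remaining entries between the Russian original and the Italian translation is one way to look for asymmetries.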
Researchers from EPFL and Idiap apply NLP methods (topic modeling, sentiment analysis, readability scoring) to over 2,000 hyper-local news articles to assess how well local French-language media serve migrant communities. The study combines focus groups with computational text analysis to identify gaps between local news content and migrant readers' needs.