text-analysis

#text-analysis

Assessing Post-Reform Changes in Risk Disclosure Quality with a Multidimensional Text Analysis Approach

arXiv cs.CL · 2026-06-26

This paper proposes a multidimensional text analysis approach combining Japanese NLP metrics and statistical methods to evaluate changes in risk disclosure quality, applied to Japan's 2019 corporate disclosure reforms. The analysis of 19,770 firm-year observations reveals complex shifts such as increased volume accompanied by decreased readability.

Three Buddhist Vocabularies: Computational Stylometry of the English Pali Canon across Sutta, Vinaya, and Abhidhamma

arXiv cs.CL · 2026-06-25

This paper applies computational stylometry to English translations of the Pali Canon, examining vocabulary differences across the Sutta, Vinaya, and Abhidhamma divisions.

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

arXiv cs.CL · 2026-06-15

This paper introduces the Persuasion Index (PI), a theory-guided framework with 15 dimensions and 55 sub-features for analyzing persuasive rhetorical cues in text. The framework is modular and open-source, and it is evaluated on four datasets, providing interpretable feature spaces for persuasion-related outcomes.

A Geometric Profile of Semantic Information in Text: Frame-Conditional Uniqueness and a Trade-Off Triangle for Scalar Summaries

arXiv cs.CL · 2026-06-11

This paper develops a geometric framework to measure semantic content of texts using sentence embeddings, proposing a three-coordinate semantic profile (novelty, breadth, integration) and a scalar trade-off triangle, validated across synthetic categories and novels.

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

arXiv cs.AI · 2026-06-04

This paper introduces PEEL (Protocols for Epistemically Engaged Literacy in AI), a framework combining deterministic text analysis via Voyant Tools with LLM interpretation via Claude, grounded in Peircean semiotics, to expose systematic distortions in AI-generated research summaries and promote epistemic accountability.

Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates

arXiv cs.CL · 2026-06-03

This paper introduces conditional hypothesis generation, a framework that incorporates researcher-specified covariates to steer LLM-based text analysis toward discovering meaningful subgroup differences while addressing confounds like stratum imbalance and sign reversal.

Slogans or Stance? A Label-Light Diagnostic for Entrepreneurial-Discourse Measurement on Chinese SOE Speeches

arXiv cs.CL · 2026-05-29

This paper proposes a label-light measurement diagnostic to evaluate whether popular text analysis methods (dictionaries, topic models, embeddings, LLMs) capture substantive stance rather than symbolic rhetoric in entrepreneurial-discourse measurement, using a corpus of 80 Chinese SOE speeches and a natural experiment with same-company, different-speaker pairs. The authors find that zero-shot LLMs show higher sensitivity, but a significant portion of the effect may be due to speaker idiolect rather than substantive stance.

Granuscore: A Reference-Free Measure of Granularity for Text Analysis and Question Answering

arXiv cs.CL · 2026-05-27

Granuscore is a reference-free measure of granularity for text analysis and question answering. It uses hierarchical embedding spaces to capture fine-grained vs. coarse language and demonstrates consistent differences in model behavior across QA benchmarks.

Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement

arXiv cs.CL · 2026-05-25

This paper reveals the existence of hidden human-like spans in machine-generated texts and proposes a model-agnostic stacked enhancement framework that improves existing detectors by reducing the influence of these spans.

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

arXiv cs.CL · 2026-05-21

This paper proposes a framework for parallel chunk-level processing of long documents with LLMs to reduce cumulative bias and improve evidence traceability, achieving significant reductions in omission errors and unsupported claims.

Embeddings for Preferences, Not Semantics

arXiv cs.AI · 2026-05-12

This paper introduces a new embedding model designed to capture preferential similarity rather than just semantic similarity, improving preference prediction for collective decision-making systems.

Markov reads Pushkin, again: A statistical journey into the poetic world of Evgenij Onegin

arXiv cs.CL · 2026-04-23

Researchers use four-state Markov chains to model vowel/consonant patterns in Pushkin’s Evgenij Onegin and its Italian translation, revealing structural asymmetries and narrative-linked phonological cues.

Migrant Voices, Local News: Insights on Bridging Community Needs with Media Content

arXiv cs.CL · 2026-04-21

Researchers from EPFL and Idiap apply NLP methods (topic modeling, sentiment analysis, readability scoring) to more than 2,000 hyper-local news articles to assess how well local French-language media serves migrant communities. The study combines focus groups with computational text analysis to identify gaps between local news content and migrant readers' needs.
