Towards a Linguistic Evaluation of Narratives: A Quantitative Stylistic Framework
Summary
A preprint proposes a 33-feature quantitative linguistic framework that distinguishes professionally edited from self-published books and outperforms existing story-level evaluation metrics.
# Towards a Linguistic Evaluation of Narratives: A Quantitative Stylistic Framework

Source: [https://arxiv.org/abs/2604.19261](https://arxiv.org/abs/2604.19261) · [View PDF](https://arxiv.org/pdf/2604.19261)

> Abstract: The evaluation of narrative quality remains a complex challenge, as it involves subjective factors such as plot, character development, and emotional impact. This work proposes a quantitative approach to narrative assessment by focusing on the linguistic dimension as a primary indicator of quality. The paper presents a methodology for the automatic evaluation of narrative based on the extraction of a comprehensive set of 33 quantitative linguistic features categorized into lexical, syntactic, and semantic groups. To test the model, an experiment was conducted on a specialized corpus of 23 books, including canonical masterpieces and self-published works. Through a similarity matrix, the system successfully clustered the narratives, distinguishing almost perfectly between professionally edited and self-published texts. Furthermore, the methodology was validated against a human-annotated dataset; it significantly outperforms traditional story-level evaluation metrics, demonstrating the effectiveness of quantitative linguistic features in assessing narrative quality.

## Submission history

From: Alessandro Maisto [[view email](https://arxiv.org/show-email/2b9a5b8d/2604.19261)]

**[v1]** Tue, 21 Apr 2026 09:21:40 UTC (827 KB)
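To make the abstract's pipeline concrete, here is a minimal sketch of the general approach it describes: extract quantitative stylistic features per text, build a cosine similarity matrix, and cluster. The paper's 33 features are not enumerated on this page, so the handful below are hypothetical lexical/syntactic stand-ins, not the authors' actual feature set, and the clustering method is an assumed choice (average-linkage agglomerative via SciPy).

```python
# Illustrative sketch only: features and clustering method are assumptions,
# not the paper's documented implementation.
import re
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def extract_features(text: str) -> np.ndarray:
    """A few quantitative stylistic features for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    counts: dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return np.array([
        len(counts) / n,                                # type-token ratio (lexical)
        sum(1 for c in counts.values() if c == 1) / n,  # hapax legomena ratio (lexical)
        sum(map(len, words)) / n,                       # mean word length (lexical)
        n / max(len(sentences), 1),                     # mean sentence length (syntactic)
        text.count(",") / n,                            # comma density (syntactic proxy)
    ])

def similarity_matrix(texts: list[str]) -> np.ndarray:
    """Cosine similarity between z-scored feature vectors, one row per text."""
    X = np.vstack([extract_features(t) for t in texts])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)   # put features on one scale
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-9
    return X @ X.T

def cluster(texts: list[str], k: int = 2) -> np.ndarray:
    """Agglomerative clustering on (1 - similarity) distances."""
    S = similarity_matrix(texts)
    D = np.clip(1.0 - S, 0.0, None)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")       # one cluster label per text
```

In the paper's experiment, the analogous step is applied to a corpus of 23 books, and with k = 2 the clusters separate professionally edited from self-published texts almost perfectly; the sketch above only shows the shape of such a pipeline, not its reported feature set or results.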
Similar Articles
BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories
Researchers introduce BIASEDTALES-ML, a large-scale multilingual dataset of ~350,000 LLM-generated children's stories across eight languages, designed to analyze narrative attribute distributions and cross-lingual bias patterns in language model outputs. The work reveals significant cross-lingual variability, highlighting limitations of English-centric bias evaluations.
Reward Modeling for Scientific Writing Evaluation
This paper proposes SciRM, a family of cost-efficient open-source reward models tailored for evaluating scientific writing, trained through a two-stage framework that optimizes evaluation preferences and reasoning capabilities. The models generalize across diverse scientific writing tasks without requiring task-specific retraining, addressing limitations of existing LLM-based judges on domain-specific evaluation criteria.
Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models
Introduces a framework to quantify how LLMs overstate certainty through rhetorical devices, revealing model-agnostic patterns of epistemic-rhetorical miscalibration.
SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation
SwanNLP presents an LLM-based framework for plausibility scoring in narrative word sense disambiguation at SemEval-2026 Task 5, using structured reasoning and dynamic few-shot prompting to predict human-perceived plausibility of word senses in short stories. The work demonstrates that commercial large-parameter LLMs with few-shot prompting and model ensembling effectively replicate human judgment patterns in realistic narrative contexts.
From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
This paper presents a comprehensive dual-aspect evaluation framework for large language models on Vietnamese legal text simplification, combining quantitative benchmarking (Accuracy, Readability, Consistency) with qualitative error analysis across GPT-4o, Claude 3 Opus, Gemini 1.5 Pro, and Grok-1.