Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
Summary
This paper presents SSAS (Syntactic & Semantic Context Assessment Summarization), a framework designed to improve consistency in LLM-based sentiment prediction by reducing noise and variance through hierarchical classification and iterative summarization. Empirical evaluation on three industry-standard datasets shows up to 30% improvement in data quality and reliability for enterprise decision-making.
# Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
Source: https://arxiv.org/html/2604.15547
Nitin Mayande
Tellagence Inc.
{sharookh, nitin, shreeya}@tellagence.com
Shreeya Verma Kathuria
Tellagence Inc.
Nitin Joglekar
Villanova School of Business, Villanova University
Charles Weber
Maseeh College of Engineering and Computer Science, Portland State University
## Abstract
The fundamental challenge of using Large Language Models (LLMs) for reliable, enterprise-grade analytics, such as sentiment prediction, is the conflict between the LLMs' inherent stochasticity (generative, non-deterministic nature) and the analytical requirement for consistency. LLM inconsistency, coupled with the noisy nature of chaotic modern datasets, renders sentiment predictions too volatile for strategic business decisions. To resolve this, we present a Syntactic & Semantic Context Assessment Summarization (SSAS) framework for establishing context. Context established by SSAS functions as a sophisticated data pre-processing framework that enforces a bounded attention mechanism on LLMs. It achieves this by applying a hierarchical classification structure (Themes, Stories, Clusters) and an iterative Summary-of-Summaries (SoS) based context computation architecture. This endows the raw text with high-signal, sentiment-dense prompts that effectively mitigate both irrelevant data and analytical variance.
We empirically evaluated the efficacy of SSAS using Gemini 2.0 Flash Lite against a direct-LLM approach across three industry-standard datasets – Amazon Product Reviews, Google Business Reviews, and Goodreads Book Reviews – under multiple robustness scenarios. Our results show that the SSAS framework can significantly improve data quality by up to 30% through a combination of noise removal and improved sentiment prediction estimation. Ultimately, consistency in our context-estimation capabilities provides a stable and reliable evidence base for decision-making.
**Keywords:** Natural Language Processing (NLP) · Artificial Intelligence (AI) · Sentiment Analysis
## 1 Introduction
In today's fast-paced business landscape, data-driven decision-making is a necessity. However, a significant gap exists between the volume of available textual data and the infrastructure required to extract actionable insights from it. While strategic decisions rely on rigorous data analysis, the sheer scale of non-trivial modern datasets creates a chaotic environment where signal is frequently buried under layers of technical friction. This challenge is further exacerbated by the integration of Large Language Models (LLMs), whose probabilistic architecture is fundamentally at odds with the repeatable and precise output requirements of enterprise-grade reporting. The conflict between operational velocity and analytical precision is defined by the following systemic pressures:
- **The Velocity Mandate:** Competitive advantage depends on the ability to process large-scale datasets quickly. Any latency in the pipeline results in decayed relevance.
- **The Signal-to-Noise Deficit:** Modern large-scale text datasets (e.g., social media marketing datasets) are fundamentally chaotic. While a fraction of the data focuses on the core problem, the vast majority consists of noisy, inconsequential information that complicates processing.
- **The Strategic Skill Gap:** Large-scale text-processing teams are domain experts (e.g., social media marketers are experts in brand strategy and consumer behavior). These experts should not be expected to possess the specialized engineering skills required to manipulate stochastic LLM outputs into reliable data.
This environment presents two primary hurdles: the Noise problem, which is the difficulty of identifying, isolating, and removing irrelevant data, and the Inconsistency problem, which is the inability to ensure that an LLM-based analytical process produces consistent results. While these hurdles remain, teams are left with analytics, such as sentiment predictions, that are too volatile to support enterprise-level decision-making. The genesis of these problems lies in utilizing tools (e.g., LLMs) that are currently designed for generative creativity rather than consistent data processing.
### 1.1 The Paradox of LLM Creativity: Why Generative AI Underperforms in Data Science
To bridge the gap between AI potential and large-scale text processing execution, we must address the fundamental conflict between LLM architectures and the requirements of data science. LLMs are, by design, engines of probability. While their underlying mechanics are revolutionary for creative synthesis, they are inherently poorly suited for the rigid, invariant requirements of data analytics. The root of this failure lies in the LLM's attention mechanism. In a standard generative configuration, the attention mechanism dictates which tokens the model prioritizes during processing. Because these models are optimized for novelty, the mechanism may assign different weights to the same input tokens across successive runs. This stochasticity is an asset for creative tasks, but it can be a significant liability for data science tasks where latent space stability is required to ensure data integrity. In sentiment prediction, this creative variance manifests as inconsistency. If the same dataset yields different sentiment scores upon re-run, the output is functionally useless for strategic planning. At a strategic level, this inconsistency is more than a technical glitch; it is an erosion of the evidentiary basis for decision-making. To achieve reliability, we must move from generative variance toward a consistent standard.
### 1.2 Requisite Capability: Consistency in Summarization and Sentiment Prediction
For AI-driven sentiment analysis to meet the threshold of enterprise-grade analytics, it must adhere to the benchmark of consistency. This is not a subjective measure of quality but a technical requirement for any system intended to serve as a stable foundation for corporate strategy. Consistency is defined as the ability to generate identical output when provided with the same input. In a professional analytical environment, an analysis performed today must be perfectly replicable tomorrow. Without this guarantee, data-driven insights are merely ephemeral snapshots, lacking the stability required for long-term strategic investments.
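The definition above (identical output for identical input) can be checked mechanically. The following is a minimal sketch, not the evaluation procedure used in this paper: `consistency_rate`, `keyword_predictor`, and `make_flaky_predictor` are hypothetical names introduced only for illustration, and the predictors are simple stand-ins for an LLM call.

```python
import itertools
from typing import Callable, Sequence


def consistency_rate(
    predict: Callable[[str], str], texts: Sequence[str], runs: int = 5
) -> float:
    """Return the fraction of texts that receive one identical label on every run."""
    if runs < 2:
        raise ValueError("at least two runs are needed to compare outputs")
    if not texts:
        return 1.0
    stable = 0
    for text in texts:
        # Collect the distinct labels produced for the same input across runs.
        labels = {predict(text) for _ in range(runs)}
        if len(labels) == 1:
            stable += 1
    return stable / len(texts)


def keyword_predictor(text: str) -> str:
    """Deterministic stand-in for a sentiment model (assumption, not an LLM)."""
    return "positive" if "good" in text.lower() else "negative"


def make_flaky_predictor() -> Callable[[str], str]:
    """Stand-in that alternates labels on every call, imitating run-to-run variance."""
    labels = itertools.cycle(["positive", "negative"])
    return lambda text: next(labels)
```

Under this sketch, a predictor that always returns the same label for the same text scores 1.0, while one whose label changes between runs scores lower; in practice `predict` would wrap whichever model call is being audited.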
Achieving this standard necessitates a rigorous approach to identifying and isolating noise. Research suggests that the inclusion of irrelevant information within the input can be damaging to performance as it forces the model to attend to inconsequential patterns. This creates a signal-to-noise deficit that is particularly acute in modern marketing and commercial datasets, where the sheer volume of data often buries actionable insights under layers of technical friction. A consistent analytical framework must, therefore, be capable of (1) identifying the relevancy of data points within the broader dataset, (2) isolating core problem-related data from inconsequential information, and finally (3) removing noise to ensure that attention is focused exclusively on relevant context. By leveraging these capabilities, organizations can transform LLMs from creative assistants into precise analytical instruments that meet the demands of large-scale text processing.
### 1.3 Hierarchical Contextual Framework for Analytical Integrity
Our SSAS framework proposes a method to address the dual crises of noise and stochastic inconsistency by replacing the black-box unpredictability of standard LLMs with a structured methodology designed to enforce integrity on chaotic datasets. The methodology is implemented through a specialized two-phase framework:
1. **Contextual Relevancy:** The process begins by evaluating the data within its specific context. By identifying information relevancy at a granular level, the system determines which data points are pertinent to the defined problem and which are extraneous.
2. **Noise Reduction and Reliability Improvement:** This is the critical point of interdependence. Using the derived context from Phase 1, we systematically reduce dataset noise. By feeding only the refined, relevant context into the LLM, we neutralize variance and significantly improve the consistency and reliability of the output.
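The two phases above can be illustrated with a small sketch. This is an assumption-laden illustration rather than the SSAS implementation: the relevancy score here is a plain term-overlap ratio, and `relevancy`, `filter_noise`, `context_terms`, and the `0.2` threshold are all hypothetical choices made for the example.

```python
from typing import List, Set


def relevancy(text: str, context_terms: Set[str]) -> float:
    """Phase 1 (stand-in): share of a text's distinct tokens found in the derived context."""
    tokens = {token.strip(".,!?").lower() for token in text.split()}
    tokens.discard("")
    if not tokens:
        return 0.0
    return len(tokens & context_terms) / len(tokens)


def filter_noise(
    texts: List[str], context_terms: Set[str], threshold: float = 0.2
) -> List[str]:
    """Phase 2 (stand-in): keep only texts whose relevancy meets the threshold."""
    return [text for text in texts if relevancy(text, context_terms) >= threshold]


# Hypothetical context and reviews, used only to exercise the two functions.
context_terms = {"battery", "screen", "charger"}
reviews = [
    "The battery died fast",
    "Follow me for giveaways!",
    "Great screen and battery",
]
relevant_reviews = filter_noise(reviews, context_terms)
```

Only the texts in `relevant_reviews` would then be passed to the sentiment model, which is the sense in which Phase 2 depends on the context derived in Phase 1.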
Our framework refines raw, chaotic data into a reliable and analytically relevant dataset. In parallel, by narrowing the model's focus through derived context, this framework ensures that the refined input consistently yields the same results, addressing, to a large extent, the stochasticity problem inherent in generative architectures. This approach alleviates the technical burden on large-scale text processing teams, allowing them to focus on strategy rather than engineering methods. Our contributions in this paper are as follows:
1. We present our SSAS framework to provide LLMs with assisting context which, in turn, helps LLMs focus their attention mechanism and provide consistency to analytical tasks.
2. Our framework classifies datasets into a hierarchical structure of Themes, Stories, and Clusters to create consistent summaries and Summaries-of-Summaries (SoS) across multiple levels of data aggregation.
3. Our framework helps identify noisy data points within datasets and helps LLMs focus their attention on the signal within the datasets.
4. Our framework is able to significantly improve data quality, up to 30%, through a combination of noise removal and improvement in the estimation of sentiment prediction.
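The hierarchy and Summary-of-Summaries idea named in the contributions can be sketched as a bottom-up roll-up. The sketch below assumes that Clusters nest inside Stories and Stories inside Themes, which is one plausible reading of the ordering given here; `summary_of_summaries`, `join_summarizer`, and the sample data are hypothetical, and the summarizer is a deterministic placeholder rather than an LLM call.

```python
from typing import Callable, Dict, List

Summarizer = Callable[[List[str]], str]
# Assumed nesting: theme -> story -> cluster -> raw texts.
Hierarchy = Dict[str, Dict[str, Dict[str, List[str]]]]


def summary_of_summaries(hierarchy: Hierarchy, summarize: Summarizer) -> Dict[str, str]:
    """Summarize clusters, then stories from cluster summaries, then themes from story summaries."""
    theme_summaries: Dict[str, str] = {}
    for theme, stories in hierarchy.items():
        story_summaries: List[str] = []
        for clusters in stories.values():
            cluster_summaries = [summarize(texts) for texts in clusters.values()]
            story_summaries.append(summarize(cluster_summaries))
        theme_summaries[theme] = summarize(story_summaries)
    return theme_summaries


def join_summarizer(items: List[str]) -> str:
    """Placeholder summarizer: joins its inputs so the roll-up is easy to trace."""
    return " | ".join(items)


sample_hierarchy: Hierarchy = {
    "Quality": {
        "Battery": {
            "cluster_1": ["drains quickly", "lasts a day"],
            "cluster_2": ["charges slowly"],
        }
    }
}
theme_level = summary_of_summaries(sample_hierarchy, join_summarizer)
```

Because each level consumes only the summaries produced one level below it, the text passed upward shrinks at every step, which is the property the iterative SoS design relies on.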
The rest of this paper is structured as follows: Section 2 presents related and background work around LLM mechanisms, hierarchical information, and semantic alignment. Section 3 presents our Syntactic & Semantic Context Assessment Summarization (SSAS) framework. Section 4 provides details on our Evaluation Framework, while Section 5 presents the results of our framework when compared against a Direct-LLM approach. We conclude in Section 6.
## 2 Related Works
The emergence of Large Language Models (LLMs) has fundamentally redefined text analysis, shifting the paradigm from supervised feature engineering toward zero-shot and few-shot learning. However, as these models move from creative synthesis to enterprise-grade analytics, their inherent instability presents significant challenges. Our work builds upon three primary areas of research: the sensitivity of in-context learning, the mechanics of attention-based noise, and hierarchical data summarization. Elements of Sections 2 and 3 are similar to Kathuria et al., because our respective works share and build on the SSAS framework originally laid out in Mayande et al. For ease of access, we have summarized these ideas in this paper, so that readers need not refer back to Kathuria et al.
### 2.1 In-Context Learning and Prompt Instability
The efficacy of Large Language Models (LLMs) in zero-shot and few-shot regimes is largely governed by the paradigm of In-Context Learning (ICL). However, despite their sophisticated semantic latent spaces, LLMs exhibit profound and notorious sensitivity to the specificities of the input context. Zhao et al. characterized this as prompt instability, demonstrating that stochastic variations – such as the permutation of few-shot examples or minor syntactic shifts in instruction templates – can induce significant fluctuations in classification accuracy. This volatility suggests that the standard attention mechanism often converges on surface-level patterns rather than underlying logical structures. Furthermore, the architectural constraints of the transformer's context window present a dimensional bottleneck. As noted by Dong et al., fixed token limits necessitate a zero-sum trade-off between the depth of individual examples and the breadth of the reference set. In enterprise analytics, where datasets are high-dimensional and noisy, this limitation often leads to recency bias or the inclusion of non-representative outliers that confound the model's outcomes.
The SSAS framework departs from traditional ICL by replacing static, heuristically-derived prompts with dynamically synthesized context. By applying a precision-filtering pipeline to the input background, we ensure that the hints provided to the model are mathematically optimized for representative signal. This transforms the context from a variable, human-engineered instruction into a stable, feature-engineered instrument, effectively addressing the inherent stochasticity of the generative process.
### 2.2 Attention Mechanisms and the Signal-to-Noise Challenge
The LLM Paradox identified in this study – wherein generative fluency inversely correlates with analytical precision – is fundamentally rooted in the transformer's attention mechanism.