Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

arXiv cs.CL 04/20/26, 04:00 AM Papers

Summary

This paper presents SSAS (Syntactic & Semantic Context Assessment Summarization), a framework designed to improve consistency in LLM-based sentiment prediction by reducing noise and variance through hierarchical classification and iterative summarization. Empirical evaluation on three industry-standard datasets shows up to 30% improvement in data quality and reliability for enterprise decision-making.

arXiv:2604.15547v1 Announce Type: new

Cached at: 04/20/26, 08:27 AM

# Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

Source: https://arxiv.org/html/2604.15547

Nitin Mayande
Tellagence Inc.
{sharookh, nitin, shreeya}@tellagence.com

Shreeya Verma Kathuria
Tellagence Inc.
{sharookh, nitin, shreeya}@tellagence.com

Nitin Joglekar
Villanova School of Business, Villanova University
[email protected]

Charles Weber
Maseeh College of Engineering and Computer Science, Portland State University
[email protected]

## Abstract

The fundamental challenge of using Large Language Models (LLMs) for reliable, enterprise-grade analytics, such as sentiment prediction, is the conflict between the LLMs' inherent stochasticity (generative, non-deterministic nature) and the analytical requirement for consistency. LLM inconsistency, coupled with the noisy nature of chaotic modern datasets, renders sentiment predictions too volatile for strategic business decisions. To resolve this, we present a Syntactic & Semantic Context Assessment Summarization (SSAS) framework for establishing context. Context established by SSAS functions as a sophisticated data pre-processing framework that enforces a bounded attention mechanism on LLMs. It achieves this by applying a hierarchical classification structure (Themes, Stories, Clusters) and an iterative Summary-of-Summaries (SoS) based context computation architecture. This endows the raw text with high-signal, sentiment-dense prompts that effectively mitigate both irrelevant data and analytical variance.

We empirically evaluated the efficacy of SSAS using Gemini 2.0 Flash Lite against a direct-LLM approach across three industry-standard datasets – Amazon Product Reviews, Google Business Reviews, and Goodreads Book Reviews – under multiple robustness scenarios. Our results show that the SSAS framework can significantly improve data quality by up to 30% through a combination of noise removal and improved sentiment prediction estimation. Ultimately, consistency in our context-estimation capabilities provides a stable and reliable evidence base for decision-making.

**Keywords:** Natural Language Processing (NLP) · Artificial Intelligence (AI) · Sentiment Analysis

## 1 Introduction

In today's fast-paced business landscape, data-driven decision-making is a necessity. However, a significant gap exists between the volume of available textual data and the infrastructure required to extract actionable insights from it. While strategic decisions rely on rigorous data analysis, the sheer scale of non-trivial modern datasets creates a chaotic environment where signal is frequently buried under layers of technical friction. This challenge is further exacerbated by the integration of Large Language Models (LLMs), whose probabilistic architecture is fundamentally at odds with the repeatable and precise output requirements of enterprise-grade reporting. The conflict between operational velocity and analytical precision is defined by the following systemic pressures:

- **The Velocity Mandate:** Competitive advantage depends on the ability to process large-scale datasets quickly. Any latency in the pipeline results in decayed relevance.
- **The Signal-to-Noise Deficit:** Modern large-scale text datasets (e.g., social media marketing datasets) are fundamentally chaotic. While a fraction of the data bears on the core problem, the vast majority consists of noisy, inconsequential information that complicates processing.
- **The Strategic Skill Gap:** Teams that process large-scale text are typically domain experts (e.g., social media marketers are experts in brand strategy and consumer behavior). These experts should not be expected to possess the specialized engineering skills required to turn stochastic LLM outputs into reliable data.

This environment presents two primary hurdles: the Noise problem, which is the difficulty of identifying, isolating, and removing irrelevant data, and the Inconsistency problem, which is the inability to ensure that an LLM-based analytical process returns the same result on repeated runs. While these hurdles remain, teams are left with analytics, such as sentiment predictions, that are too volatile to support enterprise-level decision-making. Both problems originate in the use of tools (e.g., LLMs) that are currently designed for generative creativity rather than consistent data processing.

### 1.1 The Paradox of LLM Creativity: Why Generative AI Underperforms in Data Science

To bridge the gap between AI potential and large-scale text processing execution, we must address the fundamental conflict between LLM architectures and the requirements of data science. LLMs are, by design, engines of probability. While their underlying mechanics are revolutionary for creative synthesis, they are inherently poorly suited for the rigid, invariant requirements of data analytics. The root of this failure lies in the LLM's attention mechanism. In a standard generative configuration, the attention mechanism dictates which tokens the model prioritizes during processing. Because these models are optimized for novelty, the mechanism may assign different weights to the same input tokens across successive runs. This stochasticity is an asset for creative tasks, but it can be a significant liability for data science tasks where latent space stability is required to ensure data integrity. In sentiment prediction, this creative variance manifests as inconsistency. If the same dataset yields different sentiment scores upon re-run, the output is functionally useless for strategic planning. At a strategic level, this inconsistency is more than a technical glitch; it is an erosion of the evidentiary basis for decision-making. To achieve reliability, we must move from generative variance toward a consistent standard.

### 1.2 Requisite Capability: Consistency in Summarization and Sentiment Prediction

For AI-driven sentiment analysis to meet the threshold of enterprise-grade analytics, it must adhere to the benchmark of consistency. This is not a subjective measure of quality but a technical requirement for any system intended to serve as a stable foundation for corporate strategy. Consistency is defined as the ability to generate identical output when provided with the same input. In a professional analytical environment, an analysis performed today must be perfectly replicable tomorrow. Without this guarantee, data-driven insights are merely ephemeral snapshots, lacking the stability required for long-term strategic investments.
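One way to make this definition measurable is to run the same input through a predictor several times and report how often the runs agree. The sketch below is an illustration only: `consistency_rate`, the two stub predictors, and the modal-agreement measure are assumptions introduced here, not the metric or models used in the paper.

```python
import random
from collections import Counter
from typing import Callable


def consistency_rate(predict: Callable[[str], str], text: str, runs: int = 10) -> float:
    """Fraction of runs that return the most common label for `text`.

    A value of 1.0 means every run produced the same label.
    """
    labels = [predict(text) for _ in range(runs)]
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / runs


def deterministic_stub(text: str) -> str:
    """Hypothetical predictor that always agrees with itself."""
    return "positive" if "great" in text.lower() else "negative"


def make_noisy_stub(seed: int, flip_probability: float = 0.3) -> Callable[[str], str]:
    """Hypothetical stochastic predictor that sometimes flips its label."""
    rng = random.Random(seed)

    def predict(text: str) -> str:
        label = deterministic_stub(text)
        if rng.random() < flip_probability:
            return "negative" if label == "positive" else "positive"
        return label

    return predict


review = "Great battery life, would buy again."
stable_score = consistency_rate(deterministic_stub, review)
noisy_score = consistency_rate(make_noisy_stub(seed=0), review, runs=50)
```

Under the definition above, only a score of 1.0 on every input would count as consistent; anything lower indicates that re-running the analysis can change the reported sentiment.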

Achieving this standard necessitates a rigorous approach to identifying and isolating noise. Research suggests that the inclusion of irrelevant information within the input can be damaging to performance, as it forces the model to attend to inconsequential patterns. This creates a signal-to-noise deficit that is particularly acute in modern marketing and commercial datasets, where the sheer volume of data often buries actionable insights under layers of technical friction. A consistent analytical framework must, therefore, be capable of (1) identifying the relevancy of data points within the broader dataset, (2) isolating core problem-related data from inconsequential information, and finally (3) removing noise to ensure attention is focused exclusively on relevant context. By leveraging these capabilities, organizations can transform LLMs from creative assistants to precise analytical instruments for meeting the demands of large-scale text processing.
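The three capabilities listed above (identify relevancy, isolate, remove) can be sketched with a deliberately simple stand-in. The term-overlap score, the `threshold` value, and every function name below are assumptions made for illustration; the paper does not describe its relevancy computation in this section, and a real system would likely use a richer signal than word overlap.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(document: str, problem_terms: Set[str]) -> float:
    """Step 1: share of the problem terms that appear in the document (0.0 to 1.0)."""
    if not problem_terms:
        return 0.0
    return len(tokenize(document) & problem_terms) / len(problem_terms)


def split_signal_and_noise(
    documents: List[str], problem_statement: str, threshold: float = 0.25
) -> Tuple[List[str], List[str]]:
    """Steps 2 and 3: isolate relevant documents and set the rest aside as noise."""
    problem_terms = tokenize(problem_statement)
    signal: List[str] = []
    noise: List[str] = []
    for doc in documents:
        if relevance(doc, problem_terms) >= threshold:
            signal.append(doc)
        else:
            noise.append(doc)
    return signal, noise


reviews = [
    "The battery drains fast and the charger gets hot.",
    "Follow my page for daily giveaways!",
    "Battery life is excellent after the update.",
]
signal, noise = split_signal_and_noise(reviews, "battery charger life")
```

In this toy run the giveaway post shares no terms with the problem statement and is set aside, so only the two battery-related reviews would be passed on to the model.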

### 1.3 Hierarchical Contextual Framework for Analytical Integrity

Our SSAS framework posits a method to address the dual crises of noise and stochastic inconsistency by replacing the black-box unpredictability of standard LLMs with a structured methodology designed to enforce integrity on chaotic datasets. The methodology is implemented through a specialized two-phase framework:

1. **Contextual Relevancy:** The process begins by evaluating the data within its specific context. By identifying information relevancy at a granular level, the system determines which data points are pertinent to the defined problem and which are extraneous.
2. **Noise Reduction and Reliability Improvement:** This is the critical point of interdependence. Using the derived context from Phase 1, we systematically reduce dataset noise. By feeding only the refined, relevant context into the LLM, we neutralize variance and significantly improve the consistency and reliability of the output.

Our framework refines raw, chaotic data into a reliable and analytically relevant dataset. In parallel, by narrowing the model's focus through derived context, this framework ensures that the refined input consistently yields the same results, addressing, to a large extent, the stochasticity problem inherent in generative architectures. This approach alleviates the technical burden on large-scale text processing teams, allowing them to focus on strategy rather than engineering methods. Our contributions in this paper are as follows:

1. We present our SSAS framework to provide LLMs with assisting context which, in turn, helps LLMs focus their attention mechanism and provide consistency to analytical tasks.
2. Our framework classifies datasets into a hierarchical structure of Themes, Stories, and Clusters to create consistent summaries and Summaries-of-Summaries (SoS) across multiple levels of data aggregations.
3. Our framework helps identify noisy data points within datasets and helps LLMs focus their attention on the signal within the datasets.
4. Our framework is able to significantly improve data quality, up to 30%, through a combination of noise removal and improvement in the estimation of sentiment prediction.
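The hierarchy and Summary-of-Summaries idea in the contributions above can be pictured as a bottom-up aggregation. The sketch below assumes that Themes contain Stories and Stories contain Clusters, which is one reading of the order in which the paper lists them; the dataclasses, `summarize_theme`, and `stub_summarizer` are hypothetical names, and the stub merely stands in for an LLM summarization call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Summarizer = Callable[[List[str]], str]


@dataclass
class Cluster:
    name: str
    texts: List[str]


@dataclass
class Story:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Theme:
    name: str
    stories: List[Story] = field(default_factory=list)


def stub_summarizer(texts: List[str]) -> str:
    """Placeholder for an LLM call: joins inputs in sorted order so the output is deterministic."""
    return " | ".join(sorted(texts))


def summarize_theme(theme: Theme, summarize: Summarizer) -> str:
    """Summarize each cluster, then each story from its cluster summaries, then the theme."""
    story_summaries: List[str] = []
    for story in theme.stories:
        cluster_summaries = [summarize(cluster.texts) for cluster in story.clusters]
        story_summaries.append(summarize(cluster_summaries))
    return summarize(story_summaries)


theme = Theme(
    "Product",
    [Story("Battery", [
        Cluster("drain", ["drains fast", "dies overnight"]),
        Cluster("heat", ["gets hot"]),
    ])],
)
# Same content, supplied in a different order.
reordered = Theme(
    "Product",
    [Story("Battery", [
        Cluster("heat", ["gets hot"]),
        Cluster("drain", ["dies overnight", "drains fast"]),
    ])],
)
sos = summarize_theme(theme, stub_summarizer)
sos_reordered = summarize_theme(reordered, stub_summarizer)
```

Because the stub sorts its inputs, both orderings yield the same top-level summary. That order-independence is a property of this placeholder only, not something the sketch demonstrates about an LLM summarizer.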

The rest of this paper is structured as follows: Section 2 presents related and background work around LLM mechanisms, hierarchical information, and semantic alignment. Section 3 presents our Syntactic & Semantic Context Assessment Summarization (SSAS) framework. Section 4 provides details on our Evaluation Framework, while Section 5 presents the results of our framework when compared against a Direct-LLM approach. We conclude in Section 6.

## 2 Related Works

The emergence of Large Language Models (LLMs) has fundamentally redefined text analysis, shifting the paradigm from supervised feature engineering toward zero-shot and few-shot learning. However, as these models move from creative synthesis to enterprise-grade analytics, their inherent instability presents significant challenges. Our work builds upon three primary areas of research: the sensitivity of in-context learning, the mechanics of attention-based noise, and hierarchical data summarization. Elements of Sections 2 and 3 are similar to Kathuria et al., because our respective works share and build on the SSAS framework originally laid out in Mayande et al. For ease of access, we have summarized these ideas in this paper, so that readers need not refer back to Kathuria et al.

### 2.1 In-Context Learning and Prompt Instability

The efficacy of Large Language Models (LLMs) in zero-shot and few-shot regimes is largely governed by the paradigm of In-Context Learning (ICL). However, despite their sophisticated semantic latent spaces, LLMs exhibit profound and notorious sensitivity to the specificities of the input context. Zhao et al. characterized this as prompt instability, demonstrating that stochastic variations – such as the permutation of few-shot examples or minor syntactic shifts in instruction templates – can induce significant fluctuations in classification accuracy. This volatility suggests that the standard attention mechanism often converges on surface-level patterns rather than underlying logical structures. Furthermore, the architectural constraints of the transformer's context window present a dimensional bottleneck. As noted by Dong et al., fixed token limits necessitate a zero-sum trade-off between the depth of individual examples and the breadth of the reference set. In enterprise analytics, where datasets are high-dimensional and noisy, this limitation often leads to recency bias or the inclusion of non-representative outliers that confound the model's outcomes.
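The two effects described above, order sensitivity and the depth-versus-breadth trade-off, can be made concrete with a toy setup. Everything in the sketch below is an assumption for illustration: the prompt template, the `recency_biased_stub` (a caricature that copies the label of the last example, not a real model), and the token figures passed to `max_examples`.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (review text, sentiment label)


def build_prompt(examples: Sequence[Example], query: str) -> str:
    """Render few-shot examples followed by the unlabeled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


def prompt_variants(examples: Sequence[Example], query: str) -> List[str]:
    """One prompt per ordering of the same few-shot examples."""
    return [build_prompt(order, query) for order in permutations(examples)]


def recency_biased_stub(prompt: str) -> str:
    """Toy model that copies the label of the last few-shot example in the prompt."""
    labelled = [line for line in prompt.splitlines() if line.startswith("Sentiment: ")]
    return labelled[-1].split(": ", 1)[1]


def max_examples(context_budget: int, tokens_per_example: int, reserved_tokens: int = 0) -> int:
    """How many examples of a given length fit in a fixed token budget."""
    return max(0, (context_budget - reserved_tokens) // tokens_per_example)


examples = [
    ("Loved it", "positive"),
    ("Broke in a week", "negative"),
    ("Works as described", "positive"),
]
variants = prompt_variants(examples, "Arrived late but works fine")
labels = {recency_biased_stub(prompt) for prompt in variants}

short_fit = max_examples(context_budget=4096, tokens_per_example=200, reserved_tokens=96)
long_fit = max_examples(context_budget=4096, tokens_per_example=400, reserved_tokens=96)
```

With three examples there are six orderings, and the toy model returns both labels across them even though the query never changes. Doubling the per-example length halves how many examples fit in the same budget, which is the zero-sum trade-off in its simplest form.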

The SSAS framework departs from traditional ICL by replacing static, heuristically-derived prompts with dynamically synthesized context. By applying a precision-filtering pipeline to the input background, we ensure that the hints provided to the model are mathematically optimized for representative signal. This transforms the context from a variable, human-engineered instruction into a stable, feature-engineered instrument, effectively addressing the inherent stochasticity of the generative process.

### 2.2 Attention Mechanisms and the Signal-to-Noise Challenge

The LLM Paradox identified in this study – wherein generative fluency inversely correlates with analytical precision – is fundamentally rooted in the transformer's attention mechanism.
