GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis

arXiv cs.CL 05/22/26, 04:00 AM Papers
Summary
Introduces GHI, a Graphormer-over-conditioned-hypergraph-incidence framework for aspect-based sentiment analysis that represents linguistic evidence as token–hyperedge incidence relations, achieving state-of-the-art results on six benchmarks with only 247M parameters.
arXiv:2605.22228v1 Announce Type: new Abstract: Aspect-based sentiment analysis (ABSA) requires models to bind sentiment evidence to the correct aspect, making it a natural testbed for fine-grained structural reasoning. We introduce GHI, a Graphormer-over-Conditioned-Hypergraph-Incidence framework that is designed as an incidence-based structural reasoning layer built on a bipartite topology. GHI represents diverse linguistic and semantic evidence as token--hyperedge incidence relations, allowing different structural signals to be incorporated through a unified interface. Extensive experiments on six standard ABSA benchmarks show that GHI outperforms all baselines on the SemEval domains, and multi-seed evaluations show stable improvements over strong DeBERTa. Further experiments show that with only 247M parameters, GHI approaches the performance of 11B Flan-T5 based methods on the ISE benchmark. Moreover, it demonstrates strong robustness on the challenging ARTS datasets, maintaining highly competitive performance where traditional models degrade. These results demonstrate that compact structural reasoning remains a valuable alternative to scale-driven approaches for fine-grained tasks.
Cached at: 05/22/26, 08:45 AM
# GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis
Source: [https://arxiv.org/html/2605.22228](https://arxiv.org/html/2605.22228)
Yu Du (Qiqihar University, 2025936317@qqhru.edu.cn), Wenlong Zhu (Qiqihar University, zwl\_qqhr@qqhru.edu.cn), Xingze Li (Qiqihar University, 2025936326@qqhru.edu.cn), Chenglong Cao (Qiqihar University, 2025936311@qqhru.edu.cn), Jing Wang (Qiqihar University, 2025912341@qqhru.edu.cn), Yukun Ma (Qiqihar University, 2024935328@qqhru.edu.cn)

###### Abstract

Aspect-based sentiment analysis (ABSA) requires models to bind sentiment evidence to the correct aspect, making it a natural testbed for fine-grained structural reasoning. We introduce GHI, a Graphormer-over-Conditioned-Hypergraph-Incidence framework that is designed as an incidence-based structural reasoning layer built on a bipartite topology. GHI represents diverse linguistic and semantic evidence as token–hyperedge incidence relations, allowing different structural signals to be incorporated through a unified interface. Extensive experiments on six standard ABSA benchmarks show that GHI outperforms all baselines on the SemEval domains, and multi-seed evaluations show stable improvements over a strong DeBERTa baseline. Further experiments show that with only 247M parameters, GHI approaches the performance of 11B Flan-T5 based methods on the ISE benchmark. Moreover, it demonstrates strong robustness on the challenging ARTS datasets, maintaining highly competitive performance where traditional models degrade. These results demonstrate that compact structural reasoning remains a valuable alternative to scale-driven approaches for fine-grained tasks.


## 1 Introduction

Aspect-Based Sentiment Analysis (ABSA) aims to predict the sentiment polarity toward a given aspect term or target entity (Zhang et al., [2023](https://arxiv.org/html/2605.22228#bib.bib8)). Unlike sentence-level sentiment classification, ABSA is a fine-grained evidence-binding task that requires the model to separate different opinion clues within the same sentence. For example, as shown in Figure [1](https://arxiv.org/html/2605.22228#S1.F1), the sentence expresses a positive sentiment ("great") toward "GPU" and a negative sentiment ("expensive") toward "price", so the model must bind each opinion cue to the correct aspect.

![Refer to caption](https://arxiv.org/html/2605.22228v1/x1.png)

Figure 1: An example sentence with two different aspects (colored in blue) and their opinion evidence (colored in red). The sentence has already been processed by a dependency parser.

More fundamentally, to distinguish different evidence-binding patterns, ABSA needs diverse structural representations from different directions. In the local-context direction, Zeng et al. ([2019](https://arxiv.org/html/2605.22228#bib.bib39)) constrain attention to aspect-centered windows, making the model focus on nearby opinion words. Recent fusion-based models further combine semantic attention with syntactic structure; for example, Jin et al. ([2025](https://arxiv.org/html/2605.22228#bib.bib27)) use Aspect-NA and adaptive hierarchical cross-attention to integrate semantic and dependency-aware features. Another direction focuses mainly on graph structures. Specifically, Yin and Zhong ([2024](https://arxiv.org/html/2605.22228#bib.bib10)) couple graph-view message passing with sequence-view Transformer modeling, jointly capturing syntactic connectivity and semantic interactions.

Across existing methods, introducing high-order relations or injecting complex structures can be seen as helping the model identify and organize aspect-relevant evidence. However, most of them model each source of evidence in an isolated view. This observation motivates a general question: can ABSA benefit from a common structural framework through which diverse evidence can be represented, extended, and reasoned over?

To answer this question, we need a structure that can both represent multiple kinds of evidence and scale to new ones. Hypergraphs provide a natural candidate (Feng et al., [2019](https://arxiv.org/html/2605.22228#bib.bib43)). As illustrated in Figure [2](https://arxiv.org/html/2605.22228#S1.F2), their hyperedge design enables heterogeneous groups of tokens to be connected under a shared evidence unit, and the structural properties inherited from graphs make them extensible to new sources. Meanwhile, recent studies construct word-level relational hypergraphs to model high-order relations for ABSA, further showing their potential for high-order reasoning (Ouyang et al., [2024a](https://arxiv.org/html/2605.22228#bib.bib11); Ju et al., [2025](https://arxiv.org/html/2605.22228#bib.bib12); Kashyap et al., [2025](https://arxiv.org/html/2605.22228#bib.bib13)).

![Refer to caption](https://arxiv.org/html/2605.22228v1/x2.png)

Figure 2: A hypergraph view for the aspect "price". Each colored ellipse denotes a hyperedge that connects multiple tokens as one evidence unit.

However, effective ways to integrate multi-level information in such a complex structure are still lacking. Another challenge is that a graph structure is typically difficult to modify when incorporating new knowledge. Fortunately, previous works have demonstrated the potential of both global attention and dynamic scalability on complex graph structures. For instance, Ying et al. ([2021](https://arxiv.org/html/2605.22228#bib.bib14)) proposed Graphormer, which utilizes the shortest path distance (SPD) between nodes to enable global attention over complex graph structures. In parallel, within the computer vision domain, Lei et al. ([2025](https://arxiv.org/html/2605.22228#bib.bib15)) introduced HyperACE, which demonstrates how adaptive hyperedge mechanisms operate in practice and offers fresh insights into the scalability of hypergraphs.

Building upon these advancements, we propose GHI, a Graphormer-over-Conditioned-Hypergraph-Incidence framework for ABSA. GHI expresses multiple sources of evidence as token–hyperedge incidence relations, so that different structural signals can be incorporated through a unified interface while keeping the downstream reasoning layer unchanged. In our ABSA instantiation, GHI uses a small set of canonical ABSA priors, including aspect spans, aspect-relative local regions, and dependency neighborhoods, and complements them with context-conditioned adaptive hyperedges for sample-specific latent evidence. In addition, by lifting hyperedges into explicit nodes, GHI performs Graphormer-style reasoning over a bipartite star-expanded token–hyperedge graph.

In summary, the main contributions of our work are as follows:

- We propose GHI, an incidence-based structural reasoning framework for ABSA. GHI represents linguistic and semantic evidence as token–hyperedge incidence relations, providing a unified interface that can naturally accommodate different structural signals without source-specific reasoning branches.
- We introduce a bipartite star-expanded Graphormer built on a static-adaptive hypergraph design. By lifting diverse evidence into explicit reasoning nodes, GHI turns token–hyperedge incidence relations into a bipartite topology and applies Graphormer-style global attention over it.
- We conduct comprehensive evaluations on standard ABSA benchmarks, implicit sentiment evaluation (ISE), and adversarial robustness tests (ARTS). Results show that GHI outperforms strong baselines while exhibiting robust performance against complex linguistic variations.

## 2 Methodology

### 2.1 Overview

![Refer to caption](https://arxiv.org/html/2605.22228v1/x3.png)

Figure 3: The overall architecture of the proposed GHI framework. Static ABSA priors, including aspect (red), SRD/local-context (blue), and dependency (green) hyperedges, are color-coded by source and represented together with adaptive hyperedges in a token–hyperedge incidence structure. Dotted lines denote soft incidence weights used for token–hyperedge propagation, while orange Top-$K$ links define the sparse hard incidence topology. The purple bidirectional arrows indicate Graphormer-style global attention over the entire bipartite token–hyperedge graph.

We propose GHI, a Graphormer-over-Conditioned-Hypergraph-Incidence framework for ABSA. GHI lifts structural priors and adaptive semantic clusters into explicit nodes, thereby forming a bipartite star-expanded graph for global Graphormer attention (indicated by the purple arrows). As illustrated in Figure [3](https://arxiv.org/html/2605.22228#S2.F3), the framework operates as a unified incidence-level routing layer, where distinct colors differentiate the diverse evidence sources.

Specifically, GHI constructs static hyperedges for deterministic linguistic evidence and adaptive hyperedges conditioned on contextual anchors. It maintains both a soft and a hard Top-$K$ incidence view: the soft incidence supports differentiable token–hyperedge propagation, while the hard Top-$K$ incidence instantiates a sparse star-expanded topology for Graphormer reasoning with structural biases.

### 2.2 Task Formulation and Encoding

Given an aspect span $a=[l,r)$ within a sequence $x=[x_{1},\ldots,x_{N}]$, and a sentiment label $y\in\mathcal{Y}$, ABSA aims to predict the sentiment polarity expressed toward the given aspect. For pre-trained encoders, the sentence–aspect pair is used as the input, while only the contextual states of the original sentence tokens are retained as $H^{0}$ for graph reasoning. We encode the sentence–aspect pair to obtain contextual representations:

$$H^{0}=[h_{1}^{0},\ldots,h_{N}^{0}]=\mathrm{Enc}(x,a),\tag{1}$$

where $H^{0}\in\mathbb{R}^{N\times d}$, and the half-open span $[l,r)$ denotes the target aspect tokens from $x_{l}$ to $x_{r-1}$.

Graph reasoning operates solely on valid sentence tokens. With a binary mask $m\in\{0,1\}^{N}$, the sentence anchor is initialized as $c^{0}=\mathrm{Pool}_{m}(H^{0})$, where $\mathrm{Pool}_{m}(\cdot)$ is mean pooling over valid tokens, and the aspect anchor is initialized as $a^{0}=\mathrm{Pool}_{[l,r)}(H^{0})$, where $\mathrm{Pool}_{[l,r)}(\cdot)$ is mean pooling over the target aspect span.
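The anchor initialization can be illustrated with a minimal sketch in plain Python. The helper names `mean_pool` and `init_anchors` and the toy states are illustrative assumptions, not the authors' code; the sketch only shows masked mean pooling over valid tokens and over the half-open aspect span $[l,r)$.

```python
def mean_pool(states, keep):
    """Average the rows of `states` whose flag in `keep` is truthy."""
    rows = [h for h, k in zip(states, keep) if k]
    if not rows:
        raise ValueError("cannot pool over an empty selection")
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]


def init_anchors(H0, mask, l, r):
    """Return (sentence anchor c0, aspect anchor a0) for the span [l, r)."""
    c0 = mean_pool(H0, mask)                       # pool over valid tokens
    span = [1 if l <= i < r else 0 for i in range(len(H0))]
    a0 = mean_pool(H0, span)                       # pool over aspect tokens
    return c0, a0


# Toy example: four token states of dimension 2; the last one is padding.
H0 = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
c0, a0 = init_anchors(H0, mask, l=1, r=3)
```

Here the padded row is excluded from the sentence anchor, and the aspect anchor averages only tokens 1 and 2.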

### 2.3 Conditioned Hypergraph Incidence

The core of GHI is a conditioned hypergraph incidence representation. At layer $\ell$, we define a hypergraph $\mathcal{G}^{\ell}=(V,\mathcal{E}^{\ell},I^{\ell})$ over the current token states $H^{\ell}$, where $V$ is the set of token nodes, $\mathcal{E}^{\ell}$ is the set of hyperedges, and $I^{\ell}\in\mathbb{R}^{\lvert V\rvert\times\lvert\mathcal{E}^{\ell}\rvert}$ is the token–hyperedge incidence matrix. GHI combines task-informed static hyperedges with context-conditioned adaptive hyperedges. The sentence and aspect anchors are maintained as layer-wise memories: after each reasoning layer $\ell$, they are updated as $c^{\ell+1},a^{\ell+1}=\mathrm{AnchorUpdate}(c^{\ell},a^{\ell},H^{\ell+1})$, where $\mathrm{AnchorUpdate}(\cdot)$ denotes gated MLP updates over the aspect-level pooled regions.

#### Static Hyperedges

We first construct a binary incidence matrix $I^{\mathrm{sta}}$ from three static hyperedge priors. The aspect hyperedge $e_{\mathrm{asp}}$ connects all tokens inside the target span: $e_{\mathrm{asp}}=\{i\mid l\leq i<r\}$. To encode aspect-centered local context, we follow Zeng et al. ([2019](https://arxiv.org/html/2605.22228#bib.bib39)) and compute the semantic-relative distance (SRD) $d_{i}=\max\big(0,\big\lvert i-\frac{l+r-1}{2}\big\rvert-\big\lfloor\frac{r-l}{2}\big\rfloor\big)$. This prior is then instantiated as a local-context hyperedge $e_{\mathrm{srd}}$ that connects tokens within a radius $\rho$: $e_{\mathrm{srd}}=\{i\mid m_{i}=1,\ d_{i}\leq\rho\}$. Then, the dependency hyperedge $e_{\mathrm{dep}}$ collects tokens reachable from aspect tokens within $T$ hops on the dependency graph: $e_{\mathrm{dep}}=\{i\mid m_{i}=1,\ \mathrm{dist}_{\mathrm{dep}}(i,e_{\mathrm{asp}})\leq T\}$. Word-level dependency edges are projected to subwords to align with encoder outputs. Taken together, these priors form a static incidence matrix $I^{\mathrm{sta}}\in\mathbb{R}^{\lvert V\rvert\times S}$, where $S$ denotes the number of static hyperedges.
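To make the three static priors concrete, the following is a minimal sketch that builds $I^{\mathrm{sta}}$ with $S=3$ columns (aspect, SRD, dependency). It assumes word-level tokens and an undirected dependency graph given as index pairs; the function names, the toy parse, and the values of $\rho$ and $T$ are hypothetical, and the word-to-subword projection mentioned above is omitted.

```python
from collections import deque


def srd(i, l, r):
    """d_i = max(0, |i - (l + r - 1)/2| - floor((r - l)/2))."""
    return max(0.0, abs(i - (l + r - 1) / 2) - (r - l) // 2)


def dep_distance(n, edges, sources):
    """Multi-source BFS hop distance on an undirected dependency graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [None] * n                 # None marks tokens not reached
    queue = deque()
    for s in sources:
        dist[s] = 0
        queue.append(s)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def static_incidence(n, mask, l, r, edges, rho, T):
    """Rows are tokens; columns are [e_asp, e_srd, e_dep]."""
    dist = dep_distance(n, edges, [i for i in range(n) if l <= i < r])
    rows = []
    for i in range(n):
        in_asp = int(l <= i < r)
        in_srd = int(mask[i] == 1 and srd(i, l, r) <= rho)
        in_dep = int(mask[i] == 1 and dist[i] is not None and dist[i] <= T)
        rows.append([in_asp, in_srd, in_dep])
    return rows


# Toy sentence: the GPU is great but the price is expensive
#               0   1   2  3     4   5   6     7  8
# Hypothetical dependency edges; the aspect "price" is the span [6, 7).
edges = [(1, 3), (0, 1), (2, 3), (3, 8), (4, 8), (6, 8), (5, 6), (7, 8)]
I_sta = static_incidence(n=9, mask=[1] * 9, l=6, r=7, edges=edges, rho=1, T=1)
```

In this toy parse, "expensive" (token 8) lies outside the SRD window but is one dependency hop from "price", so it enters only the dependency hyperedge.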

#### Adaptive Hyperedges

Static hyperedges provide reliable task priors, but they cannot cover all sample-specific opinion patterns. To complement them, GHI induces a small set of adaptive hyperedges at each layer, conditioned on the locally refined token states and the current anchor memories $c^{\ell}$ and $a^{\ell}$. As illustrated in the upper-right part of Figure [3](https://arxiv.org/html/2605.22228#S2.F3), the adaptive hyperedge prototypes generate $M$ adaptive hyperedges, which form a soft token–hyperedge incidence matrix $I^{\ell}_{\mathrm{ad}}\in\mathbb{R}^{\lvert V\rvert\times M}$:

$$I^{\ell}_{\mathrm{ad}}=\mathrm{AdaptiveIncidence}(\widetilde{H}^{\ell},c^{\ell},a^{\ell}),\tag{2}$$

where $\mathrm{AdaptiveIncidence}(\cdot)$ denotes the adaptive incidence generator. Its prototype-based parameterization is given in Appendix [6](https://arxiv.org/html/2605.22228#A2.F6).

The adaptive incidence is concatenated with the static incidence matrix to support differentiable token–hyperedge propagation, forming a soft incidence matrix $I_{\mathrm{soft}}^{\ell}=[I^{\mathrm{sta}},I_{\mathrm{ad}}^{\ell}]\in\mathbb{R}^{\lvert V\rvert\times(S+M)}$. Meanwhile, we retain the Top-$K$ tokens for each adaptive hyperedge to obtain a sparse hard incidence matrix $I_{\mathrm{hard}}^{\ell}=[I^{\mathrm{sta}},\mathrm{TopK}(I_{\mathrm{ad}}^{\ell})]\in\mathbb{R}^{\lvert V\rvert\times(S+M)}$, which instantiates the bipartite star-expanded topology used by Graphormer reasoning.
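The relationship between the two incidence views can be sketched as follows. The sketch assumes that $\mathrm{TopK}$ keeps the original weights of the $K$ highest-scoring tokens in each adaptive column and zeroes the rest, with ties broken by token index; the text does not specify either detail, and the helper names and toy values are illustrative.

```python
def topk_columns(I_ad, k):
    """Keep the k largest entries of each column (hyperedge); zero the rest."""
    n = len(I_ad)
    m = len(I_ad[0]) if n else 0
    out = [[0.0] * m for _ in range(n)]
    for e in range(m):
        # Sort token indices by descending weight, then by index for ties.
        order = sorted(range(n), key=lambda i: (-I_ad[i][e], i))
        for i in order[:k]:
            out[i][e] = I_ad[i][e]
    return out


def concat_columns(A, B):
    """Column-wise concatenation [A, B] of two row-aligned matrices."""
    return [list(a) + list(b) for a, b in zip(A, B)]


# Three tokens, S = 2 static hyperedges, M = 2 adaptive hyperedges.
I_sta = [[1, 0], [0, 1], [0, 1]]
I_ad = [[0.7, 0.1], [0.2, 0.6], [0.1, 0.3]]

I_soft = concat_columns(I_sta, I_ad)                      # graded memberships
I_hard = concat_columns(I_sta, topk_columns(I_ad, k=1))   # sparse topology
```

Both matrices share the shape $\lvert V\rvert\times(S+M)$; only the adaptive block differs, which is what lets the soft view stay differentiable while the hard view defines a sparse graph.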

### 2.4 GHI Reasoning Layer

GHI stacks $L$ reasoning layers over the conditioned incidence structure. A layer-wise computation flow is provided in Appendix [B.2](https://arxiv.org/html/2605.22228#A2.SS2). By treating both tokens and hyperedges as explicit reasoning nodes, each layer couples two complementary views of token–hyperedge reasoning: the soft incidence view supports differentiable propagation over graded token–hyperedge memberships, while the hard incidence view instantiates a sparse token–hyperedge bipartite topology. This star-expanded graph allows Graphormer-style attention to model token–token, token–hyperedge, and hyperedge–hyperedge interactions within a shared structural space.

#### Local Context Refinement

Before constructing adaptive incidence, GHI applies local-window self-attention to refine short-range token interactions. Given the graph-visible mask $m$ and window size $w$, $\mathrm{LocalAttn}(\cdot)$ restricts multi-head self-attention to token pairs $(i,j)$ satisfying $m_{i}=m_{j}=1$ and $\lvert i-j\rvert\leq w$. The local refinement is written as $\widetilde{H}^{\ell}=H^{\ell}+\mathrm{LocalAttn}(\mathrm{LN}(H^{\ell}),m,w)$. The locally refined states $\widetilde{H}^{\ell}$ then guide the adaptive hyperedge incidence described in Section [2.3](https://arxiv.org/html/2605.22228#S2.SS3.SSS0.Px2).
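The attention restriction can be sketched as a boolean pair mask followed by a masked softmax. This is only an illustration of the constraint $m_{i}=m_{j}=1$ and $\lvert i-j\rvert\leq w$; the multi-head projections and layer normalization are omitted, and the function names and toy scores are assumptions.

```python
import math


def local_window_mask(mask, w):
    """allowed[i][j] is True iff both tokens are valid and |i - j| <= w."""
    n = len(mask)
    return [
        [mask[i] == 1 and mask[j] == 1 and abs(i - j) <= w for j in range(n)]
        for i in range(n)
    ]


def masked_softmax(scores, allowed):
    """Softmax over the allowed positions only; disallowed positions get 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    if not kept:                      # e.g. a padding row attends to nothing
        return [0.0] * len(scores)
    top = max(kept)                   # subtract the max for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Four positions, the last one is padding; window size w = 1.
allowed = local_window_mask([1, 1, 1, 0], w=1)
weights = masked_softmax([0.0, 0.0, 5.0, 5.0], allowed[0])
```

Token 0 can only attend to itself and token 1, so the large scores at positions 2 and 3 receive no weight.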

#### Incidence\-Aware Hypergraph Reasoning

Given the soft incidence matrix $I_{\mathrm{soft}}^{\ell}$, GHI summarizes token information into hyperedge states through incidence-weighted pooling:

$$Z^{\ell}=\mathrm{EdgePool}(\widetilde{H}^{\ell},I_{\mathrm{soft}}^{\ell}).\tag{3}$$

Here, $\mathrm{EdgePool}(\cdot)$ aggregates token states according to their soft token–hyperedge participation weights and adaptive priors.
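One simple way to read incidence-weighted pooling is as a per-hyperedge weighted mean of token states. The sketch below assumes exactly that (column-normalized weights) and ignores the adaptive priors mentioned above, whose form is not given here; it is a stand-in for intuition rather than the paper's $\mathrm{EdgePool}$.

```python
def edge_pool(H, I_soft):
    """Z[e] = sum_i I_soft[i][e] * H[i] / sum_i I_soft[i][e]."""
    n, d = len(H), len(H[0])
    m = len(I_soft[0])
    Z = []
    for e in range(m):
        total = sum(I_soft[i][e] for i in range(n))
        if total == 0:
            Z.append([0.0] * d)       # empty hyperedge: zero state
            continue
        Z.append([
            sum(I_soft[i][e] * H[i][j] for i in range(n)) / total
            for j in range(d)
        ])
    return Z


# Three tokens (d = 2) and two hyperedges.
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
I_soft = [[1.0, 0.5], [1.0, 0.5], [0.0, 1.0]]
Z = edge_pool(H, I_soft)
```

The first hyperedge averages tokens 0 and 1 equally, while the second leans toward token 2 because of its larger membership weight.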

We denote the resulting incidence-aware local token representation as $H^{\ell}_{\mathrm{loc}}$. It is obtained by two complementary operations. $\mathrm{HGRefine}(\cdot)$ first performs soft token–hyperedge propagation and writes hyperedge messages back to tokens, while Relation-Aware Incidence Attention $\mathrm{IncAttn}(\cdot)$ applies token–hyperedge attention using incidence-level relation features $\Phi^{\ell}$:

$$H_{\mathrm{loc}}^{\ell}=\widetilde{H}^{\ell}+\mathrm{HGRefine}(\widetilde{H}^{\ell},Z^{\ell},I_{\mathrm{soft}}^{\ell})+\mathrm{IncAttn}(\widetilde{H}^{\ell},Z^{\ell},I_{\mathrm{soft}}^{\ell},\Phi^{\ell}).\tag{4}$$

Specifically, $\Phi^{\ell}$ includes edge type, edge role (static or adaptive), incidence weight, and SRD.

#### Star\-Expanded Graphormer Reasoning

While the soft incidence view supports differentiable hypergraph propagation, GHI also uses the hard incidence view $I_{\mathrm{hard}}^{\ell}$ to construct a sparse star-expanded graph. The expanded graph contains both token nodes and hyperedge nodes. A token node is connected to a hyperedge node if the corresponding hard incidence entry is non-zero. The node states of the star-expanded graph are initialized as $X^{\ell}=[\widetilde{H}^{\ell};Z^{\ell}]$.

Following the Graphormer design (Ying et al., [2021](https://arxiv.org/html/2605.22228#bib.bib14)), we extract structural encodings from the expanded graph, including topological connectivity and pairwise relation types, and inject them as Graphormer structural biases for multi-head self-attention. In this way, global attention is aware of token–token, token–hyperedge, and hyperedge–hyperedge relations within the bipartite topology. The global update is written compactly with layer normalization (LN):

$$X_{\mathrm{glob}}^{\ell}=X^{\ell}+\mathrm{GraphormerAttn}(\mathrm{LN}(X^{\ell}),\mathcal{B}^{\ell}),\tag{5}$$

where $\mathcal{B}^{\ell}$ denotes the Graphormer structural biases derived from the hard incidence topology $I_{\mathrm{hard}}^{\ell}$. We then split the output back into token and hyperedge parts:

$$X_{\mathrm{glob}}^{\ell}=[H_{\mathrm{glob}}^{\ell};Z^{\ell+1}],\tag{6}$$

where $H_{\mathrm{glob}}^{\ell}$ denotes the global token representation produced by the star-expanded Graphormer.
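The star expansion and the kind of structural information that could feed $\mathcal{B}^{\ell}$ can be sketched as follows. The sketch assumes that connectivity is summarized by shortest-path distances, as in the original Graphormer, and that relation types are simply the node-kind pair; how GHI actually turns these into learned attention biases is not specified here, and all names are illustrative.

```python
from collections import deque


def star_expand(I_hard):
    """Tokens are nodes 0..n-1, hyperedges are nodes n..n+m-1."""
    n, m = len(I_hard), len(I_hard[0])
    adj = [[] for _ in range(n + m)]
    for i in range(n):
        for e in range(m):
            if I_hard[i][e] != 0:     # non-zero hard incidence => edge
                adj[i].append(n + e)
                adj[n + e].append(i)
    return adj


def shortest_path_distances(adj, unreachable=-1):
    """All-pairs hop distances via one BFS per source node."""
    size = len(adj)
    spd = [[unreachable] * size for _ in range(size)]
    for s in range(size):
        spd[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if spd[s][v] == unreachable:
                    spd[s][v] = spd[s][u] + 1
                    queue.append(v)
    return spd


def pair_type(u, v, n):
    """Return 'token-token', 'token-hyperedge', or 'hyperedge-hyperedge'."""
    kinds = ("token" if u < n else "hyperedge",
             "token" if v < n else "hyperedge")
    return "-".join(sorted(kinds, reverse=True))


# Three tokens and two hyperedges: tokens 0,1 in edge 0; tokens 1,2 in edge 1.
I_hard = [[1, 0], [1, 1], [0, 1]]
adj = star_expand(I_hard)
spd = shortest_path_distances(adj)
```

Because the expanded graph is bipartite, two tokens that share a hyperedge are exactly two hops apart, and two hyperedges that share a token are likewise two hops apart.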

#### Local\-Global Fusion

GHI fuses the local evidence $H_{\mathrm{loc}}^{\ell}$ and the star-expanded global token representation, conditioned on the sentence and aspect anchors:

$$U^{\ell}=\mathrm{Fuse}(H_{\mathrm{glob}}^{\ell},H_{\mathrm{loc}}^{\ell},c^{\ell},a^{\ell}).\tag{7}$$

The next-layer token states are then produced by a residual feed-forward update: $H^{\ell+1}=U^{\ell}+\mathrm{FFN}(\mathrm{LN}(U^{\ell}))$.

### 2.5 Prediction and Training

After $L$ reasoning layers, GHI predicts sentiment polarity from a local readout derived from $H^{\ell}_{\mathrm{loc}}$ and a global readout derived from $H^{\ell}_{\mathrm{glob}}$. The two readouts are concatenated and passed to a linear classifier to obtain $p(y\mid x,a)$. Notably, in our main configuration, the anchor memory is used for conditioning incidence induction and the local-global fusion $U^{\ell}$, but is not included in the final readout. The model is trained with the standard cross-entropy loss:

$$\mathcal{L}=-\log p(y\mid x,a).\tag{8}$$
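The prediction head and loss in Eq. (8) can be illustrated with a small sketch: concatenate the two readout vectors, apply a linear classifier, and take the negative log-probability of the gold label. How the readouts are pooled from the token states is not shown; the weights, dimensions, and function names below are toy assumptions.

```python
import math


def softmax(logits):
    top = max(logits)                 # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(local_readout, global_readout, W, b):
    """Linear classifier over the concatenated [local; global] readout."""
    feats = list(local_readout) + list(global_readout)
    logits = [sum(w * f for w, f in zip(row, feats)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


def cross_entropy(probs, gold):
    """L = -log p(y | x, a) for the gold class index."""
    return -math.log(probs[gold])


# Toy setup: 2-d readouts and three polarity classes with all-zero weights,
# so the classifier is uniform over the classes.
W = [[0.0] * 4 for _ in range(3)]
b = [0.0, 0.0, 0.0]
probs = classify([0.3, -0.1], [0.8, 0.2], W, b)
loss = cross_entropy(probs, gold=2)
```

With an uninformative classifier the loss equals $\log 3$, the usual starting point for three-way classification.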

## 3 Experiments

We conduct experiments under a unified evaluation protocol. Unless otherwise specified, we use the standard train / test splits and report Accuracy and Macro-F1. We also conduct hyperparameter sensitivity analyses in Appendix [C](https://arxiv.org/html/2605.22228#A3). All experimental details are provided in Appendix [A](https://arxiv.org/html/2605.22228#A1).

### 3.1 Main Results

Table 1: Overall performance on six ABSA benchmark datasets. Best results are in bold and second-best are underlined. $\dagger$ indicates results retrieved from Dai et al. ([2021](https://arxiv.org/html/2605.22228#bib.bib29)), while others are from their original papers. $\ddagger$ denotes that AGCL (Jian et al., [2025](https://arxiv.org/html/2605.22228#bib.bib33)) utilizes a frozen DeBERTa-Large encoder rather than the base version.

Table [1](https://arxiv.org/html/2605.22228#S3.T1) reports the overall performance of GHI alongside previous state-of-the-art models across six benchmark datasets. In particular, GHI outperforms all baselines on the SemEval domains. Driven by the DeBERTa encoder (He et al., [2020](https://arxiv.org/html/2605.22228#bib.bib45)), the proposed framework reaches 90.97% / 86.40% Accuracy / Macro-F1 on Restaurant14 and 86.08% / 83.74% on Laptop. Notably, GHI outperforms knowledge-augmented baselines such as AGCL (Jian et al., [2025](https://arxiv.org/html/2605.22228#bib.bib33)), suggesting that incidence-based structural modeling can also provide gains complementary to external augmentation. Furthermore, GHI maintains competitive performance on multi-domain datasets, including Twitter and the challenging MAMS benchmark. Overall, GHI shows strong gains on the SemEval domains and remains comparable to strong baselines on Twitter and MAMS.

### 3.2 Controlled Encoder Comparison

Table 2: Controlled comparison under unified settings on SemEval-2014 domains. All results are reported as mean (standard deviation) over 5 seeds. All models in this table are implemented by us.

Previous studies have highlighted reproducibility issues and evaluation instability in ABSA, showing that random seeds, encoder capacities, and implementation details can noticeably affect reported performance (Dai et al., [2021](https://arxiv.org/html/2605.22228#bib.bib29); Mukherjee et al., [2021](https://arxiv.org/html/2605.22228#bib.bib41); Yang et al., [2023](https://arxiv.org/html/2605.22228#bib.bib42)). To address these concerns, we evaluate GHI alongside representative structural models under identical BERT-base and DeBERTa-base encoders across multiple random seeds, aiming to separate structural gains from encoder capacity and training variance.

Table [2](https://arxiv.org/html/2605.22228#S3.T2) presents the controlled evaluation results. We strictly adhere to the hyperparameter configurations recommended in the original papers, and use the multi-seed setting to observe relative improvements under the same encoder constraints. Under these conditions, GHI yields stable and substantial gains. Specifically, it improves Accuracy / Macro-F1 by 1.00% / 1.91% on Restaurant14 and 1.39% / 1.85% on Laptop relative to the strong vanilla DeBERTa baseline. Furthermore, GHI remains competitive with other state-of-the-art models under the same encoder constraints. These results indicate that the gains of GHI are not merely attributable to encoder capacity or training variance, and are consistent with the benefits of its incidence-based reasoning design.

### 3.3 Ablations

Table 3: Ablation results. The variants remove adaptive hyperedges, explicit hyperedge nodes, Relation-Aware Incidence Attention in Eq. [4](https://arxiv.org/html/2605.22228#S2.E4), or Graphormer structural biases $\mathcal{B}^{\ell}$ in Eq. [5](https://arxiv.org/html/2605.22228#S2.E5), respectively.

Table [3](https://arxiv.org/html/2605.22228#S3.T3) presents the ablation results of GHI. On the SemEval-2014 domains, removing Incidence Attention or Hyperedge Nodes yields the most severe drops. Specifically, w/o Incidence Attention causes the largest accuracy decrease on both datasets, and reduces Macro-F1 by 2.68% on Restaurant14. In parallel, w/o Hyperedge Nodes proves most harmful on Laptop, reducing its Macro-F1 by 3.35%. These results suggest that token–hyperedge relations should not be treated merely as transient aggregation paths. Incidence Attention provides relation-aware token–hyperedge routing, allowing the model to distinguish deterministic syntactic links from softer semantic memberships. Meanwhile, the drop caused by removing Hyperedge Nodes supports the motivation of lifting structural priors into explicit reasoning nodes, where token–hyperedge and hyperedge–hyperedge interactions can be modeled directly.

The degradation pattern shifts significantly on the Twitter dataset\. Unlike formal SemEval domains, Twitter texts are typically shorter, highly informal, and syntactically noisy, which severely diminishes the reliability of parser\-derived fixed dependency priors\. In this setting, removing Adaptive Hyperedges incurs the largest performance penalty\. This suggests that sample\-specific semantic clusters, dynamically induced from token and anchor states, become the decisive factor when explicit syntactic regularities are weak\.

### 3.4 Implicit Sentiment Evaluation

Table 4: ISE Macro-F1 results. $\spadesuit$ denotes baseline results implemented by us.

Table [4](https://arxiv.org/html/2605.22228#S3.T4) evaluates GHI on the Implicit Sentiment Eval (ISE) benchmark (Li et al., [2021b](https://arxiv.org/html/2605.22228#bib.bib34)), a notoriously difficult setting where sentences lack explicit opinion words (e.g., "The battery lasts only 2 hours."). With only 247M parameters, the DeBERTa-based GHI reaches a Macro-F1 of 79.64% on Restaurant14 and 81.96% on Laptop, closely approaching the 11B-parameter Flan-T5+THOR (Fei et al., [2023](https://arxiv.org/html/2605.22228#bib.bib36)), a method specifically prompted for multi-step chain-of-thought reasoning. Furthermore, GHI outperforms other heavyweight baselines, including GPT3+THOR and standard Flan-T5 prompting. These results suggest that the proposed incidence-based reasoning provides competitive implicit-sentiment modeling without relying on large prompted language models.

### 3.5 Aspect Robustness Test

Table 5: Aspect Robustness Test results. Scores for models marked with $*$ are copied from Yang and Li ([2024](https://arxiv.org/html/2605.22228#bib.bib38)).

To evaluate the robustness of GHI against textual adversarial attacks, we use the existing adversarial datasets Laptop14-ARTS and Restaurant14-ARTS (Xing et al., [2020](https://arxiv.org/html/2605.22228#bib.bib40)). Table [5](https://arxiv.org/html/2605.22228#S3.T5) presents results on the ARTS benchmark. GHI achieves the best Macro-F1 on both datasets, improving over the strong LSAE baseline (Yang and Li, [2024](https://arxiv.org/html/2605.22228#bib.bib38)) by 3.32% and 4.33%, respectively, underscoring the robustness of GHI under challenging settings.

### 3.6 Analysis and Visualization

Figure [4](https://arxiv.org/html/2605.22228#S4.F4) summarizes the relative positions of tokens retained by adaptive Top-$K$ incidence. The four bins denote tokens inside the aspect span, within 1–2 tokens of it, within 3–5 tokens, and beyond 5 tokens, respectively. The retained tokens are mainly concentrated around the aspect span and its near context, while Laptop keeps more middle- and long-range tokens than Restaurant14. This result indicates that adaptive hyperedges select different evidence ranges across datasets under the same adaptive topology.

Figure [5](https://arxiv.org/html/2605.22228#S4.F5) presents two illustrative cases to show how GHI organizes aspect-relevant evidence. Orange boxes indicate the hard Top-$K$ tokens retained for the sparse adaptive topology. In both examples, GHI assigns higher weights to informative evidence regions. Notably, in the first case, the adaptive incidence for "food" assigns high weights to distant opinion evidence such as "simple" and "satisfying", suggesting that adaptive hyperedges can group aspect-relevant evidence across a broader context through learned incidence patterns. Furthermore, in the second case, for the aspect "Startup times", GHI does not rely on the explicit modifier "long", but assigns stronger weights to the concrete temporal evidence "two minutes". This indicates that the learned incidence can capture multi-token and even implicit evidence expressions toward the target aspect.

## 4 Related Work

![Refer to caption](https://arxiv.org/html/2605.22228v1/x4.png)

Figure 4: Distance distribution of adaptive Top-$K$ tokens relative to aspect spans on SemEval-14 domains. Error bars denote 95% bootstrap confidence intervals.

![Refer to caption](https://arxiv.org/html/2605.22228v1/x5.png)

Figure 5: Visualization examples for two cases.

Structural modeling remains important for ABSA, where models must bind sentiment evidence to the correct aspect. Recent methods refine syntactic, semantic, and aspect-specific structures from different perspectives.

Early efforts in this direction, such as Zeng et al. ([2019](https://arxiv.org/html/2605.22228#bib.bib39)), pioneered local attention approaches via the LCF mechanism. Recognizing that sentiment reasoning cannot rely only on a narrow local window, Jin et al. ([2025](https://arxiv.org/html/2605.22228#bib.bib27)) expand this mechanism by fusing semantic and syntactic information through aspect-centered and hierarchical attention.

Beyond refinements of purely sequential mechanisms such as localized windows, graph-based modeling has emerged as a mainstream paradigm for structural representation. To integrate local mechanisms into graph structures, Wang et al. ([2024](https://arxiv.org/html/2605.22228#bib.bib9)) filter noisy dependency edges with Distance-based Syntactic Weight and Aspect-Fusion Attention. To bridge these graph structures with global attention, Yin and Zhong ([2024](https://arxiv.org/html/2605.22228#bib.bib10)) couple a GNN-based graph view with a Transformer-based sequence view.

While traditional graphs effectively capture pairwise syntax, hypergraph-based ABSA further explores high-order relations beyond pairwise dependency edges. Ouyang et al. ([2024a](https://arxiv.org/html/2605.22228#bib.bib11)) build word-level relational hypergraphs from syntactic and semantic relations and apply aspect-specific hypergraph attention, while Kashyap et al. ([2025](https://arxiv.org/html/2605.22228#bib.bib13)) induce dynamic aspect-opinion hyperedges through sample-specific hierarchical clustering. These works suggest that hyperedges are suitable for representing multi-token sentiment evidence.

Beyond ABSA-specific architectures, graph representation learning has continued to evolve past localized message passing. Graphormer shows that structural encodings can be injected into Transformer attention, allowing global graph-aware interaction (Ying et al., [2021](https://arxiv.org/html/2605.22228#bib.bib14)). Once hyperedges are lifted into nodes, token–token, token–hyperedge, and hyperedge–hyperedge relations can be modeled in the same attention space. Recent adaptive hypergraph designs outside NLP, such as HyperACE in YOLOv13, also highlight the potential of dynamically induced high-order correlations (Lei et al., [2025](https://arxiv.org/html/2605.22228#bib.bib15)).
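
To make the idea of lifting hyperedges into nodes concrete, the sketch below builds the bipartite star expansion of a small incidence matrix and computes all-pairs hop distances on it. Shortest-path distance is a common Graphormer-style structural encoding, but whether GHI uses exactly this encoding is not stated in this section, so the distance step is an assumption. The function names and the toy hypergraph are likewise illustrative.

```python
from collections import deque

import numpy as np


def star_expand(incidence):
    """Adjacency of the bipartite star expansion.

    incidence: (n_tokens, n_edges) 0/1 matrix, incidence[i, j] = 1 when
    token i belongs to hyperedge j. Each hyperedge becomes an extra node,
    indexed after the tokens.
    """
    inc = np.asarray(incidence, dtype=int)
    n_tokens, n_edges = inc.shape
    return np.block([
        [np.zeros((n_tokens, n_tokens), dtype=int), inc],
        [inc.T, np.zeros((n_edges, n_edges), dtype=int)],
    ])


def hop_distances(adj, unreachable=-1):
    """All-pairs shortest-path hop counts via breadth-first search."""
    n = adj.shape[0]
    dist = np.full((n, n), unreachable, dtype=int)
    for src in range(n):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[src, v] == unreachable:
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist


# Toy hypergraph: 4 tokens, hyperedge e0 = {0, 1, 2}, hyperedge e1 = {2, 3}.
incidence = [[1, 0], [1, 0], [1, 1], [0, 1]]
adj = star_expand(incidence)   # nodes 0..3 are tokens, 4..5 are hyperedges
dist = hop_distances(adj)
```

In the expanded graph, two tokens that share a hyperedge are two hops apart, and two hyperedges that share a token are also two hops apart, so all three relation types live in one distance matrix that an attention bias could index.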

To rigorously test these diverse modeling approaches, recent studies have introduced more challenging evaluation settings for sentiment reasoning. Fei et al. ([2023](https://arxiv.org/html/2605.22228#bib.bib36)) study implicit sentiment through multi-hop reasoning over implicit aspects, opinions, and polarities. Yang and Li ([2024](https://arxiv.org/html/2605.22228#bib.bib38)) evaluate robustness on ARTS (Xing et al., [2020](https://arxiv.org/html/2605.22228#bib.bib40)), where distracting sentiment words and aspect-opinion mismatches expose reliance on global sentiment shortcuts. These benchmarks test whether models can reliably bind evidence under implicit, noisy, or adversarial conditions, and are therefore useful for evaluating the reliability of incidence-based structural reasoning.

## 5 Conclusion

In this paper, we proposed GHI, a Graphormer-over-Conditioned-Hypergraph-Incidence framework for ABSA. GHI converts heterogeneous linguistic and semantic evidence into token–hyperedge incidence relations, avoiding source-specific reasoning branches while retaining explicit structural control. Through soft-incidence propagation and Graphormer attention over the bipartite star expansion of the hard incidence, the framework reasons jointly over diverse evidence in a shared space. Results on the six standard benchmarks, together with the ISE and ARTS evaluations, support the effectiveness of this incidence-centered design.

## Limitations

First, GHI currently focuses on aspect-term sentiment classification, where the target aspect is given. It does not directly address aspect extraction, opinion extraction, or end-to-end aspect–opinion pair discovery. Extending the incidence formulation to full ABSA pipelines would require additional decoding mechanisms or supervision for inducing aspects, opinions, and their sentiment relations.

Second, the effectiveness of GHI can still be influenced by the quality of structural priors and the bounded hyperedge budget. Dependency-based hyperedges may inherit noise from parser outputs or preprocessing artifacts, while the fixed number of adaptive hyperedges and the Top-$K$ hard incidence view may miss weak but useful associations in highly complex sentences.

Finally, the current star\-expanded Graphormer uses dense attention over the expanded token–hyperedge graph\. Although the bounded hyperedge budget keeps the overhead controlled for short ABSA texts, applying the same design to long reviews, documents, or dialogue\-level sentiment analysis may require sparse attention or hierarchical incidence construction\.

## References

- Counterfactual-enhanced information bottleneck for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence 38(16), pp. 17736–17744. External Links: [Link](https://ojs.aaai.org/index.php/AAAI/article/view/29726), [Document](https://dx.doi.org/10.1609/aaai.v38i16.29726) Cited by: [Table 5](https://arxiv.org/html/2605.22228#S3.T5.1.1.4.3.1).
- B\. Chen, Q\. Ouyang, Y\. Luo, B\. Xu, R\. Cai, and Z\. Hao \(2024\)S2GSL: incorporating segment to syntactic enhanced graph structure learning for aspect\-based sentiment analysis\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 13366–13379\.External Links:[Link](https://aclanthology.org/2024.acl-long.721/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.721)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.1.1.1.1)\.
- C\. Chen, Z\. Teng, Z\. Wang, and Y\. Zhang \(2022\)Discrete opinion tree induction for aspect\-based sentiment analysis\.InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),S\. Muresan, P\. Nakov, and A\. Villavicencio \(Eds\.\),Dublin, Ireland,pp\. 2051–2064\.External Links:[Link](https://aclanthology.org/2022.acl-long.145/),[Document](https://dx.doi.org/10.18653/v1/2022.acl-long.145)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.13.8.1)\.
- C\. Chen, Z\. Teng, and Y\. Zhang \(2020\)Inducing target\-specific latent structures for aspect sentiment classification\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),B\. Webber, T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 5596–5607\.External Links:[Link](https://aclanthology.org/2020.emnlp-main.451/),[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-main.451)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.7.2.1)\.
- J\. Dai, H\. Yan, T\. Sun, P\. Liu, and X\. Qiu \(2021\)Does syntax matter? a strong baseline for aspect\-based sentiment analysis with RoBERTa\.InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,K\. Toutanova, A\. Rumshisky, L\. Zettlemoyer, D\. Hakkani\-Tur, I\. Beltagy, S\. Bethard, R\. Cotterell, T\. Chakraborty, and Y\. Zhou \(Eds\.\),Online,pp\. 1816–1829\.External Links:[Link](https://aclanthology.org/2021.naacl-main.146/),[Document](https://dx.doi.org/10.18653/v1/2021.naacl-main.146)Cited by:[§3\.2](https://arxiv.org/html/2605.22228#S3.SS2.p1.1),[Table 1](https://arxiv.org/html/2605.22228#S3.T1)\.
- L\. Dong, F\. Wei, C\. Tan, D\. Tang, M\. Zhou, and K\. Xu \(2014\)Adaptive recursive neural network for target\-dependent Twitter sentiment classification\.InProceedings of the 52nd Annual Meeting of the Association for Computational Linguistics \(Volume 2: Short Papers\),K\. Toutanova and H\. Wu \(Eds\.\),Baltimore, Maryland,pp\. 49–54\.External Links:[Link](https://aclanthology.org/P14-2009/),[Document](https://dx.doi.org/10.3115/v1/P14-2009)Cited by:[§A\.1](https://arxiv.org/html/2605.22228#A1.SS1.p1.1)\.
- H\. Fei, B\. Li, Q\. Liu, L\. Bing, F\. Li, and T\. Chua \(2023\)Reasoning implicit sentiment with chain\-of\-thought prompting\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 2: Short Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 1171–1182\.External Links:[Link](https://aclanthology.org/2023.acl-short.101/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-short.101)Cited by:[§3\.4](https://arxiv.org/html/2605.22228#S3.SS4.p1.1),[Table 4](https://arxiv.org/html/2605.22228#S3.T4.1.1.5.3.1),[Table 4](https://arxiv.org/html/2605.22228#S3.T4.1.1.6.4.1),[Table 4](https://arxiv.org/html/2605.22228#S3.T4.1.1.7.5.1),[§4](https://arxiv.org/html/2605.22228#S4.p6.1)\.
- A\. Feng, J\. Cai, Z\. Gao, and X\. Li \(2023\)Aspect\-level sentiment classification with fused local and global context\.Journal of Big Data10\(1\),pp\. 176\.External Links:[Document](https://dx.doi.org/10.1186/s40537-023-00856-8),[Link](https://doi.org/10.1186/s40537-023-00856-8)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.20.15.1)\.
- Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao (2019) Hypergraph neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 33(01), pp. 3558–3565. External Links: [Link](https://ojs.aaai.org/index.php/AAAI/article/view/4235), [Document](https://dx.doi.org/10.1609/aaai.v33i01.33013558) Cited by: [§1](https://arxiv.org/html/2605.22228#S1.p4.1).
- P. He, X. Liu, J. Gao, and W. Chen (2020) DeBERTa: decoding-enhanced BERT with disentangled attention. CoRR abs/2006.03654. External Links: [Link](https://arxiv.org/abs/2006.03654), 2006.03654 Cited by: [§3.1](https://arxiv.org/html/2605.22228#S3.SS1.p1.1).
- J. Ji, W. Zhu, C. Hou, Q. Song, Y. Ma, and Y. Wang (2026) DAGF: a dual gcn and auxiliary graph fusion based model for aspect-based sentiment analysis. Applied Soft Computing 195, pp. 115040. External Links: ISSN 1568-4946, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.asoc.2026.115040), [Link](https://www.sciencedirect.com/science/article/pii/S1568494626004886) Cited by: [Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.21.16.1).
- Z. Jian, J. Li, Q. Wu, and J. Yao (2024) Retrieval contrastive learning for aspect-level sentiment classification. Information Processing & Management 61(1), pp. 103539. External Links: ISSN 0306-4573, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.ipm.2023.103539), [Link](https://www.sciencedirect.com/science/article/pii/S0306457323002765) Cited by: [Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.22.17.1).
- Z\. Jian, D\. Wu, S\. Wang, Y\. Wang, J\. Yao, M\. Wang, and Q\. Wu \(2025\)AGCL: aspect graph construction and learning for aspect\-level sentiment classification\.InProceedings of the 31st International Conference on Computational Linguistics,O\. Rambow, L\. Wanner, M\. Apidianaki, H\. Al\-Khalifa, B\. D\. Eugenio, and S\. Schockaert \(Eds\.\),Abu Dhabi, UAE,pp\. 841–854\.External Links:[Link](https://aclanthology.org/2025.coling-main.56/)Cited by:[§3\.1](https://arxiv.org/html/2605.22228#S3.SS1.p1.1),[Table 1](https://arxiv.org/html/2605.22228#S3.T1),[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.3.1)\.
- Q\. Jiang, L\. Chen, R\. Xu, X\. Ao, and M\. Yang \(2019\)A challenge dataset and effective models for aspect\-based sentiment analysis\.InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing \(EMNLP\-IJCNLP\),K\. Inui, J\. Jiang, V\. Ng, and X\. Wan \(Eds\.\),Hong Kong, China,pp\. 6280–6285\.External Links:[Link](https://aclanthology.org/D19-1654/),[Document](https://dx.doi.org/10.18653/v1/D19-1654)Cited by:[§A\.1](https://arxiv.org/html/2605.22228#A1.SS1.p1.1)\.
- S\. Jin, Q\. He, Y\. Wang, N\. Du, and W\. Lei \(2025\)Aspect\-based sentiment analysis with semantic and syntactic enhanced multi\-layer fusion model\.Engineering Applications of Artificial Intelligence159,pp\. 111654\.External Links:ISSN 0952\-1976,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.engappai.2025.111654),[Link](https://www.sciencedirect.com/science/article/pii/S0952197625016562)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p2.1),[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.18.13.1),[§4](https://arxiv.org/html/2605.22228#S4.p2.1)\.
- X\. Ju, L\. Ding, R\. Yang, C\. Guo, G\. Zou, B\. Zhang, and M\. Li \(2025\)Dual contrastive learning\-based hypergraph convolutional network for aspect\-based sentiment classification\.Knowledge\-Based Systems330,pp\. 114701\.External Links:ISSN 0950\-7051,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.knosys.2025.114701),[Link](https://www.sciencedirect.com/science/article/pii/S095070512501740X)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p4.1)\.
- O\. M\. Kashyap, P\. Amit, M\. Kashyap, A\. M\. Joshi, and S\. SS \(2025\)From graphs to hypergraphs: enhancing aspect\-based sentiment analysis via multi\-level relational modeling\.External Links:2511\.14142,[Link](https://arxiv.org/abs/2511.14142)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p4.1),[§4](https://arxiv.org/html/2605.22228#S4.p4.1)\.
- M\. Lei, S\. Li, Y\. Wu, H\. Hu, Y\. Zhou, X\. Zheng, G\. Ding, S\. Du, Z\. Wu, and Y\. Gao \(2025\)YOLOv13: real\-time object detection with hypergraph\-enhanced adaptive visual perception\.External Links:2506\.17733,[Link](https://arxiv.org/abs/2506.17733)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p5.1),[§4](https://arxiv.org/html/2605.22228#S4.p5.1)\.
- R\. Li, H\. Chen, F\. Feng, Z\. Ma, X\. Wang, and E\. Hovy \(2021a\)Dual graph convolutional networks for aspect\-based sentiment analysis\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing \(Volume 1: Long Papers\),C\. Zong, F\. Xia, W\. Li, and R\. Navigli \(Eds\.\),Online,pp\. 6319–6329\.External Links:[Link](https://aclanthology.org/2021.acl-long.494/),[Document](https://dx.doi.org/10.18653/v1/2021.acl-long.494)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.10.5.1)\.
- Z\. Li, Y\. Zou, C\. Zhang, Q\. Zhang, and Z\. Wei \(2021b\)Learning implicit sentiment in aspect\-based sentiment analysis with supervised contrastive pre\-training\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Online and Punta Cana, Dominican Republic,pp\. 246–256\.External Links:[Link](https://aclanthology.org/2021.emnlp-main.22/),[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.22)Cited by:[§3\.4](https://arxiv.org/html/2605.22228#S3.SS4.p1.1),[Table 4](https://arxiv.org/html/2605.22228#S3.T4.1.1.3.1.1)\.
- F\. Ma, X\. Hu, A\. Liu, Y\. Yang, S\. Li, P\. S\. Yu, and L\. Wen \(2023\)AMR\-based network for aspect\-based sentiment analysis\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 322–337\.External Links:[Link](https://aclanthology.org/2023.acl-long.19/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.19)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.15.10.1)\.
- R\. Mukherjee, S\. Shetty, S\. Chattopadhyay, S\. Maji, S\. Datta, and P\. Goyal \(2021\)Reproducibility, replicability and beyond: assessing production readiness of aspect based sentiment analysis in the wild\.InAdvances in Information Retrieval,D\. Hiemstra, M\. Moens, J\. Mothe, R\. Perego, M\. Potthast, and F\. Sebastiani \(Eds\.\),Cham,pp\. 92–106\.External Links:ISBN 978\-3\-030\-72240\-1,[Document](https://dx.doi.org/10.1007/978-3-030-72240-1%5F7),[Link](https://doi.org/10.1007/978-3-030-72240-1_7)Cited by:[§3\.2](https://arxiv.org/html/2605.22228#S3.SS2.p1.1)\.
- J\. Ouyang, C\. Xuan, B\. Wang, and Z\. Yang \(2024a\)Aspect\-based sentiment classification with aspect\-specific hypergraph attention networks\.Expert Systems with Applications248,pp\. 123412\.External Links:ISSN 0957\-4174,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.eswa.2024.123412),[Link](https://www.sciencedirect.com/science/article/pii/S095741742400277X)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p4.1),[§4](https://arxiv.org/html/2605.22228#S4.p4.1)\.
- J. Ouyang, Z. Yang, S. Liang, B. Wang, Y. Wang, and X. Li (2024b) Aspect-based sentiment analysis with explicit sentiment augmentations. Proceedings of the AAAI Conference on Artificial Intelligence 38(17), pp. 18842–18850. External Links: [Link](https://ojs.aaai.org/index.php/AAAI/article/view/29849), [Document](https://dx.doi.org/10.1609/aaai.v38i17.29849) Cited by: [Table 4](https://arxiv.org/html/2605.22228#S3.T4.1.1.4.2.1).
- M\. Pontiki, D\. Galanis, H\. Papageorgiou, I\. Androutsopoulos, S\. Manandhar, M\. AL\-Smadi, M\. Al\-Ayyoub, Y\. Zhao, B\. Qin, O\. De Clercq, V\. Hoste, M\. Apidianaki, X\. Tannier, N\. Loukachevitch, E\. Kotelnikov, N\. Bel, S\. M\. Jiménez\-Zafra, and G\. Eryiğit \(2016\)SemEval\-2016 task 5: aspect based sentiment analysis\.InProceedings of the 10th International Workshop on Semantic Evaluation \(SemEval\-2016\),S\. Bethard, M\. Carpuat, D\. Cer, D\. Jurgens, P\. Nakov, and T\. Zesch \(Eds\.\),San Diego, California,pp\. 19–30\.External Links:[Link](https://aclanthology.org/S16-1002/),[Document](https://dx.doi.org/10.18653/v1/S16-1002)Cited by:[§A\.1](https://arxiv.org/html/2605.22228#A1.SS1.p1.1)\.
- M\. Pontiki, D\. Galanis, H\. Papageorgiou, S\. Manandhar, and I\. Androutsopoulos \(2015\)SemEval\-2015 task 12: aspect based sentiment analysis\.InProceedings of the 9th International Workshop on Semantic Evaluation \(SemEval 2015\),P\. Nakov, T\. Zesch, D\. Cer, and D\. Jurgens \(Eds\.\),Denver, Colorado,pp\. 486–495\.External Links:[Link](https://aclanthology.org/S15-2082/),[Document](https://dx.doi.org/10.18653/v1/S15-2082)Cited by:[§A\.1](https://arxiv.org/html/2605.22228#A1.SS1.p1.1)\.
- M\. Pontiki, D\. Galanis, J\. Pavlopoulos, H\. Papageorgiou, I\. Androutsopoulos, and S\. Manandhar \(2014\)SemEval\-2014 task 4: aspect based sentiment analysis\.InProceedings of the 8th International Workshop on Semantic Evaluation \(SemEval 2014\),P\. Nakov and T\. Zesch \(Eds\.\),Dublin, Ireland,pp\. 27–35\.External Links:[Link](https://aclanthology.org/S14-2004/),[Document](https://dx.doi.org/10.3115/v1/S14-2004)Cited by:[§A\.1](https://arxiv.org/html/2605.22228#A1.SS1.p1.1)\.
- H\. Tang, D\. Ji, C\. Li, and Q\. Zhou \(2020\)Dependency graph enhanced dual\-transformer structure for aspect\-based sentiment classification\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 6578–6588\.External Links:[Link](https://aclanthology.org/2020.acl-main.588/),[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.588)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.9.4.1)\.
- Y\. Tian, G\. Chen, and Y\. Song \(2021\)Aspect\-based sentiment analysis with type\-aware graph convolutional networks and layer ensemble\.InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,K\. Toutanova, A\. Rumshisky, L\. Zettlemoyer, D\. Hakkani\-Tur, I\. Beltagy, S\. Bethard, R\. Cotterell, T\. Chakraborty, and Y\. Zhou \(Eds\.\),Online,pp\. 2910–2922\.External Links:[Link](https://aclanthology.org/2021.naacl-main.231/),[Document](https://dx.doi.org/10.18653/v1/2021.naacl-main.231)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.11.6.1)\.
- B\. Wang, T\. Shen, G\. Long, T\. Zhou, and Y\. Chang \(2021\)Eliminating sentiment bias for aspect\-level sentiment classification with unsupervised opinion extraction\.InFindings of the Association for Computational Linguistics: EMNLP 2021,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Punta Cana, Dominican Republic,pp\. 3002–3012\.External Links:[Link](https://aclanthology.org/2021.findings-emnlp.258/),[Document](https://dx.doi.org/10.18653/v1/2021.findings-emnlp.258)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.19.14.1)\.
- K\. Wang, W\. Shen, Y\. Yang, X\. Quan, and R\. Wang \(2020\)Relational graph attention network for aspect\-based sentiment analysis\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 3229–3238\.External Links:[Link](https://aclanthology.org/2020.acl-main.295/),[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.295)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.8.3.1)\.
- Z\. Wang, B\. Zhang, R\. Yang, C\. Guo, and M\. Li \(2024\)DAGCN: distance\-based and aspect\-oriented graph convolutional network for aspect\-based sentiment analysis\.InFindings of the Association for Computational Linguistics: NAACL 2024,K\. Duh, H\. Gomez, and S\. Bethard \(Eds\.\),Mexico City, Mexico,pp\. 1863–1876\.External Links:[Link](https://aclanthology.org/2024.findings-naacl.120/),[Document](https://dx.doi.org/10.18653/v1/2024.findings-naacl.120)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.16.11.1),[§4](https://arxiv.org/html/2605.22228#S4.p3.1)\.
- X\. Xing, Z\. Jin, D\. Jin, B\. Wang, Q\. Zhang, and X\. Huang \(2020\)Tasty burgers, soggy fries: probing aspect robustness in aspect\-based sentiment analysis\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),B\. Webber, T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 3594–3605\.External Links:[Link](https://aclanthology.org/2020.emnlp-main.292/),[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-main.292)Cited by:[§3\.5](https://arxiv.org/html/2605.22228#S3.SS5.p1.1),[§4](https://arxiv.org/html/2605.22228#S4.p6.1)\.
- H\. Yang and K\. Li \(2024\)Modeling aspect sentiment coherency via local sentiment aggregation\.InFindings of the Association for Computational Linguistics: EACL 2024,Y\. Graham and M\. Purver \(Eds\.\),St\. Julian’s, Malta,pp\. 182–195\.External Links:[Link](https://aclanthology.org/2024.findings-eacl.13/),[Document](https://dx.doi.org/10.18653/v1/2024.findings-eacl.13)Cited by:[§3\.5](https://arxiv.org/html/2605.22228#S3.SS5.p1.1),[Table 5](https://arxiv.org/html/2605.22228#S3.T5),[Table 5](https://arxiv.org/html/2605.22228#S3.T5.1.1.5.4.1),[Table 5](https://arxiv.org/html/2605.22228#S3.T5.1.1.6.5.1),[Table 5](https://arxiv.org/html/2605.22228#S3.T5.1.1.7.6.1),[§4](https://arxiv.org/html/2605.22228#S4.p6.1)\.
- H\. Yang, C\. Zhang, and K\. Li \(2023\)PyABSA: a modularized framework for reproducible aspect\-based sentiment analysis\.InProceedings of the 32nd ACM International Conference on Information and Knowledge Management,CIKM ’23,New York, NY, USA,pp\. 5117–5122\.External Links:ISBN 9798400701245,[Link](https://doi.org/10.1145/3583780.3614752),[Document](https://dx.doi.org/10.1145/3583780.3614752)Cited by:[§3\.2](https://arxiv.org/html/2605.22228#S3.SS2.p1.1)\.
- S\. Yin and G\. Zhong \(2024\)TextGT: a double\-view graph transformer on text for aspect\-based sentiment analysis\.Proceedings of the AAAI Conference on Artificial Intelligence38\(17\),pp\. 19404–19412\.External Links:[Link](https://ojs.aaai.org/index.php/AAAI/article/view/29911),[Document](https://dx.doi.org/10.1609/aaai.v38i17.29911)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p2.1),[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.17.12.1),[§4](https://arxiv.org/html/2605.22228#S4.p3.1)\.
- C\. Ying, T\. Cai, S\. Luo, S\. Zheng, G\. Ke, D\. He, Y\. Shen, and T\. Liu \(2021\)Do transformers really perform badly for graph representation?\.InAdvances in Neural Information Processing Systems,M\. Ranzato, A\. Beygelzimer, Y\. Dauphin, P\.S\. Liang, and J\. W\. Vaughan \(Eds\.\),Vol\.34,pp\. 28877–28888\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2021/file/f1c1592588411002af340cbaedd6fc33-Paper.pdf)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p5.1),[§2\.4](https://arxiv.org/html/2605.22228#S2.SS4.SSS0.Px3.p2.1),[§4](https://arxiv.org/html/2605.22228#S4.p5.1)\.
- B\. Yu and S\. Zhang \(2023\)A novel weight\-oriented graph convolutional network for aspect\-based sentiment analysis\.The Journal of Supercomputing79\(1\),pp\. 947–972\.External Links:[Document](https://dx.doi.org/10.1007/s11227-022-04689-9),[Link](https://doi.org/10.1007/s11227-022-04689-9)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.14.9.1)\.
- B. Zeng, H. Yang, R. Xu, W. Zhou, and X. Han (2019) LCF: a local context focus mechanism for aspect-based sentiment classification. Applied Sciences 9(16). External Links: [Link](https://www.mdpi.com/2076-3417/9/16/3389), ISSN 2076-3417, [Document](https://dx.doi.org/10.3390/app9163389) Cited by: [§1](https://arxiv.org/html/2605.22228#S1.p2.1), [§2.3](https://arxiv.org/html/2605.22228#S2.SS3.SSS0.Px1.p1.12), [§4](https://arxiv.org/html/2605.22228#S4.p2.1).
- C\. Zhang, Q\. Li, and D\. Song \(2019\)Syntax\-aware aspect\-level sentiment classification with proximity\-weighted convolution network\.InProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval,SIGIR’19,New York, NY, USA,pp\. 1145–1148\.External Links:ISBN 9781450361729,[Link](https://doi.org/10.1145/3331184.3331351),[Document](https://dx.doi.org/10.1145/3331184.3331351)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.2.2.2.1)\.
- W\. Zhang, X\. Li, Y\. Deng, L\. Bing, and W\. Lam \(2023\)A survey on aspect\-based sentiment analysis: tasks, methods, and challenges\.IEEE Transactions on Knowledge and Data Engineering35\(11\),pp\. 11019–11038\.External Links:[Document](https://dx.doi.org/10.1109/TKDE.2022.3230975)Cited by:[§1](https://arxiv.org/html/2605.22228#S1.p1.1)\.
- Z\. Zhang, Z\. Zhou, and Y\. Wang \(2022\)SSEGCN: syntactic and semantic enhanced graph convolutional network for aspect\-based sentiment analysis\.InProceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,M\. Carpuat, M\. de Marneffe, and I\. V\. Meza Ruiz \(Eds\.\),Seattle, United States,pp\. 4916–4925\.External Links:[Link](https://aclanthology.org/2022.naacl-main.362/),[Document](https://dx.doi.org/10.18653/v1/2022.naacl-main.362)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.12.7.1)\.
- J\. Zhou, J\. X\. Huang, Q\. V\. Hu, and L\. He \(2020\)SK\-gcn: modeling syntax and knowledge via graph convolutional network for aspect\-level sentiment classification\.Knowledge\-Based Systems205,pp\. 106292\.External Links:ISSN 0950\-7051,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.knosys.2020.106292),[Link](https://www.sciencedirect.com/science/article/pii/S0950705120304688)Cited by:[Table 1](https://arxiv.org/html/2605.22228#S3.T1.3.3.6.1.1)\.

## Appendix A Experimental Details

### A.1 Datasets

We conduct experiments on six standard ABSA benchmark datasets, whose statistics are shown in Table [6](https://arxiv.org/html/2605.22228#A1.T6). The Restaurant14 and Laptop datasets are from SemEval-2014 Task 4 (Pontiki et al., [2014](https://arxiv.org/html/2605.22228#bib.bib46)). The Twitter dataset is the target-dependent Twitter sentiment corpus introduced by Dong et al. ([2014](https://arxiv.org/html/2605.22228#bib.bib47)). The Restaurant15 and Restaurant16 datasets are the restaurant-domain subsets of SemEval-2015 Task 12 and SemEval-2016 Task 5, respectively (Pontiki et al., [2015](https://arxiv.org/html/2605.22228#bib.bib48), [2016](https://arxiv.org/html/2605.22228#bib.bib49)). MAMS is the Multi-Aspect Multi-Sentiment dataset introduced by Jiang et al. ([2019](https://arxiv.org/html/2605.22228#bib.bib50)), where each sentence contains multiple aspects with different sentiment polarities. All datasets are used for research purposes, following their original licenses and distribution terms.

For ISE, we follow prior implicit\-sentiment evaluation protocols and process all instances using the same dependency parsing and token\-alignment pipelines as in the main experiments\. For ARTS, we follow the adversarial aspect robustness evaluation protocol\. Models are trained only on the original training split, without ARTS augmentation\. The models are evaluated directly on the perturbed ARTS test samples\.

Table 6: Statistics of the six standard benchmark datasets.

### A.2 Implementation Details

We use DeBERTa-v3-base as the pre-trained encoder and feed the sentence–aspect pair as encoder input. Word-level dependency parses are aligned to subword-level encoder states before graph construction. The model is optimized with AdamW using a batch size of 16 and gradient clipping of 1.0. The learning rate is selected from $\{1\times 10^{-5}, 2\times 10^{-5}, 2.5\times 10^{-5}\}$, and the weight decay from $\{1\times 10^{-4}, 5\times 10^{-3}\}$. We train for 30 epochs on each of the six standard benchmark datasets.

For GHI, we use 2 reasoning layers. The static hyperedges include aspect, SRD-based local-context, and dependency hyperedges, with dependency hop threshold $\mathrm{T}=2$. The local window size is selected from $\{3,4\}$, the SRD radius from $\{3,5\}$, and the adaptive Top-$K$ from $\{3,4\}$. The number of adaptive hyperedges is set to $\mathrm{M}=6$ for Twitter and $\mathrm{M}=4$ for the rest. All experiments are conducted on a single NVIDIA RTX 5080 GPU.
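
The reported settings can be collected into a small configuration sketch. The paper lists candidate values but does not state the search procedure, so the exhaustive grid below is an assumption, and all dictionary keys and function names are hypothetical.

```python
from itertools import product

# Candidate values reported in the text (key names are illustrative).
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 2.5e-5],
    "weight_decay": [1e-4, 5e-3],
    "local_window": [3, 4],
    "srd_radius": [3, 5],
    "adaptive_top_k": [3, 4],
}

# Settings reported as fixed across runs.
FIXED = {
    "encoder": "DeBERTa-v3-base",
    "optimizer": "AdamW",
    "batch_size": 16,
    "grad_clip": 1.0,
    "epochs": 30,
    "reasoning_layers": 2,
    "dependency_hop_threshold": 2,
}


def num_adaptive_hyperedges(dataset):
    """M = 6 for Twitter and M = 4 for the other datasets, as reported."""
    return 6 if dataset == "Twitter" else 4


def iter_configs(dataset):
    """Yield every combination of the candidate values (assumed full grid)."""
    keys = list(SEARCH_SPACE)
    for values in product(*(SEARCH_SPACE[k] for k in keys)):
        cfg = dict(FIXED)
        cfg.update(zip(keys, values))
        cfg["dataset"] = dataset
        cfg["num_adaptive_hyperedges"] = num_adaptive_hyperedges(dataset)
        yield cfg


configs = list(iter_configs("Twitter"))
```

Under this assumption the grid has 3 × 2 × 2 × 2 × 2 = 48 configurations per dataset.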

### A.3 Additional Multi-seed Results on Twitter

We additionally report multi\-seed results on Twitter, as shown in Table[7](https://arxiv.org/html/2605.22228#A2.T7)\. On the noisier Twitter domain, GHI obtains modest but consistent gains over the vanilla DeBERTa\-base encoder, suggesting that the incidence reasoning layer is not limited to the cleaner SemEval\-2014 domains\.

Table 7: Additional controlled comparison against vanilla DeBERTa-base on Twitter. Results are reported as mean (std) over 5 seeds.

## Appendix B Framework Details

### B.1 Adaptive Hyperedges Generation

![Refer to caption](https://arxiv.org/html/2605.22228v1/x6.png)

Figure 6: Adaptive Incidence Generation

Let $P^{0}\in\mathbb{R}^{M\times d_{h}}$ denote $M$ learnable base adaptive hyperedge prototypes, where $d_{h}$ denotes the hidden dimension. They are randomly initialized as global parameters and shared by all samples and reasoning layers, serving as the latent bases from which adaptive hyperedges are induced.

At each reasoning layer, GHI conditions these shared prototypes on the current sentence and aspect anchors to induce sample-specific adaptive hyperedges. We first summarize the current instance by:

$$g^{\ell}=[c^{\ell};a^{\ell};\mathrm{Pool}_{m}(\widetilde{H}^{\ell})], \tag{9}$$

where $\mathrm{Pool}_{m}(\cdot)$ mean-pools valid sentence tokens. The context vector $g^{\ell}$ is mapped to a prototype offset $\Delta P^{\ell}\in\mathbb{R}^{M\times d}$, which is then used to generate the context-conditioned adaptive prototypes $P_{j}^{\ell}$:

$$\Delta P^{\ell}=\mathrm{reshape}(\mathrm{MLP}(g^{\ell})), \tag{10}$$

$$P_{j}^{\ell}=\mathrm{LN}(P_{j}^{0}+\Delta P_{j}^{\ell}+a^{\ell}),\quad j=1,\ldots,M, \tag{11}$$

where the aspect anchor is added to each prototype as a target-specific bias. Thus, the sentence anchor $c^{\ell}$, the aspect anchor $a^{\ell}$, and the pooled token context $\widetilde{H}^{\ell}$ adapt the shared bases $P^{0}$ to the current instance.
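
A minimal NumPy sketch of Eqs. 9–11 for a single sample is given below. The two-layer ReLU MLP, the layer normalization without learned scale and shift, the use of one shared width `d` for prototypes, tokens, and anchors, and all variable names are assumptions made for illustration. The paper does not specify these details in this section.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize over the last axis (no learned affine; an assumption)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def condition_prototypes(P0, H, mask, c, a, W1, b1, W2, b2):
    """Context-conditioned prototypes for one sample.

    P0: (M, d) shared base prototypes; H: (n, d) token states;
    mask: (n,) bool, True for valid tokens; c, a: (d,) anchors.
    W1: (3d, h), b1: (h,), W2: (h, M*d), b2: (M*d,) form the assumed MLP.
    """
    M, d = P0.shape
    pooled = H[mask].mean(axis=0)             # Pool_m: mean over valid tokens
    g = np.concatenate([c, a, pooled])        # Eq. 9
    hidden = np.maximum(g @ W1 + b1, 0.0)     # assumed ReLU hidden layer
    delta = (hidden @ W2 + b2).reshape(M, d)  # Eq. 10
    return layer_norm(P0 + delta + a)         # Eq. 11; a broadcasts over j


rng = np.random.default_rng(0)
M, d, n, h = 4, 8, 6, 16
P0 = rng.normal(size=(M, d))
H = rng.normal(size=(n, d))
mask = np.array([True, True, True, True, True, False])  # last token is padding
c, a = rng.normal(size=d), rng.normal(size=d)
W1, b1 = 0.1 * rng.normal(size=(3 * d, h)), np.zeros(h)
W2, b2 = 0.1 * rng.normal(size=(h, M * d)), np.zeros(M * d)
P = condition_prototypes(P0, H, mask, c, a, W1, b1, W2, b2)
```

Because pooling uses only valid tokens, changing the padded row of `H` leaves the conditioned prototypes unchanged.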

For each conditioned adaptive hyperedge prototype, GHI computes token–prototype participation scores for token $i\in V$ and prototype $j=1,\ldots,M$:

$$s_{ij}^{\ell}=\frac{\langle W_{t}\widetilde{H}_{i}^{\ell},W_{p}P_{j}^{\ell}\rangle}{\sqrt{d_{k}}}+\frac{\langle W_{a}\widetilde{H}_{i}^{\ell},a^{\ell}\rangle}{\sqrt{d_{k}}}, \tag{12}$$

where $W_{t}$, $W_{p}$, and $W_{a}$ are the learned projections for the token, prototype, and aspect terms, respectively, and $d_{k}$ denotes the projection dimension used for scaling.

The soft adaptive incidence is obtained by applying a masked softmax over valid graph-visible tokens for each prototype:

$$I^{\ell}_{\mathrm{ad},ij}=\mathrm{MaskedSoftmax}_{i}(s^{\ell}_{ij};m), \tag{13}$$
where $I^{\ell}_{\mathrm{ad},ij}$ is the normalized incidence weight between token $i$ and the adaptive hyperedge induced by prototype $j$, and the mask $m$ excludes invalid tokens from the normalization. Together with the prototype conditioning in Eqs. [9](https://arxiv.org/html/2605.22228#A2.E9)–[12](https://arxiv.org/html/2605.22228#A2.E12), this masked-softmax normalization instantiates $\mathrm{AdaptiveIncidence}(\cdot)$ in Eq. [2](https://arxiv.org/html/2605.22228#S2.E2).
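As a rough illustration of Eqs. 12–13, the sketch below scores every token against every prototype and normalizes each prototype's column with a masked softmax over tokens. It is written in plain Python under stated assumptions: the function name `adaptive_incidence` is hypothetical, and the toy call uses identity projections with $d_{k}=d_{h}=2$ so that all inner products are well defined.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def adaptive_incidence(H, P, a, mask, Wt, Wp, Wa):
    """Return I_ad (|V| x M); each column is a softmax over valid tokens."""
    d_k = len(Wt)                                # projection dimension
    n, M = len(H), len(P)
    tok = [matvec(Wt, h) for h in H]
    pro = [matvec(Wp, p) for p in P]
    asp = [dot(matvec(Wa, h), a) for h in H]     # aspect term of Eq. 12
    I = [[0.0] * M for _ in range(n)]
    for j in range(M):
        # Eq. 12: token-prototype score plus aspect term, scaled by sqrt(d_k)
        s = [(dot(tok[i], pro[j]) + asp[i]) / math.sqrt(d_k) for i in range(n)]
        # Eq. 13: masked softmax over tokens; invalid tokens get weight 0
        top = max(s[i] for i in range(n) if mask[i])
        e = [math.exp(s[i] - top) if mask[i] else 0.0 for i in range(n)]
        z = sum(e)
        for i in range(n):
            I[i][j] = e[i] / z
    return I

Id = [[1.0, 0.0], [0.0, 1.0]]                    # toy identity projections
H = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]         # third token is masked out
P = [[1.0, 0.0], [0.0, 1.0]]
I_ad = adaptive_incidence(H, P, a=[0.0, 0.0], mask=[1, 1, 0],
                          Wt=Id, Wp=Id, Wa=Id)
```

Note that the normalization runs over tokens, not over prototypes, so each column of `I_ad` sums to one while rows generally do not.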

Finally, each conditioned prototype generates one adaptive hyperedge, and the $M$ generated hyperedges form $I^{\ell}_{\mathrm{ad}}\in\mathbb{R}^{|V|\times M}$. The overall generation process is illustrated in Figure [6](https://arxiv.org/html/2605.22228#A2.F6).

### B.2 Layer-Wise Computation Flow

![Refer to caption](https://arxiv.org/html/2605.22228v1/x7.png)

Figure 7: Layer-wise propagation in one GHI reasoning layer.

As shown in Figure [7](https://arxiv.org/html/2605.22228#A2.F7), starting from token states $H^{\ell}$, local-window attention produces $\widetilde{H}^{\ell}$, and edge pooling produces hyperedge states $Z^{\ell}$ and soft/hard incidence matrices. GHI then performs local incidence reasoning through $\mathrm{HGRefine}$ and $\mathrm{IncAttn}$ in parallel with star-expanded Graphormer reasoning, followed by anchor-conditioned fusion and a feed-forward update to obtain $H^{\ell+1}$. Anchor memories are updated from $H^{\ell+1}$ for the next layer. The details of $\mathrm{HGRefine}$, $\mathrm{IncAttn}$, and $\mathrm{AnchorUpdate}$ are as follows.

#### HGRefine

Given the soft incidence matrix $I^{\ell}_{\mathrm{soft}}$, $\mathrm{HGRefine}$ performs incidence-weighted token–hyperedge propagation. We first column-normalize incidence weights over tokens for each hyperedge:

$$\alpha^{\ell}_{ij}=\frac{I^{\ell}_{\mathrm{soft},ij}}{\sum_{i^{\prime}}I^{\ell}_{\mathrm{soft},i^{\prime}j}}, \tag{14}$$
and update hyperedge states by aggregating token states:

$$\hat{Z}^{\ell}_{j}=\sum_{i}\alpha^{\ell}_{ij}W_{t}\widetilde{H}^{\ell}_{i}. \tag{15}$$
The updated hyperedge states are then written back to tokens using row-normalized incidence weights $\beta^{\ell}_{ij}$ over hyperedges:

$$\beta^{\ell}_{ij}=\frac{I^{\ell}_{\mathrm{soft},ij}}{\sum_{j^{\prime}}I^{\ell}_{\mathrm{soft},ij^{\prime}}}, \tag{16}$$

$$v^{\ell}_{i}=W_{o}\sum_{j}\beta^{\ell}_{ij}\hat{Z}^{\ell}_{j}. \tag{17}$$
The vector $v^{\ell}_{i}$ is the $\mathrm{HGRefine}$ message for token $i$ in Eq. [4](https://arxiv.org/html/2605.22228#S2.E4).
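The two-step propagation in Eqs. 14–17 can be sketched in plain Python as below. This is an illustrative reading rather than the authors' code: the function name `hg_refine` is hypothetical, the small `eps` added to the normalizers is an assumption that guards against all-zero rows or columns (it does not appear in the equations), and the toy call uses identity projections.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def hg_refine(H, I_soft, Wt, Wo, eps=1e-12):
    """Token -> hyperedge -> token propagation weighted by soft incidence."""
    n, M = len(I_soft), len(I_soft[0])
    proj = [matvec(Wt, h) for h in H]
    d = len(proj[0])
    # Normalizers: column sums over tokens (Eq. 14), row sums over edges (Eq. 16)
    col = [sum(I_soft[i][j] for i in range(n)) + eps for j in range(M)]
    row = [sum(I_soft[i]) + eps for i in range(n)]
    # Eq. 15: each hyperedge aggregates its member tokens
    Z_hat = [[sum(I_soft[i][j] / col[j] * proj[i][k] for i in range(n))
              for k in range(d)] for j in range(M)]
    # Eq. 17: each token reads back from the hyperedges it belongs to
    V = []
    for i in range(n):
        mixed = [sum(I_soft[i][j] / row[i] * Z_hat[j][k] for j in range(M))
                 for k in range(d)]
        V.append(matvec(Wo, mixed))
    return V

Id = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I_soft = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # tokens 0,1 share hyperedge 0
V = hg_refine(H, I_soft, Wt=Id, Wo=Id)
```

In the toy input, tokens 0 and 1 share one hyperedge, so both receive the average of their two states, while token 2 sits alone in its hyperedge and receives its own state back.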

#### IncAttn

$\mathrm{IncAttn}$ further performs token–hyperedge attention with incidence-level relation features. For token $i$ and hyperedge $j$, we construct a relation feature $\Phi^{\ell}_{ij}$ from edge type, edge role (static and adaptive), incidence weight, token SRD, and hyperedge SRD. For attention head $r$, $\mathrm{IncAttn}$ computes an attention logit $e^{\ell,r}_{ij}$ that measures how strongly token $i$ attends to hyperedge $j$:

$$e^{\ell,r}_{ij}=\frac{\langle W_{Q}^{r}\widetilde{H}^{\ell}_{i},W_{K}^{r}Z^{\ell}_{j}\rangle}{\sqrt{d_{k}}}+b_{r}(\Phi^{\ell}_{ij})+\log(I^{\ell}_{\mathrm{soft},ij}+\epsilon), \tag{18}$$
where $W^{r}_{Q}$ and $W^{r}_{K}$ are the query and key projections for attention head $r$, $d_{k}$ is the projected key dimension, $b_{r}(\Phi^{\ell}_{ij})$ is the relation-aware attention bias, and $\epsilon$ is a small constant for numerical stability. The logarithmic term injects the soft incidence weight as a prior attention bias.

The attention weights are then obtained by masked softmax over valid hyperedges:

$$a^{\ell,r}_{ij}=\mathrm{MaskedSoftmax}_{j}(e^{\ell,r}_{ij}). \tag{19}$$
To modulate the value passed from each hyperedge, $\mathrm{IncAttn}$ further computes a scalar relation-aware gate:

$$\gamma^{\ell}_{ij}=\sigma(W_{\gamma}\Phi^{\ell}_{ij}+b_{\gamma}), \tag{20}$$
where $W_{\gamma}$ and $b_{\gamma}$ are learned parameters that map the relation feature $\Phi^{\ell}_{ij}$ to a scalar value gate. This gate controls how much hyperedge $j$ contributes to the $\mathrm{IncAttn}$ message of token $i$.

The $\mathrm{IncAttn}$ message for token $i$ is then computed by aggregating gated hyperedge values across attention heads $r$:

$$q^{\ell}_{i}=W_{o}\operatorname{Concat}_{r}\sum_{j}a^{\ell,r}_{ij}\gamma^{\ell}_{ij}W_{V}^{r}Z^{\ell}_{j}. \tag{21}$$
Here, $W^{r}_{V}$ is the value projection for head $r$, $W_{o}$ is the output projection, and $\operatorname{Concat}_{r}$ concatenates the outputs of all heads $r$. The resulting $q^{\ell}_{i}$ is the $\mathrm{IncAttn}$ message added to the token update in Eq. [4](https://arxiv.org/html/2605.22228#S2.E4).
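A single-head reading of Eqs. 18–21 is sketched below in plain Python. Several simplifications are assumptions made for brevity: the relation bias $b_{r}(\Phi^{\ell}_{ij})$ and the gate pre-activation $W_{\gamma}\Phi^{\ell}_{ij}+b_{\gamma}$ are passed in as precomputed matrices (`bias`, `gate_logit`), head concatenation and the output projection $W_{o}$ are omitted, and the function name `inc_attn_head` is hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inc_attn_head(H, Z, I_soft, bias, gate_logit, edge_mask,
                  WQ, WK, WV, eps=1e-6):
    """One attention head: tokens attend to hyperedges with an incidence prior."""
    d_k, E = len(WK), len(Z)
    keys = [matvec(WK, z) for z in Z]
    vals = [matvec(WV, z) for z in Z]
    out = []
    for i, h in enumerate(H):
        q = matvec(WQ, h)
        # Eq. 18: scaled dot product + relation bias + log-incidence prior
        e = [dot(q, keys[j]) / math.sqrt(d_k) + bias[i][j]
             + math.log(I_soft[i][j] + eps) for j in range(E)]
        # Eq. 19: masked softmax over valid hyperedges
        top = max(e[j] for j in range(E) if edge_mask[j])
        w = [math.exp(e[j] - top) if edge_mask[j] else 0.0 for j in range(E)]
        total = sum(w)
        # Eqs. 20-21: gate each hyperedge value, then sum over hyperedges
        msg = [0.0] * len(vals[0])
        for j in range(E):
            coef = w[j] / total * sigmoid(gate_logit[i][j])
            msg = [m + coef * v for m, v in zip(msg, vals[j])]
        out.append(msg)
    return out

Id = [[1.0, 0.0], [0.0, 1.0]]
msgs = inc_attn_head(H=[[1.0, 0.0]], Z=[[1.0, 0.0], [0.0, 1.0]],
                     I_soft=[[0.9, 0.1]], bias=[[0.0, 0.0]],
                     gate_logit=[[0.0, 0.0]], edge_mask=[1, 1],
                     WQ=Id, WK=Id, WV=Id)
```

With zero gate logits every gate equals 0.5, so the toy message is half of the attention-weighted value mix, and the log-incidence term pushes most of the attention onto the hyperedge with incidence 0.9.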

#### AnchorUpdate

After obtaining $H^{\ell+1}$, GHI updates the sentence and aspect anchors by gated residual MLPs. Let $\bar{h}^{\ell+1}=\mathrm{Pool}_{m}(H^{\ell+1})$ and $\bar{a}^{\ell+1}=\mathrm{Pool}_{[l,r)}(H^{\ell+1})$. We form:

$$u_{c}^{\ell}=[c^{\ell};a^{\ell};\bar{h}^{\ell+1}], \tag{22}$$

$$u_{a}^{\ell}=[a^{\ell};c^{\ell};\bar{a}^{\ell+1}], \tag{23}$$
and update the anchors as:

$$c^{\ell+1}=c^{\ell}+\sigma(W_{c}u_{c}^{\ell})\odot\mathrm{MLP}_{c}(u_{c}^{\ell}), \tag{24}$$

$$a^{\ell+1}=a^{\ell}+\sigma(W_{a}u_{a}^{\ell})\odot\mathrm{MLP}_{a}(u_{a}^{\ell}). \tag{25}$$
These updated anchors are used for next-layer incidence induction and local–global fusion.
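The gated residual updates in Eqs. 22–25 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: $\mathrm{Pool}_{[l,r)}$ is taken to be a mean over the aspect span, the helper names are hypothetical, and the toy call replaces the learned gate weights with zeros and the MLPs with simple lambdas so the result is easy to check by hand.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def gated_residual(anchor, u, W_gate, mlp):
    """anchor + sigmoid(W_gate u) * mlp(u), elementwise (Eqs. 24-25)."""
    gate = [sigmoid(dot(row, u)) for row in W_gate]
    return [x + g * d for x, g, d in zip(anchor, gate, mlp(u))]

def anchor_update(c, a, H_next, mask, span, Wc, Wa, mlp_c, mlp_a):
    l, r = span                                   # aspect span [l, r)
    h_bar = mean_pool([h for h, m in zip(H_next, mask) if m])
    a_bar = mean_pool(H_next[l:r])                # assumed mean over the span
    u_c = c + a + h_bar                           # Eq. 22 (concatenation)
    u_a = a + c + a_bar                           # Eq. 23 (concatenation)
    return (gated_residual(c, u_c, Wc, mlp_c),
            gated_residual(a, u_a, Wa, mlp_a))

zeros = [[0.0] * 6 for _ in range(2)]             # zero gate weights: gate = 0.5
H_next = [[1.0, 1.0], [3.0, 5.0], [0.0, 0.0]]     # third token is padding
c_next, a_next = anchor_update(
    c=[0.0, 0.0], a=[1.0, 1.0], H_next=H_next, mask=[1, 1, 0], span=(1, 2),
    Wc=zeros, Wa=zeros,
    mlp_c=lambda u: [1.0, 1.0],                   # stand-in for MLP_c
    mlp_a=lambda u: u[-2:],                       # stand-in for MLP_a
)
```

The residual form means a gate near zero leaves an anchor unchanged, so the anchors can persist across layers when the new token states carry little additional evidence.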

## Appendix C Sensitivity Analyses

We further examine the sensitivity of GHI to three structural hyperparameters: the adaptive Top-$K$ used for constructing the sparse hard incidence topology, the number of adaptive hyperedges $M$, and the number of GHI reasoning layers $L$. We conduct the analysis on three representative datasets: Restaurant14, Laptop, and Twitter. For each study, we vary one hyperparameter from 1 to 6 while keeping the remaining settings fixed to the main configuration of the corresponding dataset.
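The one-at-a-time protocol described above can be sketched as a small configuration generator. The key names and the values in `base` are placeholders chosen for illustration only; the actual main configuration of each dataset is not stated in this section.

```python
def one_at_a_time_sweep(base, values=range(1, 7)):
    """Yield (name, config) pairs, varying one key of `base` at a time."""
    for name in base:
        for v in values:
            cfg = dict(base)   # copy so the other settings stay fixed
            cfg[name] = v
            yield name, cfg

# Placeholder main configuration (hypothetical values, not from the paper).
base = {"top_k": 3, "num_hyperedges": 4, "num_layers": 2}
runs = list(one_at_a_time_sweep(base))
```

Each of the three hyperparameters takes six values, so the sketch yields 18 configurations per dataset, one of which per hyperparameter coincides with the base setting.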

![Refer to caption](https://arxiv.org/html/2605.22228v1/x8.png)

Figure 8: Sensitivity analyses for the adaptive Top-$K$, the number of adaptive hyperedges $M$, and the number of GHI layers $L$, respectively.

Figure [8](https://arxiv.org/html/2605.22228#A3.F8) shows the results of the sensitivity experiments. For adaptive Top-$K$, moderate values generally perform better, suggesting that the hard topology benefits from retaining several high-order token representations. For the number of adaptive hyperedges $M$, performance usually saturates with a small number, indicating that a compact set of sample-specific evidence slots is sufficient for ABSA. Increasing $M$ further does not consistently improve performance and may introduce redundant or noisy memberships. For the number of reasoning layers $L$, GHI also performs well with shallow stacks, with $L=2$ or nearby values often yielding strong results. This supports our design choice of using compact incidence reasoning instead of relying on deep graph stacking.