HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

arXiv cs.CL 06/12/26, 04:00 AM Papers
Summary
HyPE introduces a hypergraph-based persona encoder that models high-order relations among persona attributes via category-aware hyperedges and persistent edge embeddings, achieving consistent improvements over flat pooling baselines on PersonaChat across multiple backbone models.
arXiv:2606.13142v1 Announce Type: new Abstract: Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona attributes-e.g., that several persona sentences share a topical category. We propose HyPE (Hypergraph Persona Encoder), a framework that (i) analyzes each persona-bearing text as a (Core, Expression, Sentiment, Category) quadruple, and (ii) organizes persona elements into a hypergraph whose hyperedges are induced by shared category labels. An HyperGCN hypergraph neural network propagates this structure into a persona summary vector and a soft-memory bank that condition the response generator. We further propose Persistent Edge Embeddings (PEE), lightweight per-category learnable priors fused into the HyperGCN message-passing step. On PersonaChat under greedy decoding, HyPE consistently outperforms sentence-level pooling baselines across GPT-2, LLaMA-3.2-3B, and Qwen2.5-3B backbones by demonstrating that structured hyperedge-level persona encoding provides a transferable advantage across model scales.
Original Article
View Cached Full Text
Cached at: 06/12/26, 08:51 AM
# HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue
Source: [https://arxiv.org/html/2606.13142](https://arxiv.org/html/2606.13142)
Sangwon Youn Yoonjin Jang Youngjoong Ko Sungkyunkwan University, Suwon, Republic of Korea \{mikeyoun2000, yoonjinjang98\}@gmail\.com,yjko@skku\.edu

###### Abstract

Persona\-grounded dialogue systems aim to produce responses consistent with a speaker’s persona, yet existing methods treat personas as a flat set of sentences and fail to model the high\-order relations among persona attributes\-e\.g\., that several persona sentences share a topical category\. We proposeHyPE\(HypergraphPersonaEncoder\), a framework that \(i\) analyzes each persona\-bearing text as a*\(Core, Expression, Sentiment, Category\)*quadruple, and \(ii\) organizes persona elements into a hypergraph whose hyperedges are induced by shared category labels\. An HyperGCN hypergraph neural network propagates this structure into a persona summary vector and a soft\-memory bank that condition the response generator\. We further proposePersistent Edge Embeddings \(PEE\), lightweight per\-category learnable priors fused into the HyperGCN message\-passing step\. On PersonaChat under greedy decoding, HyPE consistently outperforms sentence\-level pooling baselines across GPT\-2, LLaMA\-3\.2\-3B, and Qwen2\.5\-3B backbones by demonstrating that structured hyperedge\-level persona encoding provides a transferable advantage across model scales\.111The source code for this project are anonymously available at[https://anonymous\.4open\.science/status/hyper\-graph\-F516](https://anonymous.4open.science/status/hyper-graph-F516)

HyPE: Category\-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona\-Grounded Dialogue

Sangwon Youn Yoonjin Jang Youngjoong Ko††thanks:Corresponding author\.Sungkyunkwan University, Suwon, Republic of Korea\{mikeyoun2000, yoonjinjang98\}@gmail\.com,yjko@skku\.edu

## 1Introduction

Persona\-grounded dialogue systems aim to produce responses consistent with a speaker’s profile, but the dominant paradigm, which conditions the language model on a flat sequence of persona sentencesZhanget al\.\([2018](https://arxiv.org/html/2606.13142#bib.bib1)\), ignores the high\-order semantic structure among persona attributes\. Two persona sentences such as “I love hiking in the mountains” and “I feel most alive outdoors” share a lifestyle category, yet no existing method explicitly represents this co\-membership relation\. Pairwise graph approaches can capture bilateral similarityTanget al\.\([2023a](https://arxiv.org/html/2606.13142#bib.bib17)\), but cannot express the fact that a*set*of sentences shares a semantic attribute simultaneously, which is precisely what a hyperedge represents\.

We proposeHyPE\(HypergraphPersonaEncoder\), a framework that treats persona\-bearing texts as*\(Core, Expression, Sentiment, Category\)*quadruples, and organizes persona elements into a*hypergraph*whose hyperedges group sentences by shared category labels\. A HyperGCN hypergraph neural networkYadatiet al\.\([2019](https://arxiv.org/html/2606.13142#bib.bib42)\)propagates this structure through a soft\-memory bridging module into the response generator\. We further introducePersistent Edge Embeddings \(PEE\): lightweight per\-category learnable priors \(≈\\approx1\.3K parameters\) fused into each HyperGCN message\-passing step, allowing the encoder to specialize how persona attributes from different semantic categories contribute to hyperedge aggregation\.

Our main contributions are: \(1\) a persona representation that pairs an*\(Core, Expression, Sentiment, Category\)*quadruple for well capturing affective and categorical signals; \(2\) a hypergraph construction procedure that induces category hyperedges, enabling many\-to\-many co\-membership modeling among persona elements; \(3\) an HyperGCN encoder bridged to the language model via an Encoder Soft\-Memory module that converts the context\-conditioned persona summary into a fixed\-size soft prompt; \(4\) Persistent Edge Embeddings \(PEE\) that inject category\-specific priors into hyperedge aggregation with minimal parameter overhead \(≈\\approx1\.3K\); and \(5\) comprehensive experiments on PersonaChat showing that HyPE consistently outperforms HyPE\-base across LLaMA\-3\.2\-3B, Qwen2\.5\-3B, and GPT\-2 backbones\.

## 2Related Work

### 2\.1Persona\-grounded Dialogue Generation

Persona\-grounded dialogue generation has progressed along two axes: how persona is*represented*and how it is*integrated*into the response generator\. Since PersonaChatZhanget al\.\([2018](https://arxiv.org/html/2606.13142#bib.bib1)\), the dominant representation is a flat set of 3\-5 persona sentences, integration methods such as BoBSonget al\.\([2021](https://arxiv.org/html/2606.13142#bib.bib3)\), ORIGChenet al\.\([2023a](https://arxiv.org/html/2606.13142#bib.bib4)\), LMEDRChenet al\.\([2023b](https://arxiv.org/html/2606.13142#bib.bib5)\), and CLVTanget al\.\([2023b](https://arxiv.org/html/2606.13142#bib.bib6)\)improve this via NLI supervision, ordering invariance, memory modules, and contrastive latent variables, respectively\. PeaCoKGaoet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib7)\)enriches the persona itself with a commonsense knowledge graph over five social dimensions, which we adopt as our category taxonomy \(Section[3\.2\.1](https://arxiv.org/html/2606.13142#S3.SS2.SSS1)\)\. However, all of these works treat the persona either as text or a pairwise graph;*many\-to\-many*co\-membership relations, which are the natural unit of persona semantics, remain unmodeled\.

### 2\.2Structured Persona Analysis and Aspect Quadruples

Our quadruple representation bridges two research lines:*structured persona extraction*and*aspect\-based sentiment analysis*\(ABSA\)\. GettingToKnowYouWuet al\.\([2020](https://arxiv.org/html/2606.13142#bib.bib8)\)extracts \(subject, relation, object\) triples from utterances, capturing factual structure but lacking affective or categorical signals\. On the ABSA side, ACOSCaiet al\.\([2021](https://arxiv.org/html/2606.13142#bib.bib10)\)extracts*\(Aspect, Category, Opinion, Sentiment\)*quadruples from reviews, and DiaASQLiet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib11)\)extracts*\(Target, Aspect, Opinion, Sentiment\)*quadruples from dialogues\. HyPE unifies both lines: we extend the persona triple with Sentiment and Category fields, then use the Category label to induce hyperedges that capture co\-membership dependencies among persona attributes\.

### 2\.3Hypergraph Neural Networks for NLP

Hypergraph neural networks generalize message passing by allowing each hyperedge to connect an arbitrary number of nodes, providing a natural inductive bias for co\-membership relationsFenget al\.\([2019](https://arxiv.org/html/2606.13142#bib.bib12)\); Yadatiet al\.\([2019](https://arxiv.org/html/2606.13142#bib.bib42)\)\. In language tasks, hypergraphs have recently been applied to emotion recognition in conversationZhenget al\.\([2024](https://arxiv.org/html/2606.13142#bib.bib14)\), node classification on text\-attributed hypergraphsBazagaet al\.\([2024](https://arxiv.org/html/2606.13142#bib.bib15)\), and table\-grounded question answering with LLMsHuanget al\.\([2025](https://arxiv.org/html/2606.13142#bib.bib16)\)\. Persona\-grounded dialogue*generation*, however, has remained outside this trend: existing graph\-augmented dialogue systems either operate over external knowledge graphs with pairwise edgesTanget al\.\([2023a](https://arxiv.org/html/2606.13142#bib.bib17)\)or use GNNs to encode flat persona sets\. To our knowledge, HyPE is the first system to model persona\-grounded response generation as message passing on a*persona hypergraph*, where hyperedges are induced by category labels rather than by external knowledge sources\.

## 3Methodology

![Refer to caption](https://arxiv.org/html/2606.13142v1/figure/figure1.png)Figure 1:HyPE framework: Persona Quadruple Extraction→\\toHypergraph Construction→\\toHyperGCN \+ PEE encoding→\\toEncoder Soft\-Memory injection into the response generator\.In this section, we present our proposed framework,HyPE, which performs persona\-grounded dialogue response generation through persona quadruple analysis and hypergraph expansion\. As shown in Figure[1](https://arxiv.org/html/2606.13142#S3.F1), the framework consists of three main stages: \(1\) Persona Analysis, \(2\) Persona Hypergraph Construction, and \(3\) Personalized Response Generation via a Hypergraph Neural Network \(HGNN\)\.

### 3\.1Overview

Given a set of speaker’s persona attribute sentencesP=\{p1,p2,…,pn\}P=\\\{p\_\{1\},p\_\{2\},\\dots,p\_\{n\}\\\}and a dialogue contextCC, the goal is to generate a responserrthat is consistent with the speaker’s persona\. Unlike prior work that relies solely on static persona sentences, our framework explicitly models the higher\-order relations among persona attributes by \(i\) decomposing each persona\-bearing sentence into a structured quadruple, and \(ii\) constructing a persona hypergraph in which shared category labels serve as hyperedges connecting related persona elements\. The structural information learned via the HGNN is then injected into a response generation model in the form of a persona summary vector and updated individual persona node embeddings\.

### 3\.2Persona Analysis

#### 3\.2\.1Persona Quadruple Definition

We define apersona quadrupleas a four\-element tuple\(Core, Expression, Sentiment, Category\)that captures the semantic structure of a persona\-bearing utterance:

- •Coredenotes the central attribute or target mentioned by the speaker\.
- •Expressiondenotes the speaker’s description or opinion about the Core\.
- •Sentimentdenotes the polarity of the persona information, taking one of three values:positive,negative, orneutral\.
- •Categorydenotes the persona attribute’s type, taking one of five values:Characteristic,Routine/Habit,Goal/Plan,Experience, orRelationship\. We adopt the categorization defined in PeaCoKGaoet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib7)\)\.

WhileCoreandExpressionpreserve the propositional content of the persona,SentimentandCategoryserve as auxiliary attributes that abstract the persona at a higher level;Categoryprovides the basis for hyperedge construction introduced in Section[3\.3](https://arxiv.org/html/2606.13142#S3.SS3)\.

#### 3\.2\.2Dataset Construction and Extraction Model

We build a quadruple\-annotated training set on top ofPGDatasetRibeiroet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib2)\), which aligns PersonaChat utterances with their grounding profile sentences\. We prompt GPT\-4o\-mini with the quadruple definitions and three few\-shot examples \(Appendix[A](https://arxiv.org/html/2606.13142#A1)\); malformed outputs are removed via rule\-based filtering\. Human verification on 200 sampled quadruples confirms 84\.5% exact\-match \(Fleissκ=0\.81\\kappa\{=\}0\.81Fleiss \([1971](https://arxiv.org/html/2606.13142#bib.bib36)\); Appendix[B](https://arxiv.org/html/2606.13142#A2)\)\.

We formulate extraction as sequence\-to\-sequence generation: given utteranceuu, a T5\-based modelRaffelet al\.\([2020](https://arxiv.org/html/2606.13142#bib.bib21)\)generates

y=\[Core\]c\[Expression\]e\[Sentiment\]s\[Category\]k\.\\begin\{split\}y=\\ &\\texttt\{\[Core\]\}\\,c\\,\\texttt\{\[Expression\]\}\\,e\\,\\\\ &\\texttt\{\[Sentiment\]\}\\,s\\,\\texttt\{\[Category\]\}\\,k\.\\end\{split\}\(1\)At inference, the extractor is applied uniformly to both utterances and profile sentences, so every persona\-bearing text is represented as a quadruple for hypergraph construction\.

### 3\.3Persona Hypergraph Construction

A hypergraph𝒢=\(𝒱,ℰ\)\\mathcal\{G\}=\(\\mathcal\{V\},\\mathcal\{E\}\)generalizes a graph by allowing each hyperedgee∈ℰe\\in\\mathcal\{E\}to connect an arbitrary number of nodesv∈𝒱v\\in\\mathcal\{V\}\. This makes hypergraphs particularly suitable for modeling persona information, where multiple persona statements can share a single high\-level attribute \(e\.g\., ahobby category\) that a pairwise edge cannot faithfully express\.

##### Nodes\.

The node set𝒱\\mathcal\{V\}contains two types of nodes: \(i\)Profile\-persona nodes, one per predefined persona sentence in the dataset; and \(ii\)Extracted\-persona nodes, derived from utterances, where each node is represented as the concatenation of its Core, Expression and Sentiment\.

Each node is represented with a quadruple obtained from the extraction model; the concatenated Core, Expression and Sentiment sequence is used as input of sentence\-transformer and the hidden vector of its CLS token is as the embedding of each node\. In addition, the Category field of the quadruple determines which hyperedge the node belongs to, allowing the two node types to be uniformly handled in the hypergraph despite their differing surface granularity\.

##### Hyperedges\.

The hyperedge setℰ\\mathcal\{E\}consists ofcategory hyperedges\(C:\{category\}\); each hyperedge groups all nodes that share the same PeaCoK category label from the quadruple \(one of five values:Characteristic,Routine/Habit,Goal/Plan,Experience,Relationship\)\.

A node belongs to exactly one category hyperedge, and nodes from different persona sentences are linked when they share the same category label for enabling many\-to\-many co\-membership modeling in a single structure\. Furthermore, HyPE introduces Persistent Edge Embeddings \(Section[3\.4](https://arxiv.org/html/2606.13142#S3.SS4)\), which inject per\-category learnable priors during category hyperedge aggregation\.

### 3\.4Hypergraph Neural Network

To propagate information over the persona hypergraph, we adoptHyperGCNYadatiet al\.\([2019](https://arxiv.org/html/2606.13142#bib.bib42)\), which generalizes graph convolution to hypergraphs by deriving a weighted graph from the hyperedge incidence structure and applying Laplacian\-based propagation\.

Message passing proceeds in three steps\. First, each nodevvtransforms its featurexvx\_\{v\}and sends a message to every hyperedgeeethat it belongs to\. Second, each hyperedgeeeaggregates incoming messages from its member nodes to produce an edge representationmem\_\{e\}\. Third, each nodevvreceives the aggregated messages from all its incident hyperedges, combines them with its current feature, and updates its representation\. Formally, one layer of update can be written as

me\\displaystyle m\_\{e\}=ϕ\(\{xv:v∈e\}\),\\displaystyle=\\phi\\\!\\left\(\\\{x\_\{v\}:v\\in e\\\}\\right\),\(2\)xv′\\displaystyle x\_\{v\}^\{\\prime\}=ψ\(xv,\{me:v∈e\}\),\\displaystyle=\\psi\\\!\\left\(x\_\{v\},\\,\\\{m\_\{e\}:v\\in e\\\}\\right\),\(3\)whereϕ\\phiandψ\\psiare learnable aggregation functions\. AfterLLlayers, the initial node embeddingsxv\(0\)x\_\{v\}^\{\(0\)\}are updated toxvupdx\_\{v\}^\{\\text\{upd\}\}, which encode both the node’s own semantics and the higher\-order relations induced by shared category labels\.

##### Persistent Edge Embeddings \(PEE\)\.

To further specialize hyperedge representations by semantic category, we introducePersistent Edge Embeddings \(PEE\): a small set of learnable vectors to represent each PeaCoK category label \(C:Characteristic,C:Experience,C:Goal/Plan,C:Relationship,C:Routine/Habit\)\. They are fused into the aggregation step of Category hyperedges\. Formally, let𝐞init\(t\)∈ℝdh\\mathbf\{e\}\_\{\\text\{init\}\}^\{\(t\)\}\\in\\mathbb\{R\}^\{d\_\{h\}\}be the persistent embedding for category hyperedge typett\. After node\-to\-edge message aggregation on aC:hyperedge, the edge representation is updated as:

me←EdgeFuse\(\[𝐞init\(t\);me\]\),m\_\{e\}\\leftarrow\\texttt\{EdgeFuse\}\\\!\\left\(\\left\[\\mathbf\{e\}\_\{\\text\{init\}\}^\{\(t\)\}\\,;\\,m\_\{e\}\\right\]\\right\),\(4\)whereEdgeFuseis a learned linear projection and\[⋅;⋅\]\[\\cdot;\\cdot\]denotes concatenation\. This introduces a category\-specific prior at every message\-passing step while adding only≈\\approx1\.3K additional parameters in total\. The persistent embeddings are optimized jointly with the rest of the model\.

### 3\.5Personalized Response Generation

#### 3\.5\.1Persona Information Injection

The structured persona information is injected into the response generation model in two complementary forms: apersona summary vectorandindividual persona node embeddings\.

##### Persona summary vector\.

We first encode the dialogue history with the generation model’s embedding layer and apply mean pooling to obtain a context vectorcctxc\_\{\\text\{ctx\}\}\. A linear projectionWQW\_\{Q\}mapscctxc\_\{\\text\{ctx\}\}to a query vector𝐪\\mathbf\{q\}that lives in the same space as the node embeddings\. Using allNNuser persona node embeddings𝐱upd\\mathbf\{x\}^\{\\text\{upd\}\}as both keys and values, we compute a context\-aware persona summary via scaled dot\-product attention:

Psumm=∑i=1Nsoftmax\(𝐪⋅K\[i\]⊤dk\)V\[i\]\.P\_\{\\text\{summ\}\}=\\sum\_\{i=1\}^\{N\}\\mathrm\{softmax\}\\\!\\left\(\\frac\{\\mathbf\{q\}\\cdot K\[i\]^\{\\top\}\}\{\\sqrt\{d\_\{k\}\}\}\\right\)V\[i\]\.\(5\)
The resulting vectorPsumm∈ℝdhP\_\{\\text\{summ\}\}\\in\\mathbb\{R\}^\{d\_\{h\}\}summarizes the speaker’s overall persona conditioned on the current dialogue context\.

##### Encoder Soft\-Memory\.

Directly prepending a single vector as a soft prompt is insufficient to bridge the gap between the graph embedding space and the continuous token space expected by the pre\-trained language model\. We therefore introduce theEncoder Soft\-Memorymodule that expandsPsummP\_\{\\text\{summ\}\}into a sequence ofMMsoft tokens via a two\-layer MLP:

𝐒~=reshape\(W2GELU\(W1Psumm\+b1\)\+b2\),\\tilde\{\\mathbf\{S\}\}=\\mathrm\{reshape\}\\\!\\left\(W\_\{2\}\\,\\mathrm\{GELU\}\(W\_\{1\}P\_\{\\text\{summ\}\}\+b\_\{1\}\)\+b\_\{2\}\\right\),\(6\)whereW1∈ℝdinter×dhW\_\{1\}\\in\\mathbb\{R\}^\{d\_\{\\text\{inter\}\}\\times d\_\{h\}\},W2∈ℝ\(M⋅dlm\)×dinterW\_\{2\}\\in\\mathbb\{R\}^\{\(M\\cdot d\_\{\\text\{lm\}\}\)\\times d\_\{\\text\{inter\}\}\}, andreshapeM×dlm\\mathrm\{reshape\}\_\{M\\times d\_\{\\text\{lm\}\}\}converts the resultingM⋅dlmM\{\\cdot\}d\_\{\\text\{lm\}\}\-dimensional vector into𝐒~∈ℝM×dlm\\tilde\{\\mathbf\{S\}\}\\in\\mathbb\{R\}^\{M\\times d\_\{\\text\{lm\}\}\}\. TheMMprojected vectors are prepended to the model input as a fixed\-size soft prompt regardless of the number of persona nodes\. We useM=15M\{=\}15slots in all experiments\.

##### Individual persona nodes \(Hyper\-tokens\)\.

While the Encoder Soft\-Memory provides a holistic, compressed persona summary, it loses the identity of individual persona facts after pooling\. To preserve fine\-grained, sentence\-level persona information, we additionally select the top\-kkmost salient updated node embeddings𝐱viupd\\mathbf\{x\}^\{\\text\{upd\}\}\_\{v\_\{i\}\}ranked by their attention weightwviw\_\{v\_\{i\}\}computed in the PersonaPooler step, retaining at mostk=8k\{=\}8nodes\. Each selected node embedding is then mapped to the LM hidden dimension via aHyperProjector, a single linear layer with Xavier uniform initialization:

𝐩i=𝐖hyp𝐱viupd\+𝐛,i=1,…,k,\\mathbf\{p\}\_\{i\}=\\mathbf\{W\}\_\{\\text\{hyp\}\}\\,\\mathbf\{x\}^\{\\text\{upd\}\}\_\{v\_\{i\}\}\+\\mathbf\{b\},\\quad i=1,\\ldots,k,\(7\)where𝐖hyp∈ℝdlm×dh\\mathbf\{W\}\_\{\\text\{hyp\}\}\\in\\mathbb\{R\}^\{d\_\{\\text\{lm\}\}\\times d\_\{h\}\}and𝐛∈ℝdlm\\mathbf\{b\}\\in\\mathbb\{R\}^\{d\_\{\\text\{lm\}\}\}\. We use a single linear layer rather than a deeper MLP to avoid overfitting on the small number of top\-kknode vectors\. The projected hyper\-tokens\[𝐩1,…,𝐩k\]\[\\mathbf\{p\}\_\{1\},\\ldots,\\mathbf\{p\}\_\{k\}\]are concatenated immediately after the soft\-memory tokens, allowing the language model to attend selectively to individual persona facts during generation\. This two\-stream design—soft\-memory for global persona context and hyper\-tokens for local persona facts—enables the model to leverage both summarized and fine\-grained persona signals simultaneously\.

##### Final input\.

The final input embedding sequence to the generation model is organized as

\[𝐒⏟~Mvectors;Pnodes⏟≤ktokens;Ctx⏟dialogue context\],\[\\,\\underbrace\{\\tilde\{\\mathbf\{S\}\}\}\_\{M\\ \\text\{vectors\}\}\\,;\\ \\underbrace\{P\_\{\\text\{nodes\}\}\}\_\{\\leq k\\ \\text\{tokens\}\}\\,;\\ \\underbrace\{Ctx\}\_\{\\text\{dialogue context\}\}\\,\],\(8\)where𝐒~=\[𝐬~1,…,𝐬~M\]\\tilde\{\\mathbf\{S\}\}=\[\\tilde\{\\mathbf\{s\}\}\_\{1\},\\ldots,\\tilde\{\\mathbf\{s\}\}\_\{M\}\]denotes theMMsoft\-memory vectors,PnodesP\_\{\\text\{nodes\}\}the top\-kkprojected hyper\-tokens, andCtxCtxthe dialogue context embeddings\.

#### 3\.5\.2Training Strategy

The model is trained with a language modeling objective\. Letℐ=\(𝐒~,Pnodes,Ctx\)\\mathcal\{I\}=\(\\tilde\{\\mathbf\{S\}\},P\_\{\\text\{nodes\}\},Ctx\)denote the full conditioning input, where𝐒~\\tilde\{\\mathbf\{S\}\}are the soft\-memory tokens,PnodesP\_\{\\text\{nodes\}\}the projected persona node embeddings, andCtxCtxthe dialogue context:

ℒ=ℒLM=−1\|Tresp\|∑t∈Tresplog⁡P\(yt∣ℐ;θ\),\\mathcal\{L\}=\\mathcal\{L\}\_\{\\text\{LM\}\}=\-\\frac\{1\}\{\|T\_\{\\text\{resp\}\}\|\}\\sum\_\{t\\in T\_\{\\text\{resp\}\}\}\\log P\(y\_\{t\}\\mid\\mathcal\{I\};\\theta\),\(9\)whereTrespT\_\{\\text\{resp\}\}denotes the response\-token indices\. We additionally explored optional InfoNCE contrastive extensionsvan den Oordet al\.\([2018](https://arxiv.org/html/2606.13142#bib.bib29)\)that cluster persona nodes by shared hyperedge membership \(ℒlabel\\mathcal\{L\}\_\{\\text\{label\}\}\) and align the context query to the most response\-relevant persona node \(ℒpg\\mathcal\{L\}\_\{\\text\{pg\}\}\); however, these do not improve overℒLM\\mathcal\{L\}\_\{\\text\{LM\}\}alone \(HyPE\+rel\{\}\_\{\\text\{\+rel\}\}: B\-1 16\.54 vs\. HyPE: 17\.94\) and are excluded from our primary model\.

##### Optimization\.

We use AdamW with two parameter groups \(backbone and new modules\) and a 5% linear warmup schedule\. Hyperparameters per backbone are in Section[4\.1](https://arxiv.org/html/2606.13142#S4.SS1)\.

## 4Experiments

### 4\.1Experimental Setup

##### Dataset\.

We conduct all experiments onPersonaChatZhanget al\.\([2018](https://arxiv.org/html/2606.13142#bib.bib1)\), a widely\-used benchmark for persona\-grounded dialogue\. Each dialogue is paired with 3\-5 first\-person persona sentences describing one speaker, and crowdworkers enact conversations conditioned on those personas\. We use the standard split: 8,939 training / 1,000 validation / 968 test dialogues, yielding 131,438 / 7,801 turn\-level training samples\. We evaluate on the 968 test dialogues using one final\-turn prediction per dialogue\.

##### Baselines\.

We compareHyPEagainst four categories of baselines\.

Backbone\-only baselinescondition the language model directly on persona text without structural modeling: \(i\)Text Baselineconcatenates persona sentences as a textual prefix to the dialogue history; \(ii\)Quad\-Text Baseline\(GPT\-2 only\) replaces each persona sentence with its extracted quadruple \(\[Core\] c \[Expr\] e \[Sent\] s \[Cat\] k\), testing whether structural annotations help as raw text input; \(iii\)MeanPool Baseline\(LLaMA/Qwen backbones\) encodes each persona sentence with Sentence\-BERTReimers and Gurevych \([2019](https://arxiv.org/html/2606.13142#bib.bib26)\), mean\-pools the embeddings, and prepends the result as a soft prompt, testing whether sentence\-level pooling alone is sufficient\.

Structural baseline:GCN BaselineKipf and Welling \([2017](https://arxiv.org/html/2606.13142#bib.bib25)\)builds a pairwise graph over persona sentences with edges connecting sentences that share the same sentiment or category label, and encodes node features with a 2\-layer GCN before injecting them as soft prompts\. This directly tests whether*hyperedge*\-based modeling outperforms*pairwise*graph encoding under an otherwise identical pipeline\.

Persona\-specific prior work:ORIGChenet al\.\([2023a](https://arxiv.org/html/2606.13142#bib.bib4)\)is an order\-insensitive generation framework that regularizes responses to be invariant to persona\-sentence ordering via a contrastive training objective\. All ORIG results are reported under greedy decoding for a fair comparison\.

##### Implementation Details\.

All models are trained on4×4\{\\times\}NVIDIA RTX A6000 GPUs\.

GPT\-2 backbone\.We use GPT\-2 small \(124M\)Radfordet al\.\([2019](https://arxiv.org/html/2606.13142#bib.bib20)\)with full fine\-tuning\. Persona node features are initialized fromall\-MiniLM\-L6\-v2Sentence\-BERT embeddings \(384\-d\) and projected to GPT\-2 hidden size \(768\) via a linear hyper\-projector\. The HyperGCN encoder usesL=1L\{=\}1message\-passing layer with hidden dimensiondh=256d\_\{h\}\{=\}256\. The soft\-memory module hasM=15M\{=\}15learnable slots, and the top\-kkhyper\-token selector retains at most 8 tokens\. Training: 30 epochs, early stopping \(patience 3\), per\-device batch 32 \(effective batch 128 across 4 GPUs\), backbone LR2×10−52\\\!\\times\\\!10^\{\-5\}, new\-module LR1×10−41\\\!\\times\\\!10^\{\-4\}, AdamW with 5% linear warmup, weight decay 0\.01, full\-precision \(float32\)\.

LLaMA and Qwen backbones\.We use Llama\-3\.2\-3B\-InstructGrattafioriet al\.\([2024](https://arxiv.org/html/2606.13142#bib.bib22)\)and Qwen2\.5\-3B\-InstructQwenet al\.\([2025](https://arxiv.org/html/2606.13142#bib.bib40)\)with LoRA adaptersHuet al\.\([2022](https://arxiv.org/html/2606.13142#bib.bib23)\)of rankr=16r\{=\}16,α=32\\alpha\{=\}32, dropout 0\.05 applied to \{qq,kk,vv,oo\} projections\. Training: 30 epochs, early stopping \(patience 3\), per\-device batch 4, gradient accumulation×\\times8 \(effective 128\), backbone LR1×10−41\\\!\\times\\\!10^\{\-4\}, new\-module LR3×10−43\\\!\\times\\\!10^\{\-4\}, mixed\-precisionbfloat16\. Hypergraph hyper\-parameters are identical to the GPT\-2 setting\.

Contrastive loss \(optional variants\)\.InfoNCE temperatureτ=0\.1\\tau\{=\}0\.1; contrastive weightα=0\.1\\alpha\{=\}0\.1when enabled\.

Inference\.All results use greedy decoding \(num\_beams=1\\text\{num\\\_beams\}\{=\}1,do\_sample=False\) with a maximum of 64 new tokens\.

### 4\.2Evaluation Metrics

We evaluate along two complementary dimensions\.

##### Automatic metrics\.

We report BLEU\-1/2/4Papineniet al\.\([2002](https://arxiv.org/html/2606.13142#bib.bib30)\), ROUGE\-LLin \([2004](https://arxiv.org/html/2606.13142#bib.bib31)\), and METEORBanerjee and Lavie \([2005](https://arxiv.org/html/2606.13142#bib.bib32)\)as surface overlap metrics against reference responses\. BLEU\-1 measures unigram precision; BLEU\-4 captures 4\-gram fluency; ROUGE\-L reflects the longest common subsequence; METEOR additionally accounts for stemming and synonym matching\.

##### LLM\-as\-a\-Judge\.

To complement lexical overlap, we use GPT\-4oOpenAIet al\.\([2024](https://arxiv.org/html/2606.13142#bib.bib38)\)as a judgeLiuet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib39)\)to score responses on three dimensions:Persona Consistency,Engagingness, andRelevance\(each 1\-5\)\. Results are reported in Section[4\.6](https://arxiv.org/html/2606.13142#S4.SS6)\.

### 4\.3Main Results

Table[1](https://arxiv.org/html/2606.13142#S4.T1)presents cross\-backbone results on PersonaChat under greedy decoding, comparing HyPE against all baselines on LLaMA\-3\.2\-3B, Qwen2\.5\-3B, and GPT\-2\.

Table 1:Cross\-backbone results on PersonaChat \(greedy,×\\times100\)\.Boldper block: best per column\. MTR: METEOR\.HyPE consistently outperforms HyPE\-base by\+1\.32\+1\.32,\+0\.93\+0\.93, and\+0\.89\+0\.89BLEU\-1 on LLaMA, Qwen, and GPT\-2, respectively\. Paired bootstrap significance tests \(B=2000, corpus BLEU\-1\)Efron and Tibshirani \([1994](https://arxiv.org/html/2606.13142#bib.bib41)\)confirm that the LLaMA gain is statistically significant \(p<0\.05p\{<\}0\.05\), while Qwen and GPT\-2 gains are directionally consistent across all metrics, confirming that PEE provides a transferable structural advantage that is not specific to any single language model\.

On LLaMA\-3\.2\-3B, HyPE attains the best score on every automatic metric among all compared systems, leading the strongest baseline \(MeanPool\) on ROUGE\-L \(17\.50 vs\. 17\.20\) and METEOR \(15\.92 vs\. 14\.82\), and surpassing the persona\-specific ORIG baseline across every metric \(e\.g\.,\+1\.13\+1\.13BLEU\-1 and\+0\.39\+0\.39ROUGE\-L\)\. On Qwen2\.5\-3B, HyPE achieves the best B\-4, ROUGE\-L, and METEOR; the Text Baseline edges it on BLEU\-1/2 \(21\.64 / 10\.24 vs\. 22\.18 / 10\.50\), but trails HyPE on the higher\-order and overlap metrics\. These results indicate that the advantage of hyperedge\-level encoding is not confined to a single backbone or to generic pooling baselines\.

### 4\.4Structural Baseline Comparison

We next isolate the comparison against the pairwiseGCNbaseline, which shares HyPE’s soft\-prompt injection pipeline but encodes persona sentences with pairwise edges rather than category hyperedges\.

On GPT\-2, HyPE achieves the highest BLEU\-1 \(17\.94\), outperforming the GCN baseline \(\+2\.77\+2\.77\) and HyPE\-base \(\+0\.89\+0\.89\), demonstrating that hyperedge\-level encoding provides value over both pairwise graph methods and vanilla message passing\. The same ordering holds on LLaMA and Qwen, where HyPE leads the GCN baseline by\+1\.21\+1\.21\(22\.64 vs\. 21\.43\) and\+1\.28\+1\.28\(22\.18 vs\. 20\.90\) BLEU\-1 respectively, confirming that the hyperedge advantage over pairwise graph encoding transfers across backbone scales\.

The Text Baseline scores substantially below all structural methods on GPT\-2 \(12\.63 B\-1\)\.

We also evaluated a contrastive variant \(HyPE\+rel\{\}\_\{\\text\{\+rel\}\}: B\-1 16\.54\), finding that contrastive supervision alone hurts B\-1 — PEE is the key component that drives improvement\.

HyPE outperforms ORIG on BLEU\-1/2/4, ROUGE\-L, and METEOR on LLaMA \(e\.g\., ROUGE\-L 17\.50 vs\. 17\.11\), despite ORIG producing shorter responses \(9\.8 words\) closer to the gold reference length \(11\.0 words\) than ours \(12\.4 words\)\. This indicates HyPE’s gains are not an artifact of response length\.

### 4\.5Ablation Study

We ablate the three core modules of HyPE on the GPT\-2 backbone to quantify each component’s contribution\. Figure[2](https://arxiv.org/html/2606.13142#S4.F2)visualizes the BLEU\-1 of each variant; the full four\-metric breakdown is reported in Appendix[C](https://arxiv.org/html/2606.13142#A3)\(Table[4](https://arxiv.org/html/2606.13142#A3.T4)\)\.

055101015152020HyPEHyPE\-basew/o Hyper\-tokensw/o HyperGCNCA\-MeanPoolw/o Soft\-Memory17\.9417\.9417\.0517\.0516\.3416\.3416\.3216\.3215\.6715\.675\.225\.22BLEU\-1 \(×\\times100\)Figure 2:Ablation on GPT\-2 \(greedy, BLEU\-1×\\times100\)\. Removing the Soft\-Memory module collapses performance to near the Text Baseline, while PEE \(HyPE vs\. HyPE\-base\) and each structural component contribute smaller, complementary gains\. CA\-MeanPool applies per\-category S\-BERT offsets without message\-passing\. Full metrics in Table[4](https://arxiv.org/html/2606.13142#A3.T4)\.Under greedy decoding, the results form a clear hierarchy: HyPE \(17\.94\)\>\>HyPE\-base \(17\.05\)\>\>w/o Hyper\-tokens≈\\approxw/o HyperGCN≫\\ggw/o Soft\-Memory\. PEE adds\+0\.89\+0\.89B\-1 over HyPE\-base at a cost of≈\\approx1\.3K additional parameters, confirming that category\-specific edge priors are complementary to the core architecture\.

##### Necessity of hyperedge message\-passing for PEE\.

To test whether PEE’s benefit stems solely from having category\-specific parameters \(independently of the hypergraph\), we introduce aCA\-MeanPoolcontrol: per\-category learnable offset vectors are added to S\-BERT node embeddings*before*mean pooling, with no HyperGCN message\-passing\. CA\-MeanPool scores 15\.67 B\-1,*lower*than HyPE\-base \(17\.05\)\. This result confirms that category\-specific parameters are not beneficial on their own\-they require the context of hyperedge message\-passing to become useful\. The PEE improvement therefore cannot be attributed to mere category awareness; it is specifically enabled by the hyperedge propagation that PEE modulates\.

##### Effect of HyperGCN\.

Removing the HyperGCN message\-passing layer \(w/o HyperGCN\) reduces BLEU\-1 by 0\.73 \(17\.05→\\to16\.32\) and ROUGE\-L by 0\.30, confirming that hypergraph propagation provides consistent improvements in content fidelity\.

##### Effect of Hyper\-tokens\.

Removing the top\-kkhyper\-token selector \(w/o Hyper\-tokens\) reduces BLEU\-1 by 0\.71 \(17\.05→\\to16\.34\) and ROUGE\-L by 0\.23\. Multi\-token conditioning provides complementary value to message\-passing: selecting the most salient hyperedge embeddings as discrete token inputs helps the decoder attend to specific persona attributes\.

##### Effect of Soft\-Memory\.

Disabling the EncoderSoftMemory module \(w/o Soft\-Memory\) causes a catastrophic drop: BLEU\-1 falls by 11\.83 points \(17\.05→\\to5\.22\) and ROUGE\-L by 6\.66 points \(14\.16→\\to7\.50\), collapsing to near\-text\-baseline performance\. This identifies the soft\-memory slot layer as the critical bridge mapping discrete hypergraph embeddings into the continuous soft\-token space the GPT\-2 decoder expects\.

### 4\.6LLM\-as\-a\-Judge Evaluation \(G\-eval\)

To complement the lexical overlap metrics, we evaluate a subset of models using GPT\-4oOpenAIet al\.\([2024](https://arxiv.org/html/2606.13142#bib.bib38)\)as a judgeLiuet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib39)\)on three dimensions:Persona Consistency\(does the response reflect the speaker’s persona?\),Engagingness\(does it invite further conversation?\), andRelevance\(does it address the preceding turn?\)\. Each dimension is scored 1\-5 by the judge given the persona sentences, the last three dialogue turns, and the generated response\. We evaluate on 200 randomly sampled test items \(seed 42\)\.

Table 2:G\-eval \(GPT\-4o judge, 200 samples, scale 1\-5\)\.Bold: best per block\. P\.Cons: Persona Consistency; Engage: Engagingness; Relev: Relevance\.HyPE consistently outperforms HyPE\-base on Persona Consistency and Relevance across all three backbones, with the clearest gains on GPT\-2 on LLaMA and Qwen the differences are smaller but directionally consistent\. HyPE also exceeds the MeanPool baseline on every dimension and backbone\. On GPT\-2, the persona\-specific ORIG baseline trails HyPE on all three dimensions, most notably on Engagingness, indicating that hyperedge\-level persona structuring yields responses judged more consistent, engaging, and relevant\.

## 5Conclusion

We presentedHyPE, a framework for persona\-grounded dialogue generation that represents persona\-bearing texts as structured*\(Core, Expression, Sentiment, Category\)*quadruples and organizes them into a hypergraph whose hyperedges encode shared category labels\. An HyperGCN encoder propagates this structure through a soft\-memory bridging module into the response generator\. We additionally introducedPersistent Edge Embeddings \(PEE\), per\-category learnable priors fused into the hyperedge aggregation step, which consistently improve content fidelity with negligible additional parameters \(≈\\approx1\.3K\)\.

Experiments on PersonaChat demonstrate that HyPE achieves the best BLEU\-1 among greedy\-decoded systems across all three backbone scales \(LLaMA\-3\.2\-3B, Qwen2\.5\-3B, GPT\-2\), consistently outperforming HyPE\-base by up to 1\.32 BLEU\-1 on LLaMA\. Ablation studies confirm the indispensability of the soft\-memory module and the complementary roles of HyperGCN message passing and the top\-kkhyper\-token selector\. These results demonstrate that explicit hyperedge\-level persona structuring, combined with lightweight category priors, provides a transferable advantage that scales across backbone capacities without contrastive training overhead\.

## Limitations

Our experiments are conducted on PersonaChat, a single English\-language benchmark with relatively short persona descriptions \(3\-5 sentences\)\. Generalization to longer persona profiles, other languages, or dialogue domains \(e\.g\., Multi\-Session Chat\) remains to be evaluated\. The quadruple extractor introduces a dependency on the OpenAI API for dataset construction; future work should explore open\-source alternatives\. We also do not evaluate extraction quality on a held\-out test set; the T5 extractor’s performance is validated indirectly through downstream generation quality and the annotation agreement reported in Appendix[B](https://arxiv.org/html/2606.13142#A2)\. We report single\-seed results due to computational constraints; multi\-seed evaluation with variance estimates is recommended for future work, particularly given that GPT\-2 gains over HyPE\-base are modest in absolute terms\. The G\-eval experiment uses GPT\-4o as judge on 200 randomly sampled items; scores may be sensitive to prompt phrasing and the choice of judge model\.

## Ethical Considerations

Our work on persona\-grounded dialogue generation \(HyPE\) utilizes the publicly available PersonaChat dataset, which consists of crowdsourced, synthetic personas\. Therefore, our current experiments do not involve the extraction or generation of real users’ personally identifiable information \(PII\)\. However, deploying such personalized systems in real\-world applications necessitates strict data privacy safeguards, as extracting fine\-grained quadruples \(Core, Expression, Sentiment, Category\) from live user interactions could potentially expose sensitive personal data\.

Additionally, our framework relies on large language models \(e\.g\., LLaMA\-3\.2, Qwen2\.5\) for response generation and proprietary APIs \(GPT\-4o\-mini\) for data annotation\. These models carry inherent risks of generating hallucinated, toxic, or biased content\. While explicit persona grounding helps narrow the generation space and improves response consistency, it may inadvertently amplify specific social biases if the assigned persona contains stereotyped attributes\. Future real\-world deployments must incorporate robust safety filters and comprehensive fairness evaluations to ensure objective and unbiased interactions\.

## References

- S\. Banerjee and A\. Lavie \(2005\)METEOR: an automatic metric for MT evaluation with improved correlation with human judgments\.InProceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization,J\. Goldstein, A\. Lavie, C\. Lin, and C\. Voss \(Eds\.\),Ann Arbor, Michigan,pp\. 65–72\.External Links:[Link](https://aclanthology.org/W05-0909/)Cited by:[§4\.2](https://arxiv.org/html/2606.13142#S4.SS2.SSS0.Px1.p1.1)\.
- A\. Bazaga, P\. Liò, and G\. Micklem \(2024\)HyperBERT: mixing hypergraph\-aware layers with language models for node classification on text\-attributed hypergraphs\.External Links:2402\.07309,[Link](https://arxiv.org/abs/2402.07309)Cited by:[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1)\.
- H\. Cai, R\. Xia, and J\. Yu \(2021\)Aspect\-category\-opinion\-sentiment quadruple extraction with implicit aspects and opinions\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics \(ACL\),Cited by:[§2\.2](https://arxiv.org/html/2606.13142#S2.SS2.p1.1)\.
- L\. Chen, H\. Wang, Y\. Deng, W\. C\. Kwan, Z\. Wang, and K\. Wong \(2023a\)Towards robust personalized dialogue generation via order\-insensitive representation regularization\.InFindings of the Association for Computational Linguistics: ACL 2023,pp\. 7337–7345\.External Links:[Link](https://aclanthology.org/2023.findings-acl.462/),[Document](https://dx.doi.org/10.18653/v1/2023.findings-acl.462)Cited by:[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1),[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px2.p4.1)\.
- R\. Chen, J\. Wang, L\. Yu, and X\. Zhang \(2023b\)Learning to memorize entailment and discourse relations for persona\-consistent dialogues\.InProceedings of the 37th AAAI Conference on Artificial Intelligence \(AAAI\),Cited by:[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1)\.
- B\. Efron and R\. J\. Tibshirani \(1994\)An introduction to the bootstrap\.CRC Press\.Cited by:[§4\.3](https://arxiv.org/html/2606.13142#S4.SS3.p2.4)\.
- Y\. Feng, H\. You, Z\. Zhang, R\. Ji, and Y\. Gao \(2019\)Hypergraph neural networks\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.33,pp\. 3558–3565\.Cited by:[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1)\.
- J\. L\. Fleiss \(1971\)Measuring nominal scale agreement among many raters\.Psychological Bulletin76\(5\),pp\. 378–382\.Cited by:[§3\.2\.2](https://arxiv.org/html/2606.13142#S3.SS2.SSS2.p1.1)\.
- S\. Gao, B\. Borges, S\. Oh, D\. Bayazit, S\. Kanno, H\. Wakaki, Y\. Mitsufuji, and A\. Bosselut \(2023\)PeaCoK: persona commonsense knowledge for consistent and engaging narratives\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(ACL\),Cited by:[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1),[4th item](https://arxiv.org/html/2606.13142#S3.I1.i4.p1.1)\.
- A\. Grattafiori, A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al\-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Vaughan, A\. Yang, A\. Fan, A\. Goyal, A\. Hartshorn, A\. Yang, A\. Mitra, A\. Sravankumar, A\. Korenev, A\. Hinsvark, A\. Rao, A\. Zhang, A\. Rodriguez, A\. Gregerson, A\. Spataru, B\. Roziere, B\. Biron, B\. Tang, B\. Chern, C\. Caucheteux, C\. Nayak, C\. Bi, C\. Marra, C\. McConnell, C\. Keller, C\. Touret, C\. Wu, C\. Wong, C\. C\. Ferrer, C\. Nikolaidis, D\. Allonsius, D\. Song, D\. Pintz, D\. Livshits, D\. Wyatt, D\. Esiobu, D\. Choudhary, D\. Mahajan, D\. Garcia\-Olano, D\. Perino, D\. Hupkes, E\. Lakomkin, E\. AlBadawy, E\. Lobanova, E\. Dinan, E\. M\. Smith, F\. Radenovic, F\. Guzmán, F\. Zhang, G\. Synnaeve, G\. Lee, G\. L\. Anderson, G\. Thattai, G\. Nail, G\. Mialon, G\. Pang, G\. Cucurell, H\. Nguyen, H\. Korevaar, H\. Xu, H\. Touvron, I\. Zarov, I\. A\. Ibarra, I\. Kloumann, I\. Misra, I\. Evtimov, J\. Zhang, J\. Copet, J\. Lee, J\. Geffert, J\. Vranes, J\. Park, J\. Mahadeokar, J\. Shah, J\. van der Linde, J\. Billock, J\. Hong, J\. Lee, J\. Fu, J\. Chi, J\. Huang, J\. Liu, J\. Wang, J\. Yu, J\. Bitton, J\. Spisak, J\. Park, J\. Rocca, J\. Johnstun, J\. Saxe, J\. Jia, K\. V\. Alwala, K\. Prasad, K\. Upasani, K\. Plawiak, K\. Li, K\. Heafield, K\. Stone, K\. El\-Arini, K\. Iyer, K\. Malik, K\. Chiu, K\. Bhalla, K\. Lakhotia, L\. Rantala\-Yeary, L\. van der Maaten, L\. Chen, L\. Tan, L\. Jenkins, L\. Martin, L\. Madaan, L\. Malo, L\. Blecher, L\. Landzaat, L\. de Oliveira, M\. Muzzi, M\. Pasupuleti, M\. Singh, M\. Paluri, M\. Kardas, M\. Tsimpoukelli, M\. Oldham, M\. Rita, M\. Pavlova, M\. Kambadur, M\. Lewis, M\. Si, M\. K\. Singh, M\. Hassan, N\. Goyal, N\. Torabi, N\. Bashlykov, N\. Bogoychev, N\. Chatterji, N\. Zhang, O\. Duchenne, O\. Çelebi, P\. Alrassy, P\. Zhang, P\. Li, P\. Vasic, P\. Weng, P\. Bhargava, P\. Dubal, P\. Krishnan, P\. S\. Koura, P\. Xu, Q\. He, Q\. Dong, R\. Srinivasan, R\. Ganapathy, R\. Calderer, R\. S\. Cabral, R\. Stojnic, R\. Raileanu, R\. Maheswari, R\. Girdhar, R\. Patel, R\. Sauvestre, R\. Polidoro, R\. Sumbaly, R\. Taylor, R\. Silva, R\. Hou, R\. Wang, S\. Hosseini, S\. Chennabasappa, S\. Singh, S\. Bell, S\. S\. Kim, S\. Edunov, S\. Nie, S\. Narang, S\. Raparthy, S\. Shen, S\. Wan, S\. Bhosale, S\. Zhang, S\. Vandenhende, S\. Batra, S\. Whitman, S\. Sootla, S\. Collot, S\. Gururangan, S\. Borodinsky, T\. Herman, T\. Fowler, T\. Sheasha, T\. Georgiou, T\. Scialom, T\. Speckbacher, T\. Mihaylov, T\. Xiao, U\. Karn, V\. Goswami, V\. Gupta, V\. Ramanathan, V\. Kerkez, V\. Gonguet, V\. Do, V\. Vogeti, V\. Albiero, V\. Petrovic, W\. Chu, W\. Xiong, W\. Fu, W\. Meers, X\. Martinet, X\. Wang, X\. Wang, X\. E\. Tan, X\. Xia, X\. Xie, X\. Jia, X\. Wang, Y\. Goldschlag, Y\. Gaur, Y\. Babaei, Y\. Wen, Y\. Song, Y\. Zhang, Y\. Li, Y\. Mao, Z\. D\. Coudert, Z\. Yan, Z\. Chen, Z\. Papakipos, A\. Singh, A\. Srivastava, A\. Jain, A\. Kelsey, A\. Shajnfeld, A\. Gangidi, A\. Victoria, A\. Goldstand, A\. Menon, A\. Sharma, A\. Boesenberg, A\. Baevski, A\. Feinstein, A\. Kallet, A\. Sangani, A\. Teo, A\. Yunus, A\. Lupu, A\. Alvarado, A\. Caples, A\. Gu, A\. Ho, A\. Poulton, A\. Ryan, A\. Ramchandani, A\. Dong, A\. Franco, A\. Goyal, A\. Saraf, A\. Chowdhury, A\. Gabriel, A\. Bharambe, A\. Eisenman, A\. Yazdan, B\. James, B\. Maurer, B\. Leonhardi, B\. Huang, B\. Loyd, B\. D\. Paola, B\. Paranjape, B\. Liu, B\. Wu, B\. Ni, B\. Hancock, B\. Wasti, B\. Spence, B\. Stojkovic, B\. Gamido, B\. Montalvo, C\. Parker, C\. Burton, C\. Mejia, C\. Liu, C\. Wang, C\. Kim, C\. Zhou, C\. Hu, C\. Chu, C\. Cai, C\. Tindal, C\. Feichtenhofer, C\. Gao, D\. Civin, D\. Beaty, D\. Kreymer, D\. Li, D\. Adkins, D\. Xu, D\. Testuggine, D\. David, D\. Parikh, D\. Liskovich, D\. Foss, D\. Wang, D\. Le, D\. Holland, E\. Dowling, E\. Jamil, E\. Montgomery, E\. Presani, E\. Hahn, E\. Wood, E\. Le, E\. Brinkman, E\. Arcaute, E\. Dunbar, E\. Smothers, F\. Sun, F\. Kreuk, F\. Tian, F\. Kokkinos, F\. Ozgenel, F\. Caggioni, F\. Kanayet, F\. Seide, G\. M\. Florez, G\. Schwarz, G\. Badeer, G\. Swee, G\. Halpern, G\. Herman, G\. Sizov, Guangyi, Zhang, G\. Lakshminarayanan, H\. Inan, H\. Shojanazeri, H\. Zou, H\. Wang, H\. Zha, H\. Habeeb, H\. Rudolph, H\. Suk, H\. Aspegren, H\. Goldman, H\. Zhan, I\. Damlaj, I\. Molybog, I\. Tufanov, I\. Leontiadis, I\. Veliche, I\. Gat, J\. Weissman, J\. Geboski, J\. Kohli, J\. Lam, J\. Asher, J\. Gaya, J\. Marcus, J\. Tang, J\. Chan, J\. Zhen, J\. Reizenstein, J\. Teboul, J\. Zhong, J\. Jin, J\. Yang, J\. Cummings, J\. Carvill, J\. Shepard, J\. McPhie, J\. Torres, J\. Ginsburg, J\. Wang, K\. Wu, K\. H\. U, K\. Saxena, K\. Khandelwal, K\. Zand, K\. Matosich, K\. Veeraraghavan, K\. Michelena, K\. Li, K\. Jagadeesh, K\. Huang, K\. Chawla, K\. Huang, L\. Chen, L\. Garg, L\. A, L\. Silva, L\. Bell, L\. Zhang, L\. Guo, L\. Yu, L\. Moshkovich, L\. Wehrstedt, M\. Khabsa, M\. Avalani, M\. Bhatt, M\. Mankus, M\. Hasson, M\. Lennie, M\. Reso, M\. Groshev, M\. Naumov, M\. Lathi, M\. Keneally, M\. Liu, M\. L\. Seltzer, M\. Valko, M\. Restrepo, M\. Patel, M\. Vyatskov, M\. Samvelyan, M\. Clark, M\. Macey, M\. Wang, M\. J\. Hermoso, M\. Metanat, M\. Rastegari, M\. Bansal, N\. Santhanam, N\. Parks, N\. White, N\. Bawa, N\. Singhal, N\. Egebo, N\. Usunier, N\. Mehta, N\. P\. Laptev, N\. Dong, N\. Cheng, O\. Chernoguz, O\. Hart, O\. Salpekar, O\. Kalinli, P\. Kent, P\. Parekh, P\. Saab, P\. Balaji, P\. Rittner, P\. Bontrager, P\. Roux, P\. Dollar, P\. Zvyagina, P\. Ratanchandani, P\. Yuvraj, Q\. Liang, R\. Alao, R\. Rodriguez, R\. Ayub, R\. Murthy, R\. Nayani, R\. Mitra, R\. Parthasarathy, R\. Li, R\. Hogan, R\. Battey, R\. Wang, R\. Howes, R\. Rinott, S\. Mehta, S\. Siby, S\. J\. Bondu, S\. Datta, S\. Chugh, S\. Hunt, S\. Dhillon, S\. Sidorov, S\. Pan, S\. Mahajan, S\. Verma, S\. Yamamoto, S\. Ramaswamy, S\. Lindsay, S\. Lindsay, S\. Feng, S\. Lin, S\. C\. Zha, S\. Patil, S\. Shankar, S\. Zhang, S\. Zhang, S\. Wang, S\. Agarwal, S\. Sajuyigbe, S\. Chintala, S\. Max, S\. Chen, S\. Kehoe, S\. Satterfield, S\. Govindaprasad, S\. Gupta, S\. Deng, S\. Cho, S\. Virk, S\. Subramanian, S\. Choudhury, S\. Goldman, T\. Remez, T\. Glaser, T\. Best, T\. Koehler, T\. Robinson, T\. Li, T\. Zhang, T\. Matthews, T\. Chou, T\. Shaked, V\. Vontimitta, V\. Ajayi, V\. Montanez, V\. Mohan, V\. S\. Kumar, V\. Mangla, V\. Ionescu, V\. Poenaru, V\. T\. Mihailescu, V\. Ivanov, W\. Li, W\. Wang, W\. Jiang, W\. Bouaziz, W\. Constable, X\. Tang, X\. Wu, X\. Wang, X\. Wu, X\. Gao, Y\. Kleinman, Y\. Chen, Y\. Hu, Y\. Jia, Y\. Qi, Y\. Li, Y\. Zhang, Y\. Zhang, Y\. Adi, Y\. Nam, Yu, Wang, Y\. Zhao, Y\. Hao, Y\. Qian, Y\. Li, Y\. He, Z\. Rait, Z\. DeVito, Z\. Rosnbrick, Z\. Wen, Z\. Yang, Z\. Zhao, and Z\. Ma \(2024\)The llama 3 herd of models\.External Links:2407\.21783,[Link](https://arxiv.org/abs/2407.21783)Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px3.p3.9)\.
- E\. J\. Hu, Y\. Shen, P\. Wallis, Z\. Allen\-Zhu, Y\. Li, S\. Wang, L\. Wang, and W\. Chen \(2022\)LoRA: low\-rank adaptation of large language models\.InProceedings of the International Conference on Learning Representations \(ICLR\),Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px3.p3.9)\.
- S\. Huang, H\. Li, Y\. Gu, X\. Hu, Q\. Li, and G\. Xu \(2025\)HyperG: hypergraph\-enhanced llms for structured knowledge\.External Links:2502\.18125,[Link](https://arxiv.org/abs/2502.18125)Cited by:[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1)\.
- T\. N\. Kipf and M\. Welling \(2017\)Semi\-supervised classification with graph convolutional networks\.InProceedings of the International Conference on Learning Representations \(ICLR\),Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px2.p3.1)\.
- B\. Li, H\. Fei, F\. Li, Y\. Wu, J\. Zhang, S\. Wu, J\. Li, Y\. Liu, L\. Liao, T\. Chua, and D\. Ji \(2023\)DiaASQ: a benchmark of conversational aspect\-based sentiment quadruple analysis\.InFindings of the Association for Computational Linguistics: ACL 2023,pp\. 13449–13467\.External Links:[Link](https://aclanthology.org/2023.findings-acl.849/),[Document](https://dx.doi.org/10.18653/v1/2023.findings-acl.849)Cited by:[§2\.2](https://arxiv.org/html/2606.13142#S2.SS2.p1.1)\.
- C\. Lin \(2004\)ROUGE: a package for automatic evaluation of summaries\.InText Summarization Branches Out,pp\. 74–81\.Cited by:[§4\.2](https://arxiv.org/html/2606.13142#S4.SS2.SSS0.Px1.p1.1)\.
- Y\. Liu, D\. Iter, Y\. Xu, S\. Wang, R\. Xu, and C\. Zhu \(2023\)G\-eval: NLG evaluation using GPT\-4 with better human alignment\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),pp\. 2511–2522\.External Links:[Link](https://aclanthology.org/2023.emnlp-main.153/)Cited by:[§4\.2](https://arxiv.org/html/2606.13142#S4.SS2.SSS0.Px2.p1.1),[§4\.6](https://arxiv.org/html/2606.13142#S4.SS6.p1.1)\.
- OpenAI, :, A\. Hurst, A\. Lerer, A\. P\. Goucher, A\. Perelman, A\. Ramesh, A\. Clark, A\. Ostrow, A\. Welihinda, A\. Hayes, A\. Radford, A\. Mądry, A\. Baker\-Whitcomb, A\. Beutel, A\. Borzunov, A\. Carney, A\. Chow, A\. Kirillov, A\. Nichol, A\. Paino, A\. Renzin, A\. T\. Passos, A\. Kirillov, A\. Christakis, A\. Conneau, A\. Kamali, A\. Jabri, A\. Moyer, A\. Tam, A\. Crookes, A\. Tootoochian, A\. Tootoonchian, A\. Kumar, A\. Vallone, A\. Karpathy, A\. Braunstein, A\. Cann, A\. Codispoti, A\. Galu, A\. Kondrich, A\. Tulloch, A\. Mishchenko, A\. Baek, A\. Jiang, A\. Pelisse, A\. Woodford, A\. Gosalia, A\. Dhar, A\. Pantuliano, A\. Nayak, A\. Oliver, B\. Zoph, B\. Ghorbani, B\. Leimberger, B\. Rossen, B\. Sokolowsky, B\. Wang, B\. Zweig, B\. Hoover, B\. Samic, B\. McGrew, B\. Spero, B\. Giertler, B\. Cheng, B\. Lightcap, B\. Walkin, B\. Quinn, B\. Guarraci, B\. Hsu, B\. Kellogg, B\. Eastman, C\. Lugaresi, C\. Wainwright, C\. Bassin, C\. Hudson, C\. Chu, C\. Nelson, C\. Li, C\. J\. Shern, C\. Conger, C\. Barette, C\. Voss, C\. Ding, C\. Lu, C\. Zhang, C\. Beaumont, C\. Hallacy, C\. Koch, C\. Gibson, C\. Kim, C\. Choi, C\. McLeavey, C\. Hesse, C\. Fischer, C\. Winter, C\. Czarnecki, C\. Jarvis, C\. Wei, C\. Koumouzelis, D\. Sherburn, D\. Kappler, D\. Levin, D\. Levy, D\. Carr, D\. Farhi, D\. Mely, D\. Robinson, D\. Sasaki, D\. Jin, D\. Valladares, D\. Tsipras, D\. Li, D\. P\. Nguyen, D\. Findlay, E\. Oiwoh, E\. Wong, E\. Asdar, E\. Proehl, E\. Yang, E\. Antonow, E\. Kramer, E\. Peterson, E\. Sigler, E\. Wallace, E\. Brevdo, E\. Mays, F\. Khorasani, F\. P\. Such, F\. Raso, F\. Zhang, F\. von Lohmann, F\. Sulit, G\. Goh, G\. Oden, G\. Salmon, G\. Starace, G\. Brockman, H\. Salman, H\. Bao, H\. Hu, H\. Wong, H\. Wang, H\. Schmidt, H\. Whitney, H\. Jun, H\. Kirchner, H\. P\. de Oliveira Pinto, H\. Ren, H\. Chang, H\. W\. Chung, I\. Kivlichan, I\. O’Connell, I\. O’Connell, I\. Osband, I\. Silber, I\. Sohl, I\. Okuyucu, I\. Lan, I\. Kostrikov, I\. Sutskever, I\. Kanitscheider, I\. Gulrajani, J\. Coxon, J\. Menick, J\. Pachocki, J\. Aung, J\. Betker, J\. Crooks, J\. Lennon, J\. Kiros, J\. Leike, J\. Park, J\. Kwon, J\. Phang, J\. Teplitz, J\. Wei, J\. Wolfe, J\. Chen, J\. Harris, J\. Varavva, J\. G\. Lee, J\. Shieh, J\. Lin, J\. Yu, J\. Weng, J\. Tang, J\. Yu, J\. Jang, J\. Q\. Candela, J\. Beutler, J\. Landers, J\. Parish, J\. Heidecke, J\. Schulman, J\. Lachman, J\. McKay, J\. Uesato, J\. Ward, J\. W\. Kim, J\. Huizinga, J\. Sitkin, J\. Kraaijeveld, J\. Gross, J\. Kaplan, J\. Snyder, J\. Achiam, J\. Jiao, J\. Lee, J\. Zhuang, J\. Harriman, K\. Fricke, K\. Hayashi, K\. Singhal, K\. Shi, K\. Karthik, K\. Wood, K\. Rimbach, K\. Hsu, K\. Nguyen, K\. Gu\-Lemberg, K\. Button, K\. Liu, K\. Howe, K\. Muthukumar, K\. Luther, L\. Ahmad, L\. Kai, L\. Itow, L\. Workman, L\. Pathak, L\. Chen, L\. Jing, L\. Guy, L\. Fedus, L\. Zhou, L\. Mamitsuka, L\. Weng, L\. McCallum, L\. Held, L\. Ouyang, L\. Feuvrier, L\. Zhang, L\. Kondraciuk, L\. Kaiser, L\. Hewitt, L\. Metz, L\. Doshi, M\. Aflak, M\. Simens, M\. Boyd, M\. Thompson, M\. Dukhan, M\. Chen, M\. Gray, M\. Hudnall, M\. Zhang, M\. Aljubeh, M\. Litwin, M\. Zeng, M\. Johnson, M\. Shetty, M\. Gupta, M\. Shah, M\. Yatbaz, M\. J\. Yang, M\. Zhong, M\. Glaese, M\. Chen, M\. Janner, M\. Lampe, M\. Petrov, M\. Wu, M\. Wang, M\. Fradin, M\. Pokrass, M\. Castro, M\. O\. T\. de Castro, M\. Pavlov, M\. Brundage, M\. Wang, M\. Khan, M\. Murati, M\. Bavarian, M\. Lin, M\. Yesildal, N\. Soto, N\. Gimelshein, N\. Cone, N\. Staudacher, N\. Summers, N\. LaFontaine, N\. Chowdhury, N\. Ryder, N\. Stathas, N\. Turley, N\. Tezak, N\. Felix, N\. Kudige, N\. Keskar, N\. Deutsch, N\. Bundick, N\. Puckett, O\. Nachum, O\. Okelola, O\. Boiko, O\. Murk, O\. Jaffe, O\. Watkins, O\. Godement, O\. Campbell\-Moore, P\. Chao, P\. McMillan, P\. Belov, P\. Su, P\. Bak, P\. Bakkum, P\. Deng, P\. Dolan, P\. Hoeschele, P\. Welinder, P\. Tillet, P\. Pronin, P\. Tillet, P\. Dhariwal, Q\. Yuan, R\. Dias, R\. Lim, R\. Arora, R\. Troll, R\. Lin, R\. G\. Lopes, R\. Puri, R\. Miyara, R\. Leike, R\. Gaubert, R\. Zamani, R\. Wang, R\. Donnelly, R\. Honsby, R\. Smith, R\. Sahai, R\. Ramchandani, R\. Huet, R\. Carmichael, R\. Zellers, R\. Chen, R\. Chen, R\. Nigmatullin, R\. Cheu, S\. Jain, S\. Altman, S\. Schoenholz, S\. Toizer, S\. Miserendino, S\. Agarwal, S\. Culver, S\. Ethersmith, S\. Gray, S\. Grove, S\. Metzger, S\. Hermani, S\. Jain, S\. Zhao, S\. Wu, S\. Jomoto, S\. Wu, Shuaiqi, Xia, S\. Phene, S\. Papay, S\. Narayanan, S\. Coffey, S\. Lee, S\. Hall, S\. Balaji, T\. Broda, T\. Stramer, T\. Xu, T\. Gogineni, T\. Christianson, T\. Sanders, T\. Patwardhan, T\. Cunninghman, T\. Degry, T\. Dimson, T\. Raoux, T\. Shadwell, T\. Zheng, T\. Underwood, T\. Markov, T\. Sherbakov, T\. Rubin, T\. Stasi, T\. Kaftan, T\. Heywood, T\. Peterson, T\. Walters, T\. Eloundou, V\. Qi, V\. Moeller, V\. Monaco, V\. Kuo, V\. Fomenko, W\. Chang, W\. Zheng, W\. Zhou, W\. Manassra, W\. Sheu, W\. Zaremba, Y\. Patil, Y\. Qian, Y\. Kim, Y\. Cheng, Y\. Zhang, Y\. He, Y\. Zhang, Y\. Jin, Y\. Dai, and Y\. Malkov \(2024\)GPT\-4o system card\.External Links:2410\.21276,[Link](https://arxiv.org/abs/2410.21276)Cited by:[§4\.2](https://arxiv.org/html/2606.13142#S4.SS2.SSS0.Px2.p1.1),[§4\.6](https://arxiv.org/html/2606.13142#S4.SS6.p1.1)\.
- K\. Papineni, S\. Roukos, T\. Ward, and W\. Zhu \(2002\)BLEU: a method for automatic evaluation of machine translation\.InProceedings of the 40th Annual Meeting of the Association for Computational Linguistics \(ACL\),pp\. 311–318\.Cited by:[§4\.2](https://arxiv.org/html/2606.13142#S4.SS2.SSS0.Px1.p1.1)\.
- Qwen, :, A\. Yang, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Li, D\. Liu, F\. Huang, H\. Wei, H\. Lin, J\. Yang, J\. Tu, J\. Zhang, J\. Yang, J\. Yang, J\. Zhou, J\. Lin, K\. Dang, K\. Lu, K\. Bao, K\. Yang, L\. Yu, M\. Li, M\. Xue, P\. Zhang, Q\. Zhu, R\. Men, R\. Lin, T\. Li, T\. Tang, T\. Xia, X\. Ren, X\. Ren, Y\. Fan, Y\. Su, Y\. Zhang, Y\. Wan, Y\. Liu, Z\. Cui, Z\. Zhang, and Z\. Qiu \(2025\)Qwen2\.5 technical report\.External Links:2412\.15115,[Link](https://arxiv.org/abs/2412.15115)Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px3.p3.9)\.
- A\. Radford, J\. Wu, R\. Child, D\. Luan, D\. Amodei, and I\. Sutskever \(2019\)Language models are unsupervised multitask learners\.External Links:[Link](https://api.semanticscholar.org/CorpusID:160025533)Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px3.p2.6)\.
- C\. Raffel, N\. Shazeer, A\. Roberts, K\. Lee, S\. Narang, M\. Matena, Y\. Zhou, W\. Li, and P\. J\. Liu \(2020\)Exploring the limits of transfer learning with a unified text\-to\-text transformer\.Journal of Machine Learning Research21\(140\),pp\. 1–67\.Cited by:[§3\.2\.2](https://arxiv.org/html/2606.13142#S3.SS2.SSS2.p2.1)\.
- N\. Reimers and I\. Gurevych \(2019\)Sentence\-BERT: sentence embeddings using Siamese BERT\-networks\.InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),Cited by:[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px2.p2.1)\.
- R\. Ribeiro, J\. P\. Carvalho, and L\. Coheur \(2023\)PGTask: introducing the task of profile generation from dialogues\.InProceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue \(SIGDIAL\),pp\. 183–189\.External Links:[Link](https://aclanthology.org/2023.sigdial-1.17/)Cited by:[Appendix A](https://arxiv.org/html/2606.13142#A1.p1.1),[§3\.2\.2](https://arxiv.org/html/2606.13142#S3.SS2.SSS2.p1.1)\.
- H\. Song, Y\. Wang, K\. Zhang, W\. Zhang, and T\. Liu \(2021\)BoB: BERT over BERT for training persona\-based dialogue models from limited personalized data\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing \(ACL\-IJCNLP\),pp\. 167–177\.External Links:[Link](https://aclanthology.org/2021.acl-long.14/),[Document](https://dx.doi.org/10.18653/v1/2021.acl-long.14)Cited by:[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1)\.
- C\. Tang, H\. Zhang, T\. Loakman, C\. Lin, and F\. Guerin \(2023a\)Enhancing dialogue generation via dynamic graph knowledge aggregation\.External Links:2306\.16195,[Link](https://arxiv.org/abs/2306.16195)Cited by:[§1](https://arxiv.org/html/2606.13142#S1.p1.1),[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1)\.
- Y\. Tang, B\. Wang, M\. Fang, D\. Zhao, K\. Huang, R\. He, and Y\. Hou \(2023b\)Enhancing personalized dialogue generation with contrastive latent variables: combining sparse and dense persona\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(ACL\),pp\. 5456–5468\.Cited by:[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1)\.
- A\. van den Oord, Y\. Li, and O\. Vinyals \(2018\)Representation learning with contrastive predictive coding\.arXiv preprint arXiv:1807\.03748\.Cited by:[§3\.5\.2](https://arxiv.org/html/2606.13142#S3.SS5.SSS2.p1.9)\.
- C\. Wu, A\. Madotto, Z\. Lin, P\. Xu, and P\. Fung \(2020\)Getting to know you: user attribute extraction from dialogues\.InProceedings of the 12th Language Resources and Evaluation Conference \(LREC\),Cited by:[§2\.2](https://arxiv.org/html/2606.13142#S2.SS2.p1.1)\.
- N\. Yadati, M\. Nimishakavi, P\. Yadav, V\. Nitin, A\. Louis, and P\. Talukdar \(2019\)HyperGCN: a new method of training graph convolutional networks on hypergraphs\.InProceedings of the 33rd International Conference on Neural Information Processing Systems,Cited by:[§1](https://arxiv.org/html/2606.13142#S1.p2.1),[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1),[§3\.4](https://arxiv.org/html/2606.13142#S3.SS4.p1.1)\.
- S\. Zhang, E\. Dinan, J\. Urbanek, A\. Szlam, D\. Kiela, and J\. Weston \(2018\)Personalizing dialogue agents: I have a dog, do you have pets too?\.InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics \(ACL\),pp\. 2204–2213\.External Links:[Link](https://aclanthology.org/P18-1205/),[Document](https://dx.doi.org/10.18653/v1/P18-1205)Cited by:[§1](https://arxiv.org/html/2606.13142#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.13142#S2.SS1.p1.1),[§4\.1](https://arxiv.org/html/2606.13142#S4.SS1.SSS0.Px1.p1.1)\.
- C\. Zheng, H\. Xu, and X\. Sun \(2024\)Hypergraph neural network for emotion recognition in conversations\.ACM Trans\. Asian Low\-Resour\. Lang\. Inf\. Process\.23\(2\)\.External Links:ISSN 2375\-4699,[Link](https://doi.org/10.1145/3638760),[Document](https://dx.doi.org/10.1145/3638760)Cited by:[§2\.3](https://arxiv.org/html/2606.13142#S2.SS3.p1.1)\.

## Appendix AGPT\-4o\-mini Annotation Prompt

We annotate persona\-bearing utterances from PGDatasetRibeiroet al\.\([2023](https://arxiv.org/html/2606.13142#bib.bib2)\)usinggpt\-4o\-minivia the OpenAI Chat Completions API \(temperature 0, single\-turn\)\. Below we reproduce the exact system and user messages sent for each utterance\.

##### System message\.

> You are a persona\-information extractor\. Given a persona\-bearing utterance, extract a quadruple \(Core, Expression, Sentiment, Category\) where:Coreis the central attribute or entity described;Expressionis the speaker’s description or opinion of the Core;Sentimentis the polarity \(positive/neutral/negative\);Categoryis one ofCharacteristic,Routine or Habit,Goal or Plan,Experience,Relationship\. Output*only*the linearized string:\[Core\] c \[Expression\] e \[Sentiment\] s \[Category\] k\. No additional text\.

##### User message \(three few\-shot examples followed by the target\)\.

> Utterance:i like to remodel homes\. Output:\[Core\] remodel homes \[Expression\] like to \[Sentiment\] positive \[Category\] Routine or Habit Utterance:my mother used to be a nurse\. Output:\[Core\] mother \[Expression\] used to be a nurse \[Sentiment\] neutral \[Category\] Relationship Utterance:i have never been outside of the country\. Output:\[Core\] outside of the country \[Expression\] have never been \[Sentiment\] negative \[Category\] Experience Utterance: \{input utterance\} Output:

Post\-processing\.Outputs are filtered by a regex that checks for all four tags \(\[Core\],\[Expression\],\[Sentiment\],\[Category\]\) and valid slot values for the closed\-set fields\. Malformed outputs \(<<2% of calls\) are discarded; the corresponding utterances receive an “Unknown” label and are excluded from supervised extraction training but retained in the dialogue data\.

## Appendix BHuman Verification of Quadruple Annotations

To assess annotation quality, three English\-proficient annotators independently reviewed 200 randomly sampled \(utterance, quadruple\) pairs drawn from the training set\. For each pair, annotators judged whether the predicted quadruple was*correct*\(all four fields accurate\),*partially correct*\(Core/Expression correct but one label field wrong\), or*incorrect*\. Table[3](https://arxiv.org/html/2606.13142#A2.T3)summarizes the results\.

Table 3:Human verification results on 200 sampled quadruple annotations\.The most common error mode is Category confusion betweenCharacteristicandRoutine/Habit\(e\.g\., habitually performed activities that also characterize the speaker\), which accounts for 7 of the 20 partially correct cases\. Sentiment errors are rare \(3 cases\), confined to ironic or underspecified utterances\. The high exact\-match rate \(84\.5%84\.5\\%\) and strong inter\-annotator agreement \(κ=0\.81\\kappa=0\.81\) confirm that the GPT\-4o\-mini annotations are sufficiently reliable for hypergraph construction\.

## Appendix CFull Ablation Metrics

Table[4](https://arxiv.org/html/2606.13142#A3.T4)reports the complete four\-metric breakdown for the GPT\-2 ablation study summarized by BLEU\-1 in Figure[2](https://arxiv.org/html/2606.13142#S4.F2)\.

Table 4:Ablation on GPT\-2 \(greedy,×\\times100\)\. Each variant removes one module from HyPE\-base\. CA\-MeanPool: per\-category S\-BERT offsets without message\-passing\.
HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

Similar Articles

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

Dynamic In-Group Persona Generation for Enhancing Human-AI Rapport

High Dimensional, Dynamic Rotary Positional Embedding [P]

HyperPatch: Sequential Knowledge Editing Under n-ary Structural Drift

Hypergraph as Language

Submit Feedback

Similar Articles

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions
Dynamic In-Group Persona Generation for Enhancing Human-AI Rapport
High Dimensional, Dynamic Rotary Positional Embedding [P]
HyperPatch: Sequential Knowledge Editing Under n-ary Structural Drift