Hypergraph as Language

arXiv cs.CL Papers

Summary

This paper proposes Hyper-Align, a framework that serializes hypergraph structures into tokens via HIDT-O and HIP, enabling LLMs to process high-order relationships, and introduces HyperAlign-Bench for evaluation.

arXiv:2605.21858v1 Announce Type: new Abstract: Large language models (LLMs) have recently shown strong potential in modeling relational structures. However, existing approaches remain fundamentally graph-centric: they focus on processing pairwise graph structures into tokens that LLMs can understand. In contrast, many real-world relational patterns do not naturally conform to the pairwise-edge assumption, and are better modeled as high-order associations in hypergraphs. For hypergraph structures, existing methods often fail to preserve the native semantics that multiple objects are jointly connected by the same high-order relation, limiting their ability to exploit complex structures. To address this limitation, we put forth the "Hypergraph as Language" perspective and propose Hyper-Align, a hypergraph-native alignment framework for large language models. Hyper-Align compiles the query-object-centered hypergraph context into hypergraph tokens directly consumable by a base LLM. Specifically, we introduce Hypergraph Incidence Detail Template with Overview (HIDT-O), which serializes high-order association structures into a fixed-shape hybrid template combining local incidence details and overview-level summaries. We then design a Hypergraph Incidence Projector (HIP), which maps native high-order incidence structures into the LLM token space through explicit semantic-structural decoupling and bidirectional message passing between vertices and hyperedges. We further define a concrete Hypergraph-as-Language input protocol, which jointly feeds hypergraph tokens and textual prompts into a frozen base LLM, supporting both vertex-level and hyperedge-level tasks under a unified question-answering paradigm. To systematically evaluate different methods in hypergraph structural modeling, we introduce HyperAlign-Bench. Extensive experiments show that Hyper-Align significantly outperforms existing methods across in-domain and zero-shot evaluations.

Cached at: 05/22/26, 08:44 AM

# Hypergraph as Language
Source: [https://arxiv.org/html/2605.21858](https://arxiv.org/html/2605.21858)
Mengqi Lei (1,2), Guohuan Xie (1,2), Shihui Ying (3), Shaoyi Du (4), Jun-Hai Yong (1), Siqi Li (1,2), and Yue Gao (1,2)

(1) {BNRist, THUIBCS, BLBCI, School of Software}, Tsinghua University

(2) Yangtze Delta Region Institute, Tsinghua University

(3) Shanghai Institute of Applied Mathematics and Mechanics, Shanghai University

(4) State Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University

leimq25@mails.tsinghua.edu.cn, stuxiemol@gmail.com, shying@shu.edu.cn, dushaoyi@xjtu.edu.cn, yongjh@tsinghua.edu.cn, lisiqi19971013@gmail.com, kevin.gaoy@gmail.com

###### Abstract

Large language models (LLMs) have recently demonstrated substantial potential in modeling relational structures. However, existing approaches remain fundamentally graph-centric: they primarily focus on processing pairwise graph structures into tokens that LLMs can understand. In contrast, many real-world relational patterns do not naturally conform to the pairwise-edge assumption, and are better modeled as high-order associations in hypergraphs. When facing hypergraph structures, existing methods often fail to preserve the native semantics that multiple objects are jointly connected by the same high-order relation, thereby limiting their ability to effectively exploit complex structures. To address this limitation, we put forward the *Hypergraph as Language* perspective and propose Hyper-Align, a hypergraph-native alignment framework for large language models. Hyper-Align compiles the query-object-centered hypergraph context into a sequence of hypergraph tokens that can be directly consumed by a base LLM. Specifically, we first introduce Hypergraph Incidence Detail Template with Overview (HIDT-O), which serializes high-order association structures into a fixed-shape hybrid template combining local incidence details and overview-level summaries. We then design a Hypergraph Incidence Projector (HIP), which maps native high-order incidence structures into the LLM token space through explicit semantic-structural decoupling and bidirectional message passing between vertices and hyperedges. Building on this, we further define a concrete Hypergraph-as-Language input protocol, which jointly feeds hypergraph tokens and textual prompts into a frozen base LLM, thereby supporting both vertex-level and hyperedge-level tasks under a unified question-answering paradigm. Furthermore, to systematically evaluate the capability of different methods in hypergraph structural modeling, we introduce HyperAlign-Bench. Extensive experiments show that Hyper-Align significantly outperforms existing methods across in-domain and zero-shot evaluations. The code is available at: [https://github.com/Mengqi-Lei/Hypergraph-as-Language](https://github.com/Mengqi-Lei/Hypergraph-as-Language).

### 1 Introduction

Large language models (LLMs) have demonstrated strong capabilities in language understanding, knowledge transfer, and unified task modeling, which has further accelerated their extension to structured data domains Brown et al. ([2020](https://arxiv.org/html/2605.21858#bib.bib143)); Wei et al. ([2021](https://arxiv.org/html/2605.21858#bib.bib144)); Zhao et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib145)); Hadi et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib147)). In recent years, LLM-based research on graph-structured data has gradually become an active research topic, mainly following two paradigms Ren et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib146)); Pan et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib148)); Li et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib149)). One line of methods rewrites graph structures into natural-language or code-style descriptions Ye et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib150)); Chai et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib151)); Cai et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib152)); Li et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib153)), enabling LLMs to directly perform graph tasks in the textual space. The other line of methods transforms graph structures into sequences of graph tokens that are compatible with the input space of LLMs, using a frozen or nearly frozen LLM as a unified interface for various tasks and even zero-shot inference Tang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib130)); Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)); Wang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib132)); Zhu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib133)). Although these methods have significantly advanced the integration of LLMs and graph learning, their modeling assumptions remain fundamentally graph-centric: the basic structural units are typically pairwise adjacency, node neighborhoods, or token sequences derived from graphs.

![Refer to caption](https://arxiv.org/html/2605.21858v1/x1.png)

Figure 1: Illustration of our method.

However, many high-order relations in the real world do not naturally conform to the pairwise-edge assumption of ordinary graphs Berge ([1984](https://arxiv.org/html/2605.21858#bib.bib155)); Zhou et al. ([2006](https://arxiv.org/html/2605.21858#bib.bib156)); Gao et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib4)); Battiston et al. ([2020](https://arxiv.org/html/2605.21858#bib.bib154)). Data such as paper co-citation, group interactions, and multi-entity collaboration are better modeled as hypergraphs, where a hyperedge can simultaneously connect an arbitrary number of vertices. Accordingly, the semantic focus is no longer whether two vertices are connected, but how a set of objects are associated as a whole. This means that the native structural unit of a hypergraph is not pairwise adjacency, but the high-order association established between vertices and hyperedges Feng et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib2)); Yadati et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib120)). In this regard, directly applying existing graph-centric methods requires expanding a hypergraph into multiple pairwise edges (*e.g.*, via clique expansion), which inevitably loses the high-order association information in the original hypergraph Chien et al. ([2022](https://arxiv.org/html/2605.21858#bib.bib123)); Feng et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib157)).
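To make the information loss concrete, the following minimal Python sketch expands two different hypergraphs into pairwise edges and compares the results. The function name `clique_expansion` and the toy vertex labels are illustrative choices, not taken from the paper's code.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Replace every hyperedge by all pairwise edges among its members."""
    edges = set()
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            edges.add((u, v))
    return edges

# One genuine three-way relation among a, b, and c ...
joint = [{"a", "b", "c"}]
# ... versus three independent pairwise relations.
pairwise = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both hypergraphs collapse to the same triangle of pairwise edges.
same_graph = clique_expansion(joint) == clique_expansion(pairwise)
print(same_graph)
```

After expansion the two inputs are indistinguishable, so a graph-centric model cannot tell whether the three objects were associated jointly or only pair by pair.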

Recently, several preliminary efforts have emerged to combine hypergraphs with large language models. LLMHG Chu et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib161)), HeLLM Guo et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib162)), and Hyper-RAG Feng et al. ([2026](https://arxiv.org/html/2605.21858#bib.bib163)) respectively focus on recommendation systems, multimodal recommendation, and retrieval-augmented generation (RAG) scenarios, while HyperLLM Gu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib159)) focuses on leveraging LLMs to generate hypergraphs from textual data. LLM4Hypergraph Feng et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib160)) built a systematic benchmark for hypergraph understanding, aiming to explore converting hypergraphs into natural language so that large models can understand them. We call this rather intuitive paradigm “Hypergraph to Language”, but it can lead to information loss. Despite these efforts, a fundamental question remains largely unexplored: Can we make hypergraphs directly understandable by LLMs as a language-like structural input, so that an LLM can natively model high-order associations and uniformly handle hypergraph tasks? We refer to this new line of research as *“hypergraph as language,”* as shown in Fig. [1](https://arxiv.org/html/2605.21858#S1.F1).

Based on the *hypergraph as language* view, we propose Hyper-Align, the first hypergraph-native alignment framework for large language models. Hyper-Align compiles the query-object-centered high-order association structure into a sequence of continuous hypergraph tokens that can be directly consumed by a base LLM, and performs inference under a unified question-answering paradigm. Specifically, we first propose a hypergraph-native serialization approach, Hypergraph Incidence Detail Template with Overview (HIDT-O). From the vertex-hyperedge incidence perspective, it serializes high-order association structures into a fixed-shape template comprising local incidence details and overview-level summaries. Next, at the representation alignment level, we design the Hypergraph Incidence Projector (HIP). Unlike the shared-MLP-style projectors commonly used in existing graph-LLM methods, HIP explicitly decouples semantics and structure, distinguishing different roles such as vertices, hyperedges, and overview components. Moreover, it performs high-order bidirectional message passing between vertices and hyperedges within the projector, thereby mapping the native high-order association structures into the LLM token space. Building on this, we further define a concrete Hypergraph-as-Language input protocol, which uses a three-part Background-Details-Question prompt to jointly feed hypergraph tokens and textual context into a frozen base LLM. Finally, in hypergraph alignment tuning, we design two auxiliary supervision tasks, namely order bucket reconstruction and relation reconstruction, to optimize the parameters of HIP jointly with the main task loss. Notably, Hyper-Align is not limited to hypergraphs but naturally extends to ordinary graphs, since an ordinary graph can be viewed as a special hypergraph where each hyperedge associates exactly two vertices. To systematically evaluate the capability of different methods in high-order association modeling, we further construct HyperAlign-Bench, which contains two core tasks, vertex classification and hyperedge classification, and supports in-domain and zero-shot evaluations.

In summary, the contributions of this paper are as follows.

- We introduce the *Hypergraph as Language* perspective and propose Hyper-Align, the first hypergraph-native alignment framework for LLMs.
- We propose HIDT-O, which serializes high-order association structures into a hybrid template of local details and overview-level summaries from the vertex-hyperedge incidence perspective.
- We propose HIP and define a concrete Hypergraph-as-Language input protocol, establishing a unified alignment interface among high-order incidence structures, textual contexts, and LLMs. Meanwhile, we design auxiliary supervision in hypergraph alignment tuning to jointly optimize HIP parameters.
- We construct HyperAlign-Bench, a fair and reproducible benchmark for high-order association modeling. Extensive experiments show that Hyper-Align significantly outperforms existing methods in both in-domain and zero-shot evaluations, verifying the necessity and effectiveness of the proposed method.

### 2 Related Work

Existing studies on LLMs for structural data mainly follow two routes: graph-as-text, which rewrites graphs into natural-language or code-style descriptions Ye et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib150)); Chai et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib151)); Cai et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib152)); Li et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib153)), and graph-to-token, which maps graph structures into continuous tokens compatible with LLM inputs Tang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib130)); Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)); He et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib138)); Kong et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib134)). Methods such as GraphGPT Tang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib130)), LLaGA Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)), TEA-GLM Wang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib132)), and PromptGFM Zhu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib133)) have shown the promise of structure-language alignment, but they are fundamentally built on ordinary graphs with pairwise edges. This makes them insufficient for high-order associations whose semantics lie in the holistic grouping of multiple vertices within the same hyperedge. In parallel, hypergraph learning methods such as HGNN Feng et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib2)), Hyper-SAGNN Zhang et al. ([2020](https://arxiv.org/html/2605.21858#bib.bib158)), and AllSet Chien et al. ([2022](https://arxiv.org/html/2605.21858#bib.bib123)) directly model hypergraph structures, while recent works such as HyperBERT Bazaga et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib126)) and preliminary hypergraph-LLM studies Gu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib159)); Guo et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib162)); Chu et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib161)); Feng et al. ([2026](https://arxiv.org/html/2605.21858#bib.bib163)) explore textual hypergraphs, prompting, recommendation, retrieval augmentation, or hypergraph extraction. However, these methods either rely on task-specific HGNN encoders or focus on specific scenarios, rather than establishing a unified alignment framework between hypergraphs and LLMs. Our Hyper-Align fills this gap by making hypergraphs directly consumable by LLMs as language-like structural inputs, enabling hypergraph-native modeling for both vertex-level and hyperedge-level tasks.

### 3 Hyper-Align Framework

#### 3.1 Method Overview

##### Problem formulation\.

Given a hypergraph, we denote it as $\mathcal{H}=(\mathcal{V},\mathcal{E},\mathcal{X},\mathcal{Z})$, where $\mathcal{V}$ denotes the set of vertices, $\mathcal{E}=\{e_1,\ldots,e_M\}$ denotes the set of hyperedges, and each hyperedge $e_i\subseteq\mathcal{V}$ is a subset of vertices that may contain an arbitrary number of vertices. Here, $\mathcal{X}=\{x_v\mid v\in\mathcal{V}\}$ represents the textual attributes of vertices, while $\mathcal{Z}=\{z_e\mid e\in\mathcal{E}\}$ represents optional textual attributes or metadata associated with hyperedges. We define the vertex degree and the hyperedge degree respectively as $d(v)=|\{e\in\mathcal{E}\mid v\in e\}|$ and $r(e)=|e|$, where $r(e)$ is also used as the hyperedge order in our order bucket notation in Sec. [3.2](https://arxiv.org/html/2605.21858#S3.SS2.SSS0.Px2). We further define the incidence matrix as $B\in\{0,1\}^{|\mathcal{V}|\times|\mathcal{E}|}$, where each entry $B_{v,e}$ indicates whether vertex $v$ is contained in hyperedge $e$: $B_{v,e}=1$ if $v\in e$, and $B_{v,e}=0$ otherwise. Hyper-Align adopts a unified object-centric interface. Given a query center $c\in\mathcal{V}\cup\mathcal{E}$, the objective of the model is to map the high-order relational context centered at $c$ into a continuous token sequence that can be directly consumed by an LLM, and then enable a frozen LLM to perform downstream prediction tasks under a unified question-answering paradigm. Notably, when all hyperedges satisfy $r(e)=2$, this formulation naturally reduces to the ordinary graph setting.
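The definitions above can be checked on a toy example. The sketch below builds the incidence matrix $B$ for a small hypergraph and reads the vertex degree $d(v)$ and hyperedge order $r(e)$ off its row and column sums. The helper names and the four-vertex example are hypothetical and chosen only for illustration.

```python
def incidence_matrix(vertices, hyperedges):
    """B[i][j] = 1 if vertices[i] belongs to hyperedges[j], else 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in vertices]

def vertex_degree(B, i):
    """d(v): number of hyperedges containing vertex i (row sum of B)."""
    return sum(B[i])

def hyperedge_order(B, j):
    """r(e): number of vertices in hyperedge j (column sum of B)."""
    return sum(row[j] for row in B)

# Toy hypergraph: one three-vertex hyperedge and one two-vertex hyperedge.
vertices = ["p1", "p2", "p3", "p4"]
hyperedges = [{"p1", "p2", "p3"}, {"p3", "p4"}]

B = incidence_matrix(vertices, hyperedges)
degrees = [vertex_degree(B, i) for i in range(len(vertices))]
orders = [hyperedge_order(B, j) for j in range(len(hyperedges))]

# The ordinary-graph special case holds only if every hyperedge has order 2.
is_ordinary_graph = all(r == 2 for r in orders)
```

Here `p3` has degree 2 because it lies in both hyperedges, and the first hyperedge has order 3, so this toy hypergraph is not an ordinary graph.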

![Refer to caption](https://arxiv.org/html/2605.21858v1/x2.png)

Figure 2: Overall framework of the proposed Hyper-Align. In the figure, “OBRecon.” and “RelRecon.” denote order bucket reconstruction and relation reconstruction, respectively. Circles and triangles denote vertices and hyperedges, respectively, and gray dashed slots denote padded slots.

##### Overall framework of Hyper\-Align\.

As shown in Fig. [2](https://arxiv.org/html/2605.21858#S3.F2), for an arbitrary center object $c$, the overall computational pipeline of Hyper-Align can be written as:

$$c \xrightarrow{\;\text{HIDT-O}\;} \Pi(c)=(u_1,\ldots,u_L) \xrightarrow{\;\mathrm{HIP}_{\phi}\;} T(c)\in\mathbb{R}^{L_{\mathcal{H}}\times d_{\mathrm{llm}}} \xrightarrow{\;\mathrm{LLM}_{\Theta}\;} y, \tag{1}$$

where $\Pi(c)$ is a fixed-length structural sequence generated by the hypergraph-native serialization template, $\mathrm{HIP}_{\phi}$ denotes the proposed Hypergraph Incidence Projector (HIP), $T(c)$ is a sequence of hypergraph tokens of length $L_{\mathcal{H}}$, with the same dimensionality as the hidden space of the backbone LLM, and $\mathrm{LLM}_{\Theta}$ denotes a frozen large language model. The overall framework consists of three mutually coupled components: (1) HIDT-O, which serializes the hypergraph structure into a fixed-length token sequence; (2) HIP, which aligns the semantic and structural information in the sequence with the LLM token space; (3) the Hypergraph-as-Language protocol, which feeds the hypergraph tokens together with natural-language context into the frozen LLM through a three-part Background-Details-Question prompt. During the entire hypergraph alignment tuning process, only the parameters of HIP are updated. No task-specific head is introduced, and the backbone LLM remains unchanged.

#### 3.2 Hypergraph-Native Serialization: HIDT-O

##### Hypergraph Incidence Detail Template\.

To compile high-order association structures into fixed-length sequences, we propose the Hypergraph Incidence Detail Template (HIDT). As shown in Fig. [3](https://arxiv.org/html/2605.21858#S3.F3), given a center object $c$, HIDT constructs a fixed-shape incidence tree from the vertex-hyperedge incidence bipartite perspective, where vertex layers and hyperedge layers strictly alternate. It is worth noting that this incidence bipartite representation is lossless with respect to the vertex-hyperedge incidence structure, and can fully retain the structural information of the original hypergraph Dai and Gao ([2023](https://arxiv.org/html/2605.21858#bib.bib142)).

![Refer to caption](https://arxiv.org/html/2605.21858v1/x3.png)

Figure 3: Illustration of HIDT. The left part shows a query-centered local hypergraph context, while the right part shows the corresponding fixed-shape HIDT incidence tree. Here, $h$ indicates the hyperedge-hop shell and $l$ denotes the template layer.

Specifically, we place the center object $c$ at layer 0 as the root node. When $c\in\mathcal{V}$, the first layer samples a fixed number of hyperedges from the incident hyperedges of $c$. The second layer samples a fixed number of member vertices from each hyperedge in the first layer, excluding the parent vertex to avoid immediate backtracking. The third layer further samples new incident hyperedges from these member vertices, excluding the parent hyperedge. Subsequent layers continue to expand alternately between vertices and hyperedges. When $c\in\mathcal{E}$, this procedure is dual to the case of $c\in\mathcal{V}$: we first sample member vertices of the hyperedge, then expand from these members to other incident hyperedges, and so on. For any parent node, when the number of expandable objects is smaller than the sampling budget, we use \[V-PAD\] for vertices and \[E-PAD\] for hyperedges to pad the corresponding slots to the fixed size. In this way, each hypergraph is organized into an alternating incidence tree with a fixed topology but sample-dependent contents. By performing a level-order traversal over this tree, we obtain the detail sequence $\Pi_{\mathrm{HIDT}}(c)=(u_1,\ldots,u_{L_D})$, where each position corresponds to a deterministic structural role. Since the template topology is shared across all samples once the sampling budgets are specified, we can precompute a set of template-level Laplacian positional encodings for the template, such that identical relative structural roles share consistent positional semantics across different samples.
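The construction above can be sketched in a few lines of Python for the vertex-centered case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hidt_sequence`, the `budgets` argument, and the toy hypergraph are hypothetical, and random sampling is replaced by taking the first `k` candidates in sorted order so the output is reproducible. Vertex and hyperedge names are assumed not to collide.

```python
V_PAD, E_PAD = "[V-PAD]", "[E-PAD]"

def hidt_sequence(center, hyperedges, budgets):
    """Level-order token list of a fixed-shape incidence tree rooted at a vertex.

    hyperedges: dict mapping hyperedge name -> set of member vertices.
    budgets: children per node at layers 1, 2, ... (odd layers hold hyperedges).
    """
    layers = [[(center, None)]]              # entries are (object, parent)
    for depth, k in enumerate(budgets, start=1):
        want_edges = depth % 2 == 1
        pad = E_PAD if want_edges else V_PAD
        layer = []
        for obj, parent in layers[-1]:
            if obj in (V_PAD, E_PAD):
                cands = []                   # padded slots only get padded children
            elif want_edges:                 # vertex -> incident hyperedges, no backtracking
                cands = sorted(e for e, members in hyperedges.items()
                               if obj in members and e != parent)
            else:                            # hyperedge -> member vertices, no backtracking
                cands = sorted(v for v in hyperedges[obj] if v != parent)
            picked = cands[:k]               # deterministic stand-in for sampling
            picked += [pad] * (k - len(picked))
            layer.extend((child, obj) for child in picked)
        layers.append(layer)
    return [obj for layer in layers for obj, _ in layer]

toy = {"e1": {"a", "b", "c"}, "e2": {"a", "d"}, "e3": {"b", "d"}}
seq_a = hidt_sequence("a", toy, budgets=(2, 2))
seq_c = hidt_sequence("c", toy, budgets=(2, 2))
```

With budgets `(2, 2)` every center yields exactly 1 + 2 + 4 = 7 slots, so position `i` always plays the same structural role. That fixed shape is what allows positional encodings to be precomputed once per template.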

##### Order\-aware structural overview suffix\.

In order to cover a broader high-order receptive field, we append an order-aware structural overview suffix $\Pi_{\mathrm{O}}(c)$ after HIDT, and refer to the resulting sequence as HIDT-O: $\Pi(c)=\Pi_{\mathrm{HIDT}}(c)\,\|\,\Pi_{\mathrm{O}}(c)$.

The overview suffix no longer directly enumerates additional concrete members, since doing so would lead to a combinatorial explosion in sequence length. Instead, it provides a compressed summary for the center object, characterizing how hyperedges with different hop distances and different hyperedge degrees are distributed around it. Specifically, starting from $c$, we perform a restricted breadth-first search (BFS) on the incidence bipartite graph and collect the set of hyperedges $S_h(c)$ that lie strictly in the $h$-th hyperedge layer. Here, $h$ indexes hyperedge layers in the alternating BFS traversal, rather than raw bipartite-edge distance; the center object is treated as the starting layer and only the additionally reached hyperedges are summarized in the overview suffix. We then partition them according to order buckets:

$$S_{h,b}(c)=\{e\in S_h(c)\mid \beta(r(e))=b\}, \tag{2}$$

where $\beta(\cdot)$ denotes the order bucket mapping function. For semantic construction, we adopt a parameter-free alternating aggregation scheme. Let the initial states of vertices and hyperedges be $m_v^{(0)}=\psi(v)$ and $m_e^{(0)}=\psi(e)$, respectively, where $\psi(\cdot)$ denotes the textual embedding of a vertex or a hyperedge. We then perform the following alternating propagation on the incidence bipartite graph for $t=1,\ldots,H$, where $t$ denotes the propagation step:

$$m_e^{(t)}=\frac{1}{|e|}\sum_{v\in e}m_v^{(t-1)},\qquad m_v^{(t)}=\frac{1}{|\Gamma(v)|}\sum_{e\in\Gamma(v)}\Big(m_e^{(t)}+\mathbf{o}_{\beta(r(e))}\Big), \tag{3}$$

where $\Gamma(v)=\{e\in\mathcal{E}\mid v\in e\}$ denotes the set of hyperedges incident to vertex $v$. Here, $\mathbf{o}_{\beta(r(e))}$ is a fixed vector assigned to the corresponding order bucket, which is used to explicitly preserve the information of which hyperedge degree interval each hyperedge belongs to during propagation.
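As a worked illustration of the alternating propagation in Eq. (3), the sketch below runs two steps on a toy hypergraph using plain Python lists as vectors. The function names, the two-bucket mapping, and the bucket vectors are assumptions made for the example. The treatment of isolated vertices (they keep their state) is also an assumption, since the text does not cover that case.

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def propagate(hyperedges, m_v, bucket_vec, bucket_of, steps):
    """Run the parameter-free propagation of Eq. (3).

    Returns {t: {hyperedge: m_e^(t)}} for t = 1..steps.
    """
    history = {}
    for t in range(1, steps + 1):
        # Hyperedge update: mean of member vertex states from step t-1.
        m_e = {e: mean_vectors([m_v[v] for v in sorted(members)])
               for e, members in hyperedges.items()}
        new_m_v = {}
        for v in m_v:
            incident = [e for e, members in hyperedges.items() if v in members]
            if not incident:
                new_m_v[v] = m_v[v]      # assumption: isolated vertices keep their state
                continue
            msgs = []
            for e in incident:
                o = bucket_vec[bucket_of(len(hyperedges[e]))]
                msgs.append([x + y for x, y in zip(m_e[e], o)])
            # Vertex update: mean of (hyperedge state + order-bucket vector).
            new_m_v[v] = mean_vectors(msgs)
        m_v = new_m_v
        history[t] = m_e
    return history

toy = {"e1": {"a", "b"}, "e2": {"b", "c", "d"}}
init = {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [3.0, 2.0], "d": [3.0, 2.0]}
bucket_vec = {0: [0.0, 0.0], 1: [1.0, 0.0]}          # hypothetical fixed bucket vectors
history = propagate(toy, init, bucket_vec, lambda r: 0 if r <= 2 else 1, steps=2)
```

The per-step hyperedge states are kept rather than only the final ones, because the overview slots described next read the state at step $t=h$ for hop $h$.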

It is worth noting that, for each overview slot $(h,b)$, the semantic aggregation depth is tied to its structural hop index. Rather than first propagating to a fixed maximum depth and then using the same final representations for all overview slots, we construct the slot at the $h$-th hop by using the hyperedge states at propagation step $t=h$:

$$\hat{m}_{h,b}(c)=\mathrm{Mean}\big(\{m_e^{(h)}\mid e\in S_{h,b}(c)\}\big). \tag{4}$$

This design keeps the hop index $h$ consistent across three aspects: the structural layer summarized by $S_{h,b}(c)$, the semantic aggregation depth of $m_e^{(h)}$, and the positional encoding of the overview token.

Overall, HIDT preserves fine\-grained local details of the hypergraph, while the overview suffix further supplements high\-order structural distributions and cross\-hop context across different hop distances and order buckets, without introducing additional parameters\.
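The overview construction can be sketched as two small steps: an alternating BFS that collects the hyperedge shells $S_h(c)$, followed by the per-slot mean of Eq. (4). The code below is an illustrative sketch only. The function names, the bucket rule, and the hyperedge states in `states` are made-up inputs, and slots with no hyperedges are simply omitted because this section does not say how empty slots are filled. Vertex and hyperedge names are assumed to be distinct.

```python
def hyperedge_shells(center, hyperedges, max_hop):
    """S_h(c) for h = 1..max_hop via alternating BFS on the incidence graph."""
    if center in hyperedges:                     # hyperedge center: start from its members
        seen_e, frontier = {center}, set(hyperedges[center])
    else:                                        # vertex center
        seen_e, frontier = set(), {center}
    seen_v = set(frontier)
    shells = {}
    for h in range(1, max_hop + 1):
        # Newly reached hyperedges that touch the current vertex frontier.
        shell = {e for e, members in hyperedges.items()
                 if e not in seen_e and members & frontier}
        shells[h] = shell
        seen_e |= shell
        reached = set().union(*(hyperedges[e] for e in shell))
        frontier = reached - seen_v
        seen_v |= frontier
    return shells

def overview_slots(shells, hyperedges, states, bucket_of):
    """Mean of the step-h hyperedge states per (hop, bucket) slot, as in Eq. (4)."""
    slots = {}
    for h, shell in shells.items():
        groups = {}
        for e in sorted(shell):
            groups.setdefault(bucket_of(len(hyperedges[e])), []).append(states[h][e])
        for b, vecs in groups.items():
            slots[(h, b)] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return slots

toy = {"e0": {"a", "z"}, "e1": {"a", "b"}, "e2": {"b", "c", "d"}, "e3": {"d", "f"}}
shells = hyperedge_shells("a", toy, max_hop=2)
# Hypothetical hyperedge states m_e^(t), indexed first by step t and then by hyperedge.
states = {1: {"e0": [3.0, 1.0], "e1": [1.0, 1.0]}, 2: {"e2": [4.0, 2.0]}}
slots = overview_slots(shells, toy, states, lambda r: 0 if r <= 2 else 1)
```

Note that `e3` is two hyperedge layers beyond `e1` and is therefore not summarized when `max_hop=2`, even though its raw bipartite distance from `a` would be counted differently.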

##### Hypergraph token encapsulation\.

In HIDT-O, for all vertices or hyperedges involved in sampling, we use off-the-shelf text encoders, such as the Qwen embedding model and SBERT, to encode their textual information. After obtaining HIDT-O, we represent each token $u_i$ as the concatenation of a semantic vector and a structural vector.

On the semantic side, if $u_i$ is a vertex token, we use the embedding of its textual attribute. If $u_i$ is a hyperedge token, we preferentially use the textual embedding of the hyperedge, and fall back to the average of its member vertex representations when no explicit hyperedge text is available. If $u_i$ is an overview token, we use the corresponding overview vector $\hat{m}_{h,b}$. For pad tokens, we use a zero vector. We denote this semantic representation as $a_i$. On the structural side, we explicitly maintain a set of structural descriptors for each token: $s_i=\big[\,U_i\,\|\,\mathbf{t}_{\tau_i}\,\|\,\mathbf{h}_{\ell_i}\,\|\,\mathbf{o}_{b_i}\,\|\,\mathbf{d}_{\delta_i}\,\big]$, where $U_i$ denotes the positional encoding, $\mathbf{t}_{\tau_i}$ denotes the token type encoding, $\mathbf{h}_{\ell_i}$ denotes the depth encoding, $\mathbf{o}_{b_i}$ denotes the order bucket encoding over hyperedge degrees, and $\mathbf{d}_{\delta_i}$ denotes the vertex degree bucket encoding. Here, $\delta_i$ denotes the vertex-degree bucket index of the $i$-th token. For token types where a structural descriptor is not applicable, we use a special null bucket. Finally, the input to the projector is written as $g_i=[a_i\,\|\,s_i]$. Through this process, we explicitly provide the structural information that is critical for high-order relations to the projector, rather than leaving it entirely to language modeling for implicit recovery.
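A minimal sketch of the encapsulation $g_i=[a_i\,\|\,s_i]$ follows. It assumes one-hot encodings for the type, depth, and bucket descriptors and reserves bucket index 0 as the null bucket. These encoding choices, the function name `encapsulate`, and the dimensions are illustrative assumptions, since this section does not specify how the descriptors are embedded.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

TYPES = ["vertex", "hyperedge", "overview", "pad"]
NULL = 0   # assumption: bucket index 0 is reserved as the "not applicable" bucket

def encapsulate(semantic, pos_enc, token_type, depth,
                order_bucket=NULL, degree_bucket=NULL,
                max_depth=4, n_buckets=4):
    """Return g_i = [a_i || s_i] with s_i = [U_i || t || h || o || d]."""
    s = (list(pos_enc)
         + one_hot(TYPES.index(token_type), len(TYPES))   # token type t
         + one_hot(depth, max_depth)                      # depth h
         + one_hot(order_bucket, n_buckets)               # hyperedge order bucket o
         + one_hot(degree_bucket, n_buckets))             # vertex degree bucket d
    return list(semantic) + s

# A vertex token has a degree bucket but no hyperedge order bucket (null).
g_vertex = encapsulate([0.2, -0.1], [0.5], "vertex", depth=2, degree_bucket=3)
# A pad token uses a zero semantic vector and null buckets throughout.
g_pad = encapsulate([0.0, 0.0], [0.9], "pad", depth=2)
```

Every token ends up with the same width, so the projector receives a rectangular input regardless of token role.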

#### 3.3 Hypergraph Incidence Projector

##### Semantic\-structural decoupling\.

The Hypergraph Incidence Projector (HIP) first maps the semantic vector and the structural vector separately. Given the semantic vector $a_i$ and structural vector $s_i$ of the $i$-th token, it compresses the semantic side into a semantic core: $c_i=W_{\mathrm{sem}}\cdot\mathrm{LN}(a_i)$, where $\mathrm{LN}(\cdot)$ denotes Layer Normalization. Then, the structural side is mapped into the structural patch space through a role-conditioned structural stem: $p_i=W_{\mathrm{str}}^{\rho_i}\cdot\mathrm{LN}(s_i)$, where $\rho_i\in\{\mathrm{V},\mathrm{E},\mathrm{O},\mathrm{P}\}$ denotes the four roles of vertex, hyperedge, overview, and pad, respectively. The two parts are then concatenated and normalized to obtain the initial hidden state of the projector: $h_i^{(0)}=\mathrm{LN}\big([c_i\,\|\,p_i]\big)$. In addition, this step decouples the original text embedding dimension from the working width of the projector. The dimension of $h_i^{(0)}$ is usually much smaller than the original text embedding dimension, which avoids linearly increasing the projector size as the text encoder changes.
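The three formulas above can be traced with a small pure-Python sketch. It assumes Layer Normalization without learnable gain or bias and uses tiny hand-picked weight matrices. The function name `hip_stem` and all numeric values are hypothetical and only show how the role $\rho_i$ selects a different structural matrix.

```python
import math

def layer_norm(x, eps=1e-5):
    """Layer normalization without learnable scale or shift (a simplification)."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def hip_stem(a, s, role, W_sem, W_str):
    c = matvec(W_sem, layer_norm(a))          # semantic core c_i
    p = matvec(W_str[role], layer_norm(s))    # role-conditioned structural patch p_i
    return layer_norm(c + p)                  # h_i^(0) = LN([c_i || p_i])

# Toy weights: a 4-d semantic vector is compressed to 2-d, a 3-d structural vector to 2-d.
W_sem = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W_str = {
    "V": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "E": [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
}
a, s = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0]
h_vertex = hip_stem(a, s, "V", W_sem, W_str)
h_hyperedge = hip_stem(a, s, "E", W_sem, W_str)
```

The same `(a, s)` pair yields different hidden states under the two roles, and the output width (4 here) is set by the weight shapes rather than by the text encoder's embedding size.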

##### Incidence\-driven bidirectional vertex\-hyperedge message passing\.

If the projector only performs independent linear mapping for each token, the high\-order structure in the hypergraph is still not truly used\. Therefore, HIP introduces a lightweight Hyper\-Incidence Block on the hidden states, explicitly performing vertex\-hyperedge bidirectional message passing driven by incidences inside the HIP\.

Let the initial projector states be $H^{(0)}=\{h_{i}^{(0)}\}_{i=1}^{L_{\mathcal{H}}}$. For any detail hyperedge token $e$, let $M(e)$ denote the set of member vertex tokens corresponding to it in the current local sequence. For any detail vertex token $v$, let $N(v)$ denote the set of hyperedge tokens associated with it in the current local sequence. First, in the vertex $\rightarrow$ hyperedge direction, each hyperedge token aggregates messages from its member vertex tokens. We use set attention to model the importance differences among the members inside a hyperedge. For any hyperedge token $e$, the attention weights are defined as:

$$\alpha_{e\leftarrow v}=\mathrm{softmax}_{v\in M(e)}\left(\frac{(W_{q}h_{e}^{(0)})^{\top}(W_{k}h_{v}^{(0)})}{\sqrt{d_{\mathrm{att}}}}\right), \tag{5}$$

where $d_{\mathrm{att}}$ denotes the attention dimension. The aggregated message from vertices to the hyperedge is obtained as:

$$m_{e}^{V\rightarrow E}=\sum_{v\in M(e)}\alpha_{e\leftarrow v}\,W_{e\leftarrow v}h_{v}^{(0)}. \tag{6}$$

Then, we fuse this message with the current hyperedge state and update the hyperedge representation:

$$\tilde{h}_{e}=\mathrm{LN}\Big(h_{e}^{(0)}+\phi_{E}\big([\,h_{e}^{(0)}\|m_{e}^{V\rightarrow E}\,]\big)\Big). \tag{7}$$

After completing the hyperedge update, we perform reverse aggregation in the hyperedge $\rightarrow$ vertex direction symmetrically. For any vertex token $v$, its associated hyperedge set is $N(v)$, and the corresponding attention weights are:

$$\alpha_{v\leftarrow e}=\mathrm{softmax}_{e\in N(v)}\left(\frac{(W_{q}h_{v}^{(0)})^{\top}(W_{k}\tilde{h}_{e})}{\sqrt{d_{\mathrm{att}}}}\right). \tag{8}$$

The aggregated message from hyperedges to the vertex is written as:

$$m_{v}^{E\rightarrow V}=\sum_{e\in N(v)}\alpha_{v\leftarrow e}\,W_{v\leftarrow e}\tilde{h}_{e}, \tag{9}$$

and the vertex representation is further updated as:

$$h_{v}^{(1)}=\mathrm{LN}\Big(h_{v}^{(0)}+\phi_{V}\big([\,h_{v}^{(0)}\|m_{v}^{E\rightarrow V}\,]\big)\Big). \tag{10}$$

For hyperedge tokens, the final state after this single Hyper-Incidence Block can be written as $h_{e}^{(1)}=\tilde{h}_{e}$. For tokens that do not participate in the incidence update, such as overview and pad tokens, their states are directly carried over as $h_{i}^{(1)}=h_{i}^{(0)}$. After the Hyper-Incidence Block, HIP uses a two-layer MLP to map the final hidden states into the word embedding space of the base LLM, thereby obtaining the complete hypergraph token sequence $T(c)=\{t_{i}\}_{i=1}^{L_{\mathcal{H}}}$.
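The two-pass update in Eqs. (5)-(10) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: $\phi_E$ and $\phi_V$ are assumed to be single linear layers, LN is applied without learned gain or bias, the weights are random, and all names (`hyper_incidence_block`, `attend`, `fuse`, the `params` keys) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(x, eps=1e-5):
    """LN without learned gain/bias (an assumption of this sketch)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def attend(query, neighbours, Wq, Wk, Wmsg):
    """Eqs. (5)-(6) and (8)-(9): set attention, then a weighted sum of projected states."""
    q = matvec(Wq, query)
    scores = [sum(a * b for a, b in zip(q, matvec(Wk, n))) / math.sqrt(len(q))
              for n in neighbours]
    alpha = softmax(scores)
    msgs = [matvec(Wmsg, n) for n in neighbours]
    return [sum(a * m[j] for a, m in zip(alpha, msgs)) for j in range(len(msgs[0]))]

def fuse(h, m, Wphi):
    """Eqs. (7) and (10): LN(h + phi([h || m])), with phi assumed linear."""
    update = matvec(Wphi, h + m)  # list concatenation plays the role of [h || m]
    return layer_norm([a + b for a, b in zip(h, update)])

def hyper_incidence_block(h0, members, p):
    """h0: token -> state; members: hyperedge token -> list of member vertex tokens M(e)."""
    h1 = dict(h0)  # overview and pad tokens are carried over unchanged
    # Pass 1, vertex -> hyperedge: every hyperedge attends over its member vertices.
    for e, vs in members.items():
        m = attend(h0[e], [h0[v] for v in vs], p["Wq"], p["Wk"], p["W_ev"])
        h1[e] = fuse(h0[e], m, p["phi_E"])
    # Build N(v) from M(e).
    incident = {}
    for e, vs in members.items():
        for v in vs:
            incident.setdefault(v, []).append(e)
    # Pass 2, hyperedge -> vertex: vertices attend over the *updated* hyperedge states.
    for v, es in incident.items():
        m = attend(h0[v], [h1[e] for e in es], p["Wq"], p["Wk"], p["W_ve"])
        h1[v] = fuse(h0[v], m, p["phi_V"])
    return h1

random.seed(0)
d = 4

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

params = {"Wq": rand_matrix(d, d), "Wk": rand_matrix(d, d),
          "W_ev": rand_matrix(d, d), "W_ve": rand_matrix(d, d),
          "phi_E": rand_matrix(d, 2 * d), "phi_V": rand_matrix(d, 2 * d)}
tokens = ["v1", "v2", "v3", "e1", "e2", "overview", "pad"]
h0 = {t: [random.uniform(-1, 1) for _ in range(d)] for t in tokens}
members = {"e1": ["v1", "v2", "v3"], "e2": ["v2", "v3"]}
h1 = hyper_incidence_block(h0, members, params)
```

Note the ordering: the second pass reads the updated hyperedge states $\tilde{h}_e$ but still uses the initial vertex states $h_v^{(0)}$ as queries and residuals, which mirrors Eqs. (8)-(10).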

#### 3\.4Hypergraph\-as\-Language Protocol and Training

Through HIDT-O and HIP, we obtain a structure-aware hypergraph token sequence $T(c)\in\mathbb{R}^{L_{\mathcal{H}}\times d_{\mathrm{llm}}}$. At the paper level, *Hypergraph as Language* denotes the overall perspective of making hypergraphs directly consumable by LLMs. In this section, we define a Hypergraph-as-Language protocol to refer to its concrete input interface: $T(c)$ is jointly injected into the input sequence of the LLM together with a natural language prompt designed around it, enabling the LLM to complete downstream prediction under a unified question-answering paradigm.

Specifically, for each sample, Hyper\-Align uniformly constructs the following three\-part prompt:

$$\text{Prompt}=\text{Background}\;\|\;\text{Details}\;\|\;\text{Question}. \tag{11}$$

The Background provides the task statement and domain description, and embeds the special placeholder <hypergraph\> inside the task statement sentence, which is replaced by the hypergraph token sequence $T(c)$ when fed into the LLM. The Details section is a textual context deterministically rendered from the same HIDT-O structure, providing auxiliary natural language supplements for the hypergraph tokens. The Question section specifies the object to be predicted, the candidate label set, and the output format requirements.
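As a rough illustration of this input interface, the sketch below assembles the three-part prompt and splices a hypergraph token sequence into the position of the placeholder. Everything here is an assumption made for illustration: the example wording of the three sections, the newline separator, and the helper names are hypothetical, and `toy_embed` is only a stand-in for the tokenizer and embedding table of the base LLM.

```python
HG_PLACEHOLDER = "<hypergraph>"

def build_prompt(background, details, question, sep="\n"):
    """Concatenate Background || Details || Question (the separator is an assumption)."""
    prompt = sep.join([background, details, question])
    if background.count(HG_PLACEHOLDER) != 1 or prompt.count(HG_PLACEHOLDER) != 1:
        raise ValueError("the placeholder must appear exactly once, inside the Background")
    return prompt

def splice_hypergraph_tokens(prompt, embed_text, hypergraph_tokens):
    """Replace the placeholder by the hypergraph token vectors T(c)."""
    prefix, suffix = prompt.split(HG_PLACEHOLDER)
    return embed_text(prefix) + list(hypergraph_tokens) + embed_text(suffix)

def toy_embed(text):
    """Hypothetical stand-in: one 1-dimensional 'embedding' per whitespace-separated word."""
    return [[float(len(word))] for word in text.split()]

background = "You are given a citation hypergraph <hypergraph> centered on one paper."
details = "Details: the center paper is co-cited with 3 other papers."
question = "Question: which category does the center paper belong to? Answer with one label."

prompt = build_prompt(background, details, question)
T_c = [[0.1], [0.2], [0.3]]  # three toy hypergraph tokens
sequence = splice_hypergraph_tokens(prompt, toy_embed, T_c)
```

In this toy run the hypergraph tokens end up between the embeddings of "hypergraph" and "centered", which is the position the placeholder occupied in the Background sentence.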

In the hypergraph alignment tuning, each sample is converted into a dialogue pair, where the human side contains the prompt $\mathcal{P}(c)$ with inserted hypergraph tokens and the assistant side contains the target answer text. Let the resulting input be $\mathcal{I}(c)$ and the target answer be $y_{1:K}$. Hyper-Align is optimized with the standard causal language modeling loss: $\mathcal{L}_{\mathrm{lm}}=-\sum_{k=1}^{K}\log p_{\Theta,\phi}\big(y_{k}\mid y_{<k},\mathcal{I}(c)\big)$, where $\Theta$ denotes the frozen LLM parameters and $\phi$ denotes the trainable HIP parameters. During training, the prompt tokens and the inserted hypergraph token region are masked from supervision, and only the response is used for next-token prediction. Therefore, Hyper-Align performs projector-only tuning: the LLM is kept fixed, while HIP learns to align the hypergraph structure with the LLM token space.
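The masking described above can be illustrated with a small stdlib-only sketch of the summed negative log-likelihood. It assumes that `step_logits[t]` already holds the model scores used to predict `targets[t]`; the function names and the toy vocabulary are hypothetical.

```python
import math

def log_softmax(logits):
    top = max(logits)
    lse = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - lse for x in logits]

def masked_lm_loss(step_logits, targets, supervised):
    """Sum of -log p(y_k | ...) over supervised (response) positions only."""
    loss = 0.0
    for logits, target, keep in zip(step_logits, targets, supervised):
        if keep:  # prompt and hypergraph-token positions are skipped
            loss -= log_softmax(logits)[target]
    return loss

vocab = 3
# Five positions: two prompt tokens, one hypergraph-token slot, two response tokens.
step_logits = [[0.0] * vocab for _ in range(5)]
targets = [0, 1, 2, 1, 0]
supervised = [False, False, False, True, True]
loss = masked_lm_loss(step_logits, targets, supervised)

# Changing the scores at a masked position leaves the loss unchanged.
perturbed = [[5.0, -1.0, 2.0]] + step_logits[1:]
loss_perturbed = masked_lm_loss(perturbed, targets, supervised)
```

With uniform logits over three classes, each of the two supervised positions contributes $\log 3$, so the loss equals $2\log 3$.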

In addition to the main language modeling objective, we design two lightweight high-order consistency losses on HIP representations. The first is the order bucket reconstruction loss $\mathcal{L}_{\mathrm{ord}}$, which encourages hyperedge-related tokens to preserve the hyperedge-degree bucket information encoded in HIDT-O. The second is the local relation reconstruction loss $\mathcal{L}_{\mathrm{rel}}$, which requires HIP to distinguish basic structural relations between token pairs, such as vertex-hyperedge incidence and co-membership within the same hyperedge. These auxiliary objectives act only on HIP and are used to regularize hypergraph alignment tuning, with detailed formulations provided in Appendix [C](https://arxiv.org/html/2605.21858#A3). The overall training objective is written as: $\mathcal{L}=\mathcal{L}_{\mathrm{lm}}+\lambda_{\mathrm{ord}}\mathcal{L}_{\mathrm{ord}}+\lambda_{\mathrm{rel}}\mathcal{L}_{\mathrm{rel}}$.

### 4HyperAlign\-Bench

To systematically evaluate the ability of different models to capture high\-order association structures, we construct HyperAlign\-Bench, the first benchmark for hypergraph\-language alignment\. Unlike existing evaluations that mainly focus on ordinary graphs, HyperAlign\-Bench directly treats hypergraphs as the basic data object, preserving the high\-order association semantics expressed by vertex\-hyperedge incidence\. It also provides a unified data construction pipeline, task protocol, and evaluation interface for fair comparison across different methods\.

HyperAlign-Bench contains two dual tasks: vertex classification and hyperedge classification. The former takes a queried vertex as the center and requires the model to predict its category based on the high-order association context. The latter takes a queried hyperedge as the center and requires the model to predict the category of the source object that induces the hyperedge. In HyperAlign-Bench, the main dataset is Arxiv-HG, which is built upon OGBN-Arxiv Hu et al. ([2020](https://arxiv.org/html/2605.21858#bib.bib166)). We convert paper citation relations into a co-citation hypergraph: for each source paper, the set of papers it cites is regarded as a hyperedge. This construction avoids flattening high-order co-citation relations into ordinary graph edges, resulting in a training and in-domain evaluation dataset with 169,343 vertices and 123,826 hyperedges. In addition to the main dataset, HyperAlign-Bench includes 4 reorganized datasets derived from HyperBERT Bazaga et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib126)), namely Cora-CC, PubMed, DBLP, and IMDB, covering different domains such as papers, author relations, and movies. These datasets are used to evaluate the model's ability to transfer high-order association structure modeling to unseen domains. Overall, HyperAlign-Bench provides an experimental foundation for validating hypergraph-native alignment capability and cross-domain generalization. More details are provided in Appendix [B](https://arxiv.org/html/2605.21858#A2).
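The co-citation construction can be sketched as follows: every source paper induces one hyperedge whose members are the papers it cites. This is a minimal sketch on toy data. The function names are hypothetical, and any filtering or preprocessing applied when building Arxiv-HG is not reproduced here.

```python
from collections import defaultdict

def build_cocitation_hypergraph(citations):
    """citations: iterable of (source_paper, cited_paper) pairs.
    Returns source_paper -> set of cited papers (one hyperedge per source paper)."""
    hyperedges = defaultdict(set)
    for source, cited in citations:
        hyperedges[source].add(cited)
    return dict(hyperedges)

def incidence(hyperedges):
    """Invert the membership map: vertex -> set of hyperedges it belongs to."""
    incident = defaultdict(set)
    for edge_id, members in hyperedges.items():
        for vertex in members:
            incident[vertex].add(edge_id)
    return dict(incident)

# Toy citation list (hypothetical paper ids).
citations = [("p1", "a"), ("p1", "b"), ("p1", "c"),
             ("p2", "b"), ("p2", "c"),
             ("p3", "a")]
hyperedges = build_cocitation_hypergraph(citations)
incident = incidence(hyperedges)
```

In this sketch a paper without outgoing citations induces no hyperedge, and the hyperedge induced by `p1` keeps `a`, `b`, and `c` together as one group instead of splitting them into three pairwise edges. Under the HEC task described above, the label of a hyperedge would be the category of its source paper.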

### 5Experimental Results

#### 5\.1Experimental Setup

We evaluate Hyper-Align on HyperAlign-Bench. During training, we jointly optimize two tasks on Arxiv-HG: vertex classification (VC) and hyperedge classification (HEC). We further evaluate the same checkpoint on four unseen hypergraph datasets, Cora-CC, PubMed, DBLP, and IMDB, without any additional fine-tuning, to examine its cross-domain zero-shot generalization ability. By default, Hyper-Align uses Qwen3-8B as the base LLM, while vertex and hyperedge semantic features are pre-encoded by Qwen3-Embedding-0.6B. We train the model for 2 epochs on 4 NVIDIA A100 GPUs. The global effective batch size is 64, and the learning rate is set to $2\times 10^{-3}$ with a cosine schedule and a warmup ratio of 0.03. For the HIDT-O, we use at most 160 hypergraph tokens, sample up to 8 incident hyperedges for each center vertex, and sample up to 8 member vertices for each hyperedge. The overview suffix contains 8 tokens, corresponding to 2 hops and 4 order buckets. More implementation details are provided in Appendix [D.1](https://arxiv.org/html/2605.21858#A4.SS1).
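For readers unfamiliar with the schedule, the sketch below shows one common form of a cosine learning-rate schedule with warmup, using the peak rate and warmup ratio stated above. The linear warmup shape, the decay to zero, and the total step count of 1000 are assumptions of this sketch. The text only specifies a cosine schedule with a warmup ratio of 0.03.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-3, warmup_ratio=0.03):
    """Learning rate at a given optimizer step (0-indexed)."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp up to the peak (assumed warmup shape).
        return peak_lr * ((step + 1) / warmup_steps)
    # Cosine decay from the peak down to zero over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

total = 1000  # hypothetical number of optimizer steps
schedule = [lr_at(s, total) for s in range(total + 1)]
```

With 1000 steps, the first 30 steps ramp up to $2\times 10^{-3}$ and the remaining steps follow the cosine decay.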

#### 5\.2Comparison with Other Methods

Table 1: In-domain performance on VC and HEC tasks. Here, “GOFA∗” indicates that the evaluation is conducted using the officially released weights directly.

| Category | Method | Venue | VC (%) | HEC (%) |
| --- | --- | --- | --- | --- |
| HGNNs | HGNN Feng et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib2)) | AAAI'19 | 69.5 | 69.4 |
| HGNNs | HyperGCN Yadati et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib120)) | NeurIPS'19 | 71.6 | 71.6 |
| HGNNs | HAN Wang et al. ([2019](https://arxiv.org/html/2605.21858#bib.bib83)) | WWW'19 | 65.3 | 66.5 |
| HGNNs | AllSetTrans Chien et al. ([2022](https://arxiv.org/html/2605.21858#bib.bib123)) | ICLR'22 | 68.9 | 70.0 |
| PLM-based | HyperBERT Bazaga et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib126)) | EMNLP'24 | 59.1 | 58.5 |
| General LLMs | Llama2-7B Touvron et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib127)) | arXiv'23 | 9.7 | 8.1 |
| General LLMs | Llama3-8B Grattafiori et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib128)) | arXiv'24 | 54.5 | 56.0 |
| General LLMs | Qwen3-8B Yang et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib129)) | arXiv'25 | 55.5 | 60.3 |
| General LLMs | GPT-5-mini OpenAI ([2025](https://arxiv.org/html/2605.21858#bib.bib136)) | OpenAI'25 | 65.4 | 67.0 |
| Graph-LLMs | GraphGPT Tang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib130)) | SIGIR'24 | 69.3 | 69.7 |
| Graph-LLMs | LLaGA Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)) | ICML'24 | 70.4 | 71.5 |
| Graph-LLMs | TEA-GLM Wang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib132)) | NeurIPS'24 | 70.8 | 71.3 |
| Graph-LLMs | G.Prompter Lv et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib137)) | ICDE'25 | 60.1 | 68.6 |
| Graph-LLMs | PromptGFM Zhu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib133)) | arXiv'25 | 68.9 | 69.7 |
| Graph-LLMs | UniGraph He et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib138)) | SIGKDD'25 | 62.4 | 71.2 |
| Graph-LLMs | GOFA∗ Kong et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib134)) | ICLR'25 | 51.1 | 53.1 |
| Graph-LLMs | GOFA Kong et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib134)) | ICLR'25 | 69.7 | 70.5 |
| HG-LLMs | Hyper-Align (Ours) | – | 76.9 | 78.2 |

##### In\-domain evaluation\.

Table [1](https://arxiv.org/html/2605.21858#S5.T1) reports the in-domain results on Arxiv-HG. Hyper-Align achieves the best performance on both VC and HEC, outperforming the strongest baseline, HyperGCN (71.6% on both tasks), by 5.3 points on VC and 6.6 points on HEC. Among existing hypergraph-specific methods, HGNNs achieve competitive results by directly modeling hypergraph structures, but they remain supervised encoders tailored to specific tasks, offering neither an interface for alignment with language nor any zero-shot capability. The pre-trained language model (PLM)-based method HyperBERT performs much worse, showing that simply injecting textual semantics into PLMs is insufficient for hypergraph modeling. General LLMs improve when stronger instruction-following models are used, but their performance is still limited because textual prompts alone cannot faithfully represent native vertex-hyperedge incidence structures. Graph-LLMs further benefit from structure-aware adaptation, yet they are fundamentally built on pairwise graph representations and therefore cannot fully preserve hyperedge-level grouping semantics. In contrast, our Hyper-Align is the first hypergraph-native LLM framework that directly aligns vertex-hyperedge incidence structures with the LLM token space. The substantial improvement over all HGNNs, PLM-based methods, general LLMs, and graph-LLMs demonstrates both the necessity of native hypergraph-language alignment and the effectiveness of our proposed design.

Table 2: Zero-shot performance on Cora-CC, PubMed, DBLP and IMDB datasets.

| Method | Venue | Cora-CC VC (%) | Cora-CC HEC (%) | PubMed VC (%) | PubMed HEC (%) | DBLP VC (%) | DBLP HEC (%) | IMDB VC (%) | IMDB HEC (%) | Average VC (%) | Average HEC (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GraphGPT Tang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib130)) | SIGIR'24 | 60.3 | 62.3 | 71.2 | 72.8 | 50.0 | 59.8 | 30.3 | 29.6 | 53.0 | 56.1 |
| LLaGA Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)) | ICML'24 | 2.8 | 3.6 | 0.9 | 0.5 | 51.1 | 56.4 | 18.1 | 28.1 | 18.2 | 22.2 |
| TEA-GLM Wang et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib132)) | NeurIPS'24 | 32.6 | 51.2 | 12.5 | 26.3 | 58.2 | 60.5 | 52.3 | 32.7 | 38.9 | 42.7 |
| G.Prompter Lv et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib137)) | ICDE'25 | 32.8 | 37.6 | 58.0 | 60.4 | 61.1 | 60.3 | 52.3 | 41.3 | 51.1 | 49.9 |
| PromptGFM Zhu et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib133)) | arXiv'25 | 58.4 | 55.1 | 70.8 | 75.9 | 56.3 | 60.9 | 28.9 | 27.0 | 53.6 | 54.7 |
| UniGraph He et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib138)) | SIGKDD'25 | 64.4 | 72.5 | 72.0 | 66.5 | 65.5 | 61.5 | 51.2 | 42.3 | 63.3 | 60.7 |
| GOFA∗ Kong et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib134)) | ICLR'25 | 61.4 | 60.8 | 70.2 | 71.1 | 4.4 | 2.1 | 63.6 | 28.6 | 49.9 | 40.7 |
| GOFA Kong et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib134)) | ICLR'25 | 59.3 | 60.8 | 69.9 | 71.3 | 3.1 | 2.1 | 54.0 | 42.4 | 46.6 | 44.2 |
| Hyper-Align (Ours) | – | 74.8 | 75.7 | 77.5 | 77.6 | 67.2 | 64.6 | 74.5 | 44.9 | 73.5 | 65.7 |

##### Cross\-domain zero\-shot evaluation\.

Table [2](https://arxiv.org/html/2605.21858#S5.T2) further reports zero-shot results on four unseen datasets. Hyper-Align obtains the best performance on VC and HEC tasks across all datasets. Although several graph-LLMs achieve competitive scores on individual datasets, their performance varies substantially across domains: for example, LLaGA drops to 0.9% VC accuracy on PubMed and GOFA to 3.1% on DBLP. In contrast, Hyper-Align shows more stable cross-domain generalization, indicating that it learns a more transferable alignment pattern for high-order association modeling rather than fitting only the source dataset.

#### 5\.3Ablation Study

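Table 3: Detailed ablation study of the proposed components.

| Exp. | HIDT-O: HIDT | HIDT-O: Overview | HIP | Aux. Loss: OBRecon. | Aux. Loss: RelRecon. | Input Protocol: HGToken | Input Protocol: Details | In-domain VC | In-domain HEC | Zero-shot Avg VC | Zero-shot Avg HEC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F0: Full Hyper-Align | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 76.9 | 78.2 | 73.5 | 65.7 |
| S1: w/o HIDT detail | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 70.4 | 71.1 | 66.8 | 58.9 |
| S2: w/o Overview | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | 75.9 | 76.7 | 72.4 | 64.2 |
| P1: MLP projector | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | 74.3 | 75.0 | 70.9 | 62.1 |
| L1: w/o OBRecon. | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | 76.5 | 77.5 | 73.0 | 65.0 |
| L2: w/o RelRecon. | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | 76.4 | 77.6 | 72.9 | 65.1 |
| L3: w/o Aux. losses | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | 76.1 | 77.3 | 72.5 | 64.4 |
| G1: Text-only prompt | – | – | – | – | – | ✗ | ✓ | 56.5 | 62.0 | 61.3 | 58.2 |
| G2: HG token only | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | 75.2 | 76.4 | 70.8 | 62.7 |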

##### Ablation on the proposed components\.

Table [3](https://arxiv.org/html/2605.21858#S5.T3) presents a detailed ablation study of the proposed components. Removing the HIDT sequence causes the largest degradation. This confirms that fine-grained vertex-hyperedge incidence details are crucial for modeling native high-order associations. Removing the overview suffix also leads to consistent drops on both in-domain and zero-shot evaluation, indicating that overview tokens provide complementary broader structural context beyond the local HIDT template. For the projector, replacing HIP with a plain MLP projector substantially hurts performance, especially under zero-shot transfer, showing that independent token projection is insufficient for aligning hypergraph structures with the LLM space. The auxiliary losses also bring consistent gains: removing either auxiliary reconstruction task degrades performance, while removing both leads to a larger drop. This suggests that the two auxiliary objectives serve as useful regularizers for preserving order-aware and relation-aware structural information. We also ablate the Hypergraph-as-Language protocol. The text-only prompt variant, which removes <hypergraph\> and relies only on textual Details, performs much worse than full Hyper-Align. In contrast, the hypergraph-token-only variant remains competitive but still underperforms the full protocol, particularly in zero-shot evaluation. These results indicate that the hypergraph tokens carry the main structural information, while textual details further facilitate LLM alignment and cross-domain generalization.

##### Effect of base LLM and embedding model\.

Table [4](https://arxiv.org/html/2605.21858#S5.T4) evaluates Hyper-Align with different choices of base LLMs and embedding models. This experiment is designed to examine whether the advantage of Hyper-Align comes merely from using stronger recent LLMs or from the proposed hypergraph-native architecture itself. To this end, we include Vicuna-7B Chiang et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib139)) with SBERT Reimers and Gurevych ([2019](https://arxiv.org/html/2605.21858#bib.bib140)) and SimTeG Duan et al. ([2023](https://arxiv.org/html/2605.21858#bib.bib141)) embeddings, which follow the commonly used settings in prior graph-LLM work such as LLaGA Chen et al. ([2024](https://arxiv.org/html/2605.21858#bib.bib131)); in particular, Vicuna-7B with SimTeG corresponds to the default LLaGA setting.

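Table 4: Performance of using different base LLMs and embedding models.

| Base LLM | Emb. Model | Emb. Dim. | In-domain VC | In-domain HEC | Zero-shot Avg VC | Zero-shot Avg HEC |
| --- | --- | --- | --- | --- | --- | --- |
| Vicuna-7B | SBERT | 384 | 75.8 | 76.5 | 70.9 | 61.2 |
| Vicuna-7B | SimTeG | 2432 | 76.0 | 77.1 | 72.0 | 61.9 |
| Qwen3-8B | Qwen3-Emb-0.6B | 1024 | 76.9 | 78.2 | 73.5 | 65.7 |
| Qwen3-8B | Qwen3-Emb-4B | 2560 | 77.2 | 78.8 | 74.2 | 66.5 |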

Under these controlled settings, Hyper-Align still achieves strong performance and remains substantially better than graph-LLM baselines reported in Table [1](https://arxiv.org/html/2605.21858#S5.T1), demonstrating that the gain is not simply due to using a more advanced LLM or embedding model. Meanwhile, replacing SBERT with SimTeG improves the results under the Vicuna-7B setting, and Qwen3-based configurations Yang et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib129)); Zhang et al. ([2025](https://arxiv.org/html/2605.21858#bib.bib135)) further strengthen both in-domain and zero-shot performance. These results show that Hyper-Align is compatible with different LLMs and semantic encoders, while its main advantage comes from the hypergraph-native alignment design rather than being tied to a specific LLM or embedding model.

Table 5: Comparison of single-task and joint training.

| Training | In-domain VC | In-domain HEC | Zero-shot Avg VC | Zero-shot Avg HEC |
| --- | --- | --- | --- | --- |
| Single-task | 77.0 | 77.6 | 67.3 | 64.7 |
| Joint | 76.9 | 78.2 | 73.5 | 65.7 |

##### Single\-task & joint training\.

Table [5](https://arxiv.org/html/2605.21858#S5.T5) compares single-task training with joint training. In-domain performance is roughly the same for both training methods (77.0 vs. 76.9 on VC and 77.6 vs. 78.2 on HEC), but joint training has a clear advantage in cross-domain zero-shot performance, raising the average VC accuracy from 67.3 to 73.5 and the average HEC accuracy from 64.7 to 65.7. These results indicate that jointly optimizing vertex-centered and hyperedge-centered tasks encourages HIP to learn a more general shared alignment between hypergraphs and language, resulting in better generalization ability.

![Refer to caption](https://arxiv.org/html/2605.21858v1/x4.png)

Figure 4: HEC accuracy stratified by hyperedge degree on Arxiv-HG.

##### Effect of hyperedge degree.

We stratify the Arxiv-HG HEC test hyperedges by their hyperedge degree and report the models' accuracy for each degree range. As shown in Fig. [4](https://arxiv.org/html/2605.21858#S5.F4), Hyper-Align consistently achieves the best performance across all degree ranges, indicating that Hyper-Align remains effective in high-order regimes where richer group-level associations are available. Notably, when the hyperedge degree is 2, Hyper-Align already outperforms graph-based baselines. This result shows that our hypergraph formulation naturally accommodates pairwise relations, and can even model such two-way associations more effectively than standard graph-based approaches. We further compare Hyper-Align with an internal pairwise variant, Hyper-Align-clique. This variant replaces the native hyperedges with clique-expanded pairwise edges while keeping the rest of the framework unchanged; this setup is consistent with the experimental setup of graph-LLMs in this paper. When the hyperedge degree is only 2, Hyper-Align and Hyper-Align-clique perform similarly. However, as the hyperedge degree increases, Hyper-Align shows a clear advantage over Hyper-Align-clique. This trend suggests that pairwise expansion can capture part of the relational signal, but loses the native grouping semantics of hyperedges. The results therefore support our claim that preserving vertex-hyperedge incidence structure is crucial for modeling high-order associations.
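The loss of grouping semantics under clique expansion can be seen in a small example. The sketch below implements the generic clique expansion of a hypergraph (each hyperedge is replaced by all pairwise edges among its members). It is meant as an illustration of the general technique and is not taken from the Hyper-Align-clique implementation, whose details are not given here.

```python
from itertools import combinations

def clique_expand(hyperedges):
    """Replace every hyperedge by all pairwise edges among its members."""
    edges = set()
    for members in hyperedges:
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges

# One 3-way relation versus three separate 2-way relations.
one_triple = [{"a", "b", "c"}]
three_pairs = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

expanded_triple = clique_expand(one_triple)
expanded_pairs = clique_expand(three_pairs)

# A degree-2 hyperedge expands to exactly one ordinary edge.
expanded_single_pair = clique_expand([{"x", "y"}])
```

Both hypergraphs expand to the same edge set, so a pairwise model cannot tell a single joint relation over `a`, `b`, `c` apart from three independent pairwise relations. For a degree-2 hyperedge nothing is lost, which is consistent with the two variants performing similarly in that regime.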

### 6Conclusion

In this paper, we introduce the *Hypergraph as Language* perspective and propose Hyper-Align, the first hypergraph-native alignment framework for LLMs. Different from existing graph-LLMs that rely on pairwise graph representations, Hyper-Align directly represents vertex-hyperedge incidence structures through a Hypergraph-as-Language protocol, encodes fine-grained high-order contexts with HIDT-O, and aligns hypergraph tokens with the LLM token space via HIP. To support systematic evaluation, we construct HyperAlign-Bench, which covers both vertex-level and hyperedge-level tasks under in-domain and zero-shot settings. Extensive experiments show that our Hyper-Align substantially outperforms current HGNNs, PLM-based methods, general LLMs, and graph-LLMs. We hope Hyper-Align provides a useful foundation for building language-aligned models that can understand and reason over complex high-order associations.

### References

- \[1\]\(2023\)A survey on hypergraph representation learning\.ACM Comp\. Surv\.56\(1\),pp\. 1–38\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1)\.
- \[2\]F\. Battiston, G\. Cencetti, I\. Iacopini, V\. Latora, M\. Lucas, A\. Patania, J\. Young, and G\. Petri\(2020\)Networks beyond pairwise interactions: structure and dynamics\.Physics Reports874,pp\. 1–92\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1)\.
- \[3\]A\. Bazaga, P\. Liò, and G\. Micklem\(2024\)HyperBERT: mixing hypergraph\-aware layers with language models for node classification on text\-attributed hypergraphs\.InProc\. Conf\. Empirical Methods in Nat\. Lang\. Process\.,pp\. 9181–9193\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p2.1),[§B\.1](https://arxiv.org/html/2605.21858#A2.SS1.p1.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px2.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[§4](https://arxiv.org/html/2605.21858#S4.p2.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.7.2)\.
- \[4\]C\. Berge\(1984\)Hypergraphs: combinatorics of finite sets\.Vol\.45,Elsevier\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1)\.
- \[5\]T\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. D\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell,et al\.\(2020\)Language models are few\-shot learners\.Adv\. Neural Inform\. Process\. Syst\.33,pp\. 1877–1901\.Cited by:[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[6\]Q\. Cai, Z\. Wang, S\. Diao, J\. Kwok, and Y\. Song\(2024\)CodeGraph: enhancing graph reasoning of LLMs with code\.arXiv preprint arXiv:2408\.13863\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[7\]Z\. Chai, T\. Zhang, L\. Wu, K\. Han, X\. Hu, X\. Huang, and Y\. Yang\(2025\)GraphLLM: boosting graph reasoning ability of large language model\.IEEE Trans\. Big Data\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[8\]R\. Chen, T\. Zhao, A\. Jaiswal, N\. Shah, and Z\. Wang\(2024\)LLaGA: large language and graph assistant\.InProc\. Int\. Conf\. Mach\. Learn\.,pp\. 7809–7823\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.13.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.5.1)\.
- \[9\]W\. Chiang, Z\. Li, Z\. Lin, Y\. Sheng, Z\. Wu, H\. Zhang, L\. Zheng, S\. Zhuang, Y\. Zhuang, J\. E\. Gonzalez, I\. Stoica, and E\. P\. Xing\(2023\-03\)Vicuna: an open\-source chatbot impressing GPT\-4 with 90%\* ChatGPT quality\.External Links:[Link](https://lmsys.org/blog/2023-03-30-vicuna/)Cited by:[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p1.1)\.
- \[10\]E\. Chien, C\. Pan, J\. Peng, and O\. Milenkovic\(2022\)You are AllSet: a multiset function framework for hypergraph neural networks\.InInt\. Conf\. Learn\. Represent\.,Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.6.1)\.
- \[11\]Z\. Chu, Y\. Wang, Q\. Cui, L\. Li, W\. Chen, Z\. Qin, and K\. Ren\(2024\)LLM\-guided multi\-view hypergraph learning for human\-centric explainable recommendation\.arXiv preprint arXiv:2401\.08217\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p3.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[12\]Q\. Dai and Y\. Gao\(2023\)Mathematical foundations of hypergraph\.InHypergraph Computation,pp\. 19–40\.Cited by:[§3\.2](https://arxiv.org/html/2605.21858#S3.SS2.SSS0.Px1.p1.1)\.
- \[13\]K\. Duan, Q\. Liu, T\. Chua, S\. Yan, W\. T\. Ooi, Q\. Xie, and J\. He\(2023\)SimTeG: a frustratingly simple approach improves textual graph learning\.arXiv preprint arXiv:2308\.02565\.Cited by:[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p1.1)\.
- \[14\]Y\. Feng, J\. Han, S\. Ying, and Y\. Gao\(2024\)Hypergraph isomorphism computation\.IEEE Trans\. Pattern Anal\. Mach\. Intell\.46\(5\),pp\. 3880–3896\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1)\.
- \[15\]Y\. Feng, H\. Hu, S\. Ying, X\. Hou, S\. Liu, M\. Yang, J\. Li, S\. Du, N\. Zheng, H\. Hu,et al\.\(2026\)Hyper\-RAG: combating llm hallucinations using hypergraph\-driven retrieval\-augmented generation\.Nature Communications\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p3.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[16\]Y\. Feng, C\. Yang, X\. Hou, S\. Du, S\. Ying, Z\. Wu, and Y\. Gao\(2025\)BEYOND graphs: can large language models comprehend hypergraphs?\.InInt\. Conf\. Learn\. Represent\.,pp\. 3445–3472\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p3.1)\.
- \[17\]Y\. Feng, H\. You, Z\. Zhang, R\. Ji, and Y\. Gao\(2019\)Hypergraph neural networks\.InAAAI,pp\. 3558–3565\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.3.2)\.
- \[18\]Y\. Gao, Y\. Feng, S\. Liu, X\. Han, S\. Du, Z\. Wu, and H\. Hu\(2026\)Hypergraph foundation model\.IEEE Trans\. Pattern Anal\. Mach\. Intell\.48\(4\),pp\. 4063–4080\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p2.1)\.
- \[19\]Y\. Gao, S\. Ji, X\. Han, and Q\. Dai\(2024\)Hypergraph computation\.Engineering\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1)\.
- \[20\]A\. Grattafiori, A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al\-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Vaughan,et al\.\(2024\)The Llama 3 herd of models\.arXiv preprint arXiv:2407\.21783\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px3.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.9.1)\.
- \[21\]B\. Gu, J\. Zeng, X\. Qi, and D\. Li\(2025\)Modeling hypergraph using large language models\.arXiv preprint arXiv:2510\.11728\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p3.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[22\]X\. Guo, T\. Zhang, Y\. Wang, C\. Wang, F\. Wang, X\. Wang, X\. Zhang, X\. Liu, and Z\. Cui\(2025\)Multi\-modal hypergraph enhanced llm learning for recommendation\.arXiv preprint arXiv:2504\.10541\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p3.1),[§1](https://arxiv.org/html/2605.21858#S1.p3.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[23\]M\. U\. Hadi, R\. Qureshi, A\. Shah, M\. Irfan, A\. Zafar, M\. B\. Shaikh, N\. Akhtar, J\. Wu, S\. Mirjalili,et al\.\(2023\)A survey on large language models: applications, challenges, limitations, and practical usage\.Authorea Preprints\.Cited by:[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[24\]Y\. He, Y\. Sui, X\. He, and B\. Hooi\(2025\)UniGraph: learning a unified cross\-domain foundation model for text\-attributed graphs\.InACM SIGKDD,pp\. 448–459\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.17.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.9.1)\.
- \[25\]W\. Hu, M\. Fey, M\. Zitnik, Y\. Dong, H\. Ren, B\. Liu, M\. Catasta, and J\. Leskovec\(2020\)Open graph benchmark: datasets for machine learning on graphs\.Adv\. Neural Inform\. Process\. Syst\.33,pp\. 22118–22133\.Cited by:[§4](https://arxiv.org/html/2605.21858#S4.p2.1)\.
- \[26\]S\. Kim, S\. Y\. Lee, Y\. Gao, A\. Antelmi, M\. Polato, and K\. Shin\(2024\)A survey on hypergraph neural networks: an in\-depth and step\-by\-step guide\.InACM SIGKDD,pp\. 6534–6544\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1)\.
- \[27\]L\. Kong, J\. Feng, H\. Liu, C\. Huang, J\. Huang, Y\. Chen, and M\. Zhang\(2025\)GOFA: a generative one\-for\-all model for joint graph language modeling\.InInt\. Conf\. Learn\. Represent\.,Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.18.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.1.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.10.1)\.
- \[28\]F\. Li, X\. Wang, W\. Zhang, Y\. Zhang, and X\. Lin\(2025\)DHG\-Bench: a comprehensive benchmark for deep hypergraph learning\.arXiv preprint arXiv:2508\.12244\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p2.1)\.
- \[29\]X\. Li, W\. Chen, Q\. Chu, H\. Li, Z\. Sun, R\. Li, C\. Qian, Y\. Wei, Z\. Liu, C\. Shi,et al\.\(2024\)Can large language models analyze graphs like professionals? a benchmark, datasets and models\.Adv\. Neural Inform\. Process\. Syst\.37,pp\. 141045–141070\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[30\]Y\. Li, Z\. Li, P\. Wang, J\. Li, X\. Sun, H\. Cheng, and J\. X\. Yu\(2023\)A survey of graph meets large language model: progress and future directions\.arXiv preprint arXiv:2311\.12399\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[31\]R\. Lv, Z\. Zhang, K\. Zhang, Q\. Liu, W\. Gao, J\. Liu, J\. Yan, L\. Yue, and F\. Yao\(2025\)GraphPrompter: multi\-stage adaptive prompt optimization for graph in\-context learning\.InProc\. IEEE Int\. Conf\. Data Eng\.,pp\. 3917–3930\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.15.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.7.1)\.
- \[32\]OpenAI\(2025\)GPT\-5 mini\.Note:[https://developers\.openai\.com/api/docs/models/gpt\-5\-mini](https://developers.openai.com/api/docs/models/gpt-5-mini)Model:gpt\-5\-miniCited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px3.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.11.1)\.
- \[33\]J\. Z\. Pan, S\. Razniewski, J\. Kalo, S\. Singhania, J\. Chen, S\. Dietze, H\. Jabeen, J\. Omeliyanenko, W\. Zhang, M\. Lissandrini,et al\.\(2023\)Large language models and knowledge graphs: opportunities and challenges\.arXiv preprint arXiv:2308\.06374\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[34\]N\. Reimers and I\. Gurevych\(2019\)Sentence\-BERT: sentence embeddings using siamese BERT\-networks\.InProc\. Conf\. Empirical Methods in Nat\. Lang\. Process\.,pp\. 3982–3992\.Cited by:[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p1.1)\.
- \[35\]X\. Ren, J\. Tang, D\. Yin, N\. Chawla, and C\. Huang\(2024\)A survey of large language models for graphs\.InACM SIGKDD,pp\. 6616–6626\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[36\]J\. Tang, Y\. Yang, W\. Wei, L\. Shi, L\. Su, S\. Cheng, D\. Yin, and C\. Huang\(2024\)GraphGPT: graph instruction tuning for large language models\.InProc\. Int\. ACM SIGIR Conf\. Res\. Dev\. Inf\. Retr\.,pp\. 491–500\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.12.2),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.4.1)\.
- \[37\]H\. Touvron, L\. Martin, K\. Stone, P\. Albert, A\. Almahairi, Y\. Babaei, N\. Bashlykov, S\. Batra, P\. Bhargava, S\. Bhosale,et al\.\(2023\)Llama 2: open foundation and fine\-tuned chat models\.arXiv preprint arXiv:2307\.09288\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px3.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.8.2)\.
- \[38\]D\. Wang, Y\. Zuo, F\. Li, and J\. Wu\(2024\)LLMs as zero\-shot graph learners: alignment of GNN representations with LLM token embeddings\.Adv\. Neural Inform\. Process\. Syst\.37,pp\. 5950–5973\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.14.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.6.1)\.
- \[39\]X\. Wang, H\. Ji, C\. Shi, B\. Wang, Y\. Ye, P\. Cui, and P\. S\. Yu\(2019\)Heterogeneous graph attention network\.InInt\. World Wide Web Conf\.,pp\. 2022–2032\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px1.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.5.1)\.
- \[40\]J\. Wei, M\. Bosma, V\. Y\. Zhao, K\. Guu, A\. W\. Yu, B\. Lester, N\. Du, A\. M\. Dai, and Q\. V\. Le\(2021\)Finetuned language models are zero\-shot learners\.arXiv preprint arXiv:2109\.01652\.Cited by:[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[41\]N\. Yadati, M\. Nimishakavi, P\. Yadav, V\. Nitin, A\. Louis, and P\. Talukdar\(2019\)HyperGCN: a new method for training graph convolutional networks on hypergraphs\.Adv\. Neural Inform\. Process\. Syst\.32\.Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.4.1)\.
- \[42\]A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv,et al\.\(2025\)Qwen3 technical report\.arXiv preprint arXiv:2505\.09388\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px3.p1.1),[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p2.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.10.1)\.
- \[43\]R\. Ye, C\. Zhang, R\. Wang, S\. Xu, and Y\. Zhang\(2024\)Language is all a graph needs\.InFindings Assoc\. Comput\. Linguist\.,pp\. 1955–1973\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[44\]R\. Zhang, Y\. Zou, and J\. Ma\(2020\)Hyper\-SAGNN: a self\-attention based graph neural network for hypergraphs\.InInt\. Conf\. Learn\. Represent\.,Cited by:[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1)\.
- \[45\]Y\. Zhang, M\. Li, D\. Long, X\. Zhang, H\. Lin, B\. Yang, P\. Xie, A\. Yang, D\. Liu, J\. Lin,et al\.\(2025\)Qwen3 embedding: advancing text embedding and reranking through foundation models\.arXiv preprint arXiv:2506\.05176\.Cited by:[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px1.p1.1),[§5\.3](https://arxiv.org/html/2605.21858#S5.SS3.SSS0.Px2.p2.1)\.
- \[46\]W\. X\. Zhao, K\. Zhou, J\. Li, T\. Tang, X\. Wang, Y\. Hou, Y\. Min, B\. Zhang, J\. Zhang, Z\. Dong,et al\.\(2023\)A survey of large language models\.arXiv preprint arXiv:2303\.18223 1\(2\),pp\. 1–124\.Cited by:[§1](https://arxiv.org/html/2605.21858#S1.p1.1)\.
- \[47\]D\. Zhou, J\. Huang, and B\. Schölkopf\(2006\)Learning with hypergraphs: clustering, classification, and embedding\.Adv\. Neural Inform\. Process\. Syst\.19\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p3.1),[§A\.2](https://arxiv.org/html/2605.21858#A1.SS2.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p2.1)\.
- \[48\]X\. Zhu, H\. Xue, Z\. Zhao, W\. Xu, J\. Huang, M\. Guo, Q\. Wang, K\. Zhou, I\. Razzak, and Y\. Zhang\(2025\)LLM as GNN: graph vocabulary learning for text\-attributed graph foundation models\.arXiv preprint arXiv:2503\.03313\.Cited by:[§A\.1](https://arxiv.org/html/2605.21858#A1.SS1.p2.1),[§D\.2](https://arxiv.org/html/2605.21858#A4.SS2.SSS0.Px4.p1.1),[§1](https://arxiv.org/html/2605.21858#S1.p1.1),[§2](https://arxiv.org/html/2605.21858#S2.p1.1),[Table 1](https://arxiv.org/html/2605.21858#S5.T1.3.1.16.1),[Table 2](https://arxiv.org/html/2605.21858#S5.T2.1.1.8.1)\.

## Appendix

### Appendix A Detailed Related Work

#### A\.1 LLMs for Graph Structural Data

Existing studies on how to enable LLMs to understand and process graph\-structured data can be broadly categorized into two paradigms Ren et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib146)\); Pan et al\. \([2023](https://arxiv.org/html/2605.21858#bib.bib148)\); Li et al\. \([2023](https://arxiv.org/html/2605.21858#bib.bib149)\)\. The earlier paradigm follows the graph\-as\-text \(or graph\-as\-code\) route, whose core idea is to rewrite graph structures into natural\-language descriptions or code\-style representations, and then enable LLMs to perform graph tasks in the textual space through instruction tuning or in\-context learning Ye et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib150)\); Chai et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib151)\); Cai et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib152)\); Li et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib153)\)\. InstructGLM directly describes multi\-scale graph structures in natural language, promoting early explorations in generative graph learning Ye et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib150)\)\. GraphLLM further points out that the conventional Graph2Text process itself may become a bottleneck, and attempts to enhance the graph reasoning capability of LLMs through end\-to\-end structural modeling Chai et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib151)\)\.

A more recent mainstream paradigm is the graph\-to\-token route Tang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib130)\); Chen et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib131)\); Wang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib132)\); Zhu et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib133)\); He et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib138)\); Kong et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib134)\)\. The core question in this direction is not how to write graphs as text, but how to transform graph structures into graph tokens that are compatible with the input space of LLMs\. GraphGPT injects graph structural knowledge into LLMs through graph instruction tuning Tang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib130)\)\. LLaGA maps node sequences in graphs into the LLM token space through structure\-aware templates and a lightweight projector Chen et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib131)\)\. TEA\-GLM emphasizes the explicit alignment between graph neural network representations and LLM token embeddings, aiming to improve zero\-shot generalization across datasets and tasks Wang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib132)\)\. PromptGFM further proposes a language\-based graph vocabulary, unifying graph structure understanding and reasoning Zhu et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib133)\)\.

Although the graph\-to\-token methods above have demonstrated that aligning structures to language models is a highly promising technical direction, their shared premise remains the ordinary graph: the basic structural units being modeled are nodes, binary edges, and their local neighborhoods\. For much real\-world relational data, this premise is insufficient Berge \([1984](https://arxiv.org/html/2605.21858#bib.bib155)\); Zhou et al\. \([2006](https://arxiv.org/html/2605.21858#bib.bib156)\); Gao et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib4)\); Battiston et al\. \([2020](https://arxiv.org/html/2605.21858#bib.bib154)\)\. The semantic core of phenomena such as co\-occurrence, group interactions, and multi\-agent events lies in the holistic fact that a set of objects is jointly associated by the same high\-order relation\. Existing studies have shown that graphizing a hypergraph flattens the group\-level identity information within hyperedges, and is therefore insufficient to preserve the native semantics of high\-order relations Chien et al\. \([2022](https://arxiv.org/html/2605.21858#bib.bib123)\); Feng et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib157)\)\. Therefore, although existing graph\-LLM methods provide an important foundation for structure\-language alignment, they are still insufficient to directly address high\-order association modeling in hypergraph scenarios\.

#### A\.2 Hypergraph Learning and Preliminary Explorations of Hypergraph\-LLMs

Unlike the pairwise connections modeled by ordinary graphs, a hyperedge in a hypergraph can simultaneously connect an arbitrary number of vertices, and is therefore better suited to capturing the high\-order associations that widely exist in real\-world data Berge \([1984](https://arxiv.org/html/2605.21858#bib.bib155)\); Zhou et al\. \([2006](https://arxiv.org/html/2605.21858#bib.bib156)\); Gao et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib4)\); Battiston et al\. \([2020](https://arxiv.org/html/2605.21858#bib.bib154)\)\. Centered on this structural property, hypergraph learning has become a key technique for studying associative relationships Feng et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib2)\); Yadati et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib120)\); Antelmi et al\. \([2023](https://arxiv.org/html/2605.21858#bib.bib5)\); Kim et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib7)\)\. Early work such as Hypergraph Neural Network \(HGNN\) directly takes hyperedges as the basic structural units for message passing, initiating the study of hypergraph neural networks Feng et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib2)\)\. Subsequently, Hyper\-SAGNN further leverages self\-attention mechanisms to handle hyperedges of variable sizes and supports high\-order relation prediction Zhang et al\. \([2020](https://arxiv.org/html/2605.21858#bib.bib158)\)\. AllSet unifies hypergraph neural networks from the perspective of multiset functions, emphasizing the core inductive bias that “a hyperedge is a set rather than a list of edges” Chien et al\. \([2022](https://arxiv.org/html/2605.21858#bib.bib123)\)\.

In recent years, researchers have further begun to explore the joint modeling of high\-order structures and textual semantics\. HyperBERT incorporates hypergraph\-aware layers into pretrained language models for vertex classification on text\-attributed hypergraphs Bazaga et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib126)\)\. Hyper\-FM investigates the scalability of hypergraph neural networks from the perspective of foundation models Gao et al\. \([2026](https://arxiv.org/html/2605.21858#bib.bib164)\)\. DHG\-Bench systematically compares different categories of hypergraph neural network methods across multiple tasks and datasets from the benchmark perspective Li et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib165)\)\. Although these works have started to study the modeling of text\-attributed hypergraphs, their core approach remains an HGNN encoder\. In most cases, textual attributes are merely converted by text encoders such as BERT into features that can be processed by HGNNs\.

Meanwhile, in the past two years, several preliminary efforts have also emerged to combine hypergraphs with large language models\. LLM4Hypergraph constructs a systematic benchmark for hypergraph understanding and reasoning, and designs multiple prompting strategies to analyze whether LLMs are capable of handling hypergraphs Feng et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib160)\)\. Beyond this, most existing studies still follow scenario\-specific integration\. Methods such as LLMHG and HeLLM mainly target recommendation systems Chu et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib161)\); Guo et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib162)\), Hyper\-RAG focuses on retrieval augmentation over complex relational knowledge Feng et al\. \([2026](https://arxiv.org/html/2605.21858#bib.bib163)\), and HyperLLM aims to extract high\-order association structures from text using large language models Gu et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib159)\)\. These studies indicate that hypergraphs can indeed bring high\-order structural benefits to large language models\. However, their focus is mainly on benchmarks, prompting, or specific applications, rather than on establishing a unified hypergraph\-language alignment framework\. More importantly, existing attempts largely follow what we term the “Hypergraph to Language” paradigm, where hypergraph structure is first translated into natural\-language descriptions and then fed to the LLM\. Although intuitive, this paradigm is inherently prone to structural information loss\. Natural\-language prompts are designed for readability rather than exact structural preservation, and therefore often compress, linearize, or truncate the original hypergraph\. As a result, crucial information such as native vertex\-hyperedge incidence, hyperedge identity, group membership as a whole, and order\-related structural cues may be weakened or lost during the conversion process\.
In contrast, our “Hypergraph as Language” view does not describe a hypergraph only through textual narration; instead, it directly organizes native hypergraph structure into structural tokens that can be consumed by the LLM\. Since the incidence representation itself is lossless with respect to the original vertex\-hyperedge relations, this formulation provides a principled path toward structure\-preserving hypergraph\-language alignment\.
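
The losslessness claim about the incidence representation can be checked on a toy example. The sketch below is illustrative only: the function names `to_incidence` and `from_incidence` and the five-vertex hypergraph are assumptions made for this example, not part of Hyper-Align. It builds a vertex-by-hyperedge 0/1 incidence matrix and recovers the original hyperedges from it exactly.

```python
def to_incidence(num_vertices, hyperedges):
    """Build a |V| x |E| 0/1 incidence matrix as nested lists."""
    H = [[0] * len(hyperedges) for _ in range(num_vertices)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def from_incidence(H):
    """Recover each hyperedge as the set of vertices with a 1 in its column."""
    num_edges = len(H[0]) if H else 0
    return [
        frozenset(v for v, row in enumerate(H) if row[e])
        for e in range(num_edges)
    ]


# Hypothetical toy hypergraph with 5 vertices and 3 hyperedges.
hyperedges = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({0, 3, 4})]
H = to_incidence(5, hyperedges)
recovered = from_incidence(H)
print(recovered == hyperedges)
```

The round trip returns the same vertex sets in the same order, which is the sense in which the incidence matrix keeps hyperedge identity and group membership intact.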

In summary, existing hypergraph learning methods mainly study how to achieve better association modeling on hypergraphs, while existing hypergraph\-LLM studies mainly investigate whether LLMs can understand hypergraphs or how to combine hypergraphs with LLMs in specific scenarios\. However, they have not yet answered a more fundamental question: Can we make hypergraphs directly understandable by LLMs as a language\-like structural input, so that an LLM can natively model high\-order associations and uniformly handle hypergraph tasks? The proposed Hyper\-Align is designed to fill this gap: we make the first attempt to construct a hypergraph\-native alignment framework for large language models\.

### Appendix B Details of HyperAlign\-Bench

HyperAlign\-Bench is designed to evaluate whether a model can understand and generalize over native high\-order associations in hypergraphs\. It contains five formal benchmark datasets: Arxiv\-HG, Cora\-CC, PubMed, DBLP, and IMDB\. Each dataset supports both vertex classification \(VC\) and hyperedge classification \(HEC\), enabling unified evaluation at both vertex and hyperedge levels\. Arxiv\-HG is used as the main in\-domain training and evaluation dataset, while the other four datasets are used for zero\-shot transfer evaluation\.

#### B\.1 Dataset Construction and Statistics

Table [6](https://arxiv.org/html/2605.21858#A2.T6) summarizes the basic statistics of HyperAlign\-Bench\. Arxiv\-HG is constructed from OGBN\-Arxiv as a co\-citation hypergraph, where each hyperedge captures a high\-order co\-citation relation after excluding the source object from the hyperedge construction\. It is the largest dataset in HyperAlign\-Bench, containing more than 169K vertices, 123K hyperedges, and 1\.1M vertex\-hyperedge incidences\. In addition to Arxiv\-HG, HyperAlign\-Bench includes four reorganized zero\-shot datasets derived from the datasets used in HyperBERT Bazaga et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib126)\), namely Cora\-CC, PubMed, DBLP, and IMDB\. These datasets cover different domains and exhibit different structural properties, providing a testbed for evaluating cross\-domain generalization over high\-order association structures\.

Table 6: Statistics of the five formal datasets in HyperAlign\-Bench\. All datasets support both vertex classification \(VC\) and hyperedge classification \(HEC\)\.

| Dataset | Domain / Source | \#Vertices | \#Hyperedges | \#Incidences | \#Classes | Tasks |
| --- | --- | --- | --- | --- | --- | --- |
| Arxiv\-HG | OGBN\-Arxiv co\-citation hypergraph | 169,343 | 123,826 | 1,116,231 | 40 | VC, HEC |
| Cora\-CC | Cora one\-hop successor hypergraph | 2,341 | 2,219 | 8,162 | 7 | VC, HEC |
| PubMed | PubMed one\-hop successor hypergraph | 19,716 | 13,011 | 81,502 | 3 | VC, HEC |
| DBLP | DBLP\-A one\-hop successor hypergraph | 2,591 | 2,463 | 4,199 | 6 | VC, HEC |
| IMDB | IMDB one\-hop successor hypergraph | 3,939 | 839 | 4,656 | 3 | VC, HEC |

#### B\.2 Task Splits

For VC, the prediction target is a center vertex; for HEC, the prediction target is a center hyperedge\. Table [7](https://arxiv.org/html/2605.21858#A2.T7) reports the train, validation, and test splits for both tasks\. In the main experiments, Hyper\-Align is trained on the training split of Arxiv\-HG and evaluated on its test split for in\-domain performance\. For Cora\-CC, PubMed, DBLP, and IMDB, we use their test splits for zero\-shot evaluation without any additional fine\-tuning\. Their train and validation splits are retained for completeness and future extensions\.

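Table 7: Task splits of HyperAlign\-Bench for vertex classification \(VC\) and hyperedge classification \(HEC\)\.

| Dataset | Task | Train | Valid | Test |
| --- | --- | --- | --- | --- |
| Arxiv\-HG | VC | 90,941 | 29,799 | 48,603 |
| Arxiv\-HG | HEC | 56,651 | 23,851 | 43,324 |
| Cora\-CC | VC | 1,404 | 468 | 469 |
| Cora\-CC | HEC | 1,327 | 451 | 441 |
| PubMed | VC | 11,829 | 3,943 | 3,944 |
| PubMed | HEC | 7,839 | 2,578 | 2,594 |
| DBLP | VC | 1,554 | 518 | 519 |
| DBLP | HEC | 1,485 | 489 | 489 |
| IMDB | VC | 2,363 | 787 | 789 |
| IMDB | HEC | 480 | 163 | 196 |

#### B\.3 Degree Distributions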

We further visualize the complementary cumulative distribution functions \(CCDFs\) of vertex degree and hyperedge degree in Fig\.[5](https://arxiv.org/html/2605.21858#A2.F5)\. Here, vertex degree denotes the number of incident hyperedges of a vertex, while hyperedge degree denotes the number of vertices contained in a hyperedge\. The distributions show that HyperAlign\-Bench covers diverse high\-order structural patterns across datasets\. Arxiv\-HG has the largest scale and exhibits a long\-tailed vertex\-degree distribution, reflecting the highly skewed connectivity patterns in citation\-derived hypergraphs\. PubMed and Cora\-CC also contain non\-trivial high\-order structures, while DBLP is relatively more concentrated\. IMDB contains fewer hyperedges, but its hyperedge\-degree distribution is highly long\-tailed, indicating the presence of very large hyperedges\. These differences make HyperAlign\-Bench suitable for evaluating whether a model can generalize across heterogeneous hypergraph structures rather than overfitting to a single degree regime\.
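
The two degree notions and the CCDF can be made concrete with a small sketch. Everything here is an illustrative assumption: the helper names `degrees` and `ccdf`, the toy hypergraph, and the convention that the CCDF at x is the fraction of values greater than or equal to x (the text above does not state which convention the figure uses).

```python
from collections import Counter


def degrees(hyperedges):
    """Return (vertex_degree, hyperedge_degree) for a list of vertex sets.

    Vertices that appear in no hyperedge are not listed.
    """
    vertex_degree = Counter()
    for members in hyperedges:
        for v in members:
            vertex_degree[v] += 1  # number of incident hyperedges
    hyperedge_degree = [len(members) for members in hyperedges]
    return dict(vertex_degree), hyperedge_degree


def ccdf(values):
    """Empirical CCDF mapping each distinct value x to P(X >= x)."""
    n = len(values)
    counts = Counter(values)
    result = {}
    remaining = n
    for x in sorted(counts):
        result[x] = remaining / n
        remaining -= counts[x]
    return result


# Hypothetical toy hypergraph: three hyperedges over five vertices.
toy = [{0, 1, 2}, {2, 3}, {0, 2, 3, 4}]
v_deg, e_deg = degrees(toy)
e_ccdf = ccdf(e_deg)
print(v_deg, e_deg, e_ccdf)
```

On a log-log plot, a slowly decaying CCDF of this kind is what the text describes as a long-tailed distribution.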

![Refer to caption](https://arxiv.org/html/2605.21858v1/x5.png)

Figure 5: CCDFs of vertex degree and hyperedge degree on the five formal datasets in HyperAlign\-Bench\. Vertex degree denotes the number of incident hyperedges of a vertex, while hyperedge degree denotes the number of vertices contained in a hyperedge\.

### Appendix C Details of High\-Order Consistency Auxiliary Supervision

In addition to the main language modeling loss, Hyper\-Align retains two types of auxiliary supervision that act only on HIP representations\.

The first is the order bucket reconstruction loss\. For hyperedge tokens or overview tokens with real hyperedge identity, their hidden states are required to recover the corresponding order bucket:

$$\mathcal{L}_{\mathrm{ord}}=\sum_{i}\mathrm{CE}\big(g_{\mathrm{ord}}(h_{i}^{(1)}),\,\beta(r_{i})\big). \tag{12}$$

The second is the relation reconstruction loss\. We sample a set of token pairs $\Omega=\{(i,j)\}$ from the HIDT detail segment, and predict the local structural relation between them: $r_{ij}\in\{\text{unrelated},\text{incidence},\text{co-member}\}$, where “incidence” indicates that a vertex token and a hyperedge token have a real incidence relation in the original hypergraph, and “co\-member” indicates that two vertex tokens come from the member branches of the same sampled hyperedge\. The corresponding loss is

$$\mathcal{L}_{\mathrm{rel}}=\sum_{(i,j)\in\Omega}\mathrm{CE}\big(g_{\mathrm{rel}}([h_{i}^{(1)}\,\|\,h_{j}^{(1)}]),\,r_{ij}\big). \tag{13}$$

Thus, the overall training objective is written as

$$\mathcal{L}=\mathcal{L}_{\mathrm{lm}}+\lambda_{\mathrm{ord}}\mathcal{L}_{\mathrm{ord}}+\lambda_{\mathrm{rel}}\mathcal{L}_{\mathrm{rel}}. \tag{14}$$
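
To make the combination of Eqs. (12) to (14) concrete, the following is a minimal sketch in plain Python. It assumes the heads for order buckets and relations have already produced raw logits. The function names, the class index mapping in `RELATIONS`, and the weights of 0.1 are placeholders chosen for illustration; the appendix does not state the values of the two weights.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# Assumed index mapping for the three relation classes of Eq. (13).
RELATIONS = {"unrelated": 0, "incidence": 1, "co-member": 2}


def total_loss(lm_loss, ord_logits, ord_targets, rel_logits, rel_targets,
               lambda_ord, lambda_rel):
    """Return (L, L_ord, L_rel) following Eqs. (12)-(14)."""
    l_ord = sum(cross_entropy(z, t) for z, t in zip(ord_logits, ord_targets))
    l_rel = sum(cross_entropy(z, t) for z, t in zip(rel_logits, rel_targets))
    return lm_loss + lambda_ord * l_ord + lambda_rel * l_rel, l_ord, l_rel


# Toy inputs: one token with 4 order buckets, two token pairs with 3 relations.
loss, l_ord, l_rel = total_loss(
    lm_loss=1.0,
    ord_logits=[[0.0, 0.0, 0.0, 0.0]],
    ord_targets=[2],
    rel_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    rel_targets=[RELATIONS["incidence"], RELATIONS["co-member"]],
    lambda_ord=0.1,  # placeholder weight, not from the paper
    lambda_rel=0.1,  # placeholder weight, not from the paper
)
print(loss, l_ord, l_rel)
```

With uniform logits each cross-entropy term equals the log of the number of classes, which gives an easy sanity check on the sums.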

### Appendix D Experimental Setup Details

#### D\.1 Implementation Details

We conduct a systematic evaluation of Hyper\-Align on HyperAlign\-Bench\. During training, we jointly optimize two tasks on Arxiv\-HG: vertex classification and hyperedge classification\. Both tasks share the same label space of 40 computer science categories from OGBN\-Arxiv\. In addition to in\-domain evaluation, we further conduct zero\-shot evaluation on four unseen hypergraph datasets, namely Cora\-CC, PubMed, DBLP, and IMDB, to examine the model’s ability to generalize high\-order relational modeling across domains\. All zero\-shot results are obtained using the same checkpoint trained on Arxiv\-HG, without any additional fine\-tuning\.

By default, Hyper\-Align adopts Qwen3\-8B as the frozen backbone language model\. The textual semantic features of vertices and hyperedges are pre\-encoded by Qwen3\-Embedding\-0\.6B\. Training is conducted on 4 NVIDIA A100 GPUs with DeepSpeed ZeRO\-2, bf16 precision, and gradient checkpointing\. The model is trained for 2 epochs with a per\-GPU batch size of 8 and a gradient accumulation step of 2, resulting in a global effective batch size of 64\. The learning rate is set to $2\times 10^{-3}$, with a cosine schedule, a warmup ratio of 0\.03, and a maximum sequence length of 4096\. During training, each GPU uses about 31 GB of memory, and the training takes approximately 14 hours\.
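
The batch-size arithmetic and the shape of the learning-rate schedule can be sketched as follows. This is an assumption-laden illustration rather than the authors' training code: the exact schedule form (linear warmup, then cosine decay to zero) is one common reading of "cosine schedule with a warmup ratio", and `total_steps = 1000` is a hypothetical value used only to exercise the function.

```python
import math


def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps):
    """Global number of samples contributing to one optimizer update."""
    return per_gpu_batch * num_gpus * grad_accum_steps


def lr_at(step, total_steps, peak_lr=2e-3, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed form)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


global_batch = effective_batch_size(per_gpu_batch=8, num_gpus=4, grad_accum_steps=2)
total_steps = 1000  # hypothetical; the actual step count is not given here
schedule = [lr_at(s, total_steps) for s in range(total_steps + 1)]
print(global_batch, max(schedule), schedule[-1])
```

The first function reproduces the stated global batch size of 64 from the per-GPU batch size, GPU count, and accumulation steps.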

Hypergraph inputs are constructed using the HIDT\-O template\. We retain at most 160 hypergraph tokens\. For each center object, we sample up to 8 incident hyperedges, and for each hyperedge, up to 8 member vertices\. The overview suffix uses 2 hops and 4 order buckets, yielding 8 overview tokens\. In the projector, the semantic core dimension is set to 384 and the structure sidecar dimension to 64\. During inference, we use deterministic decoding with a maximum generation length of 32\.
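
The sampling caps and the overview-token count above can be illustrated with a short sketch for the vertex-centered case. Uniform sampling without replacement, the function name `sample_context`, and the dictionary layout of the inputs are assumptions made for this example; the text only specifies the caps (8 hyperedges, 8 members) and the product of 2 hops and 4 order buckets.

```python
import random

MAX_HYPEREDGES = 8      # incident hyperedges kept per center object
MAX_MEMBERS = 8         # member vertices kept per sampled hyperedge
NUM_HOPS = 2
NUM_ORDER_BUCKETS = 4


def sample_context(center, incident, members, rng):
    """Sample a capped local context around a center vertex.

    incident: vertex id -> list of incident hyperedge ids
    members:  hyperedge id -> list of member vertex ids
    Returns a dict mapping each sampled hyperedge to its sampled members.
    """
    edges = incident[center]
    chosen = rng.sample(edges, min(MAX_HYPEREDGES, len(edges)))
    context = {}
    for e in chosen:
        vs = members[e]
        context[e] = rng.sample(vs, min(MAX_MEMBERS, len(vs)))
    return context


# One overview token per (hop, order bucket) combination.
num_overview_tokens = NUM_HOPS * NUM_ORDER_BUCKETS

# Hypothetical toy data: vertex 0 lies in 10 hyperedges of 12 members each.
rng = random.Random(0)
incident = {0: list(range(10))}
members = {e: [0] + list(range(100 + 20 * e, 111 + 20 * e)) for e in range(10)}
context = sample_context(0, incident, members, rng)
print(len(context), num_overview_tokens)
```

How the sampled items are laid out inside the 160-token budget depends on the HIDT-O template itself, which this sketch does not attempt to reproduce.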

#### D\.2 Baselines

We compare Hyper\-Align with four categories of baselines: hypergraph neural networks \(HGNNs\), PLM\-based methods, general LLMs, and graph\-LLMs\. These categories cover the main modeling paradigms relevant to our setting\. HGNNs provide task\-specific hypergraph encoders that directly model native hypergraph structure\. PLM\-based methods examine whether pre\-trained language representations can be effectively integrated with hypergraph modeling\. General LLMs evaluate whether strong instruction\-following models can solve the task in a zero\-shot manner using only natural\-language descriptions\. Graph\-LLMs test whether graph\-language alignment methods developed for ordinary graphs can transfer to hypergraph data after graph adaptation\. For each baseline, we follow its official implementation and adopt the hyperparameter settings reported in the original paper to ensure fair comparison\.

##### HGNNs\.

For HGNN baselines, we consider HGNN Feng et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib2)\), HyperGCN Yadati et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib120)\), HAN Wang et al\. \([2019](https://arxiv.org/html/2605.21858#bib.bib83)\), and AllSetTrans Chien et al\. \([2022](https://arxiv.org/html/2605.21858#bib.bib123)\)\. These methods are supervised hypergraph encoders that operate directly on hypergraph structure and therefore serve as strong task\-specific non\-LLM baselines\. To ensure a consistent semantic input space, we use the same text features encoded by Qwen3\-Embedding\-0\.6B Zhang et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib135)\) for all HGNNs\. Each vertex feature is obtained by concatenating the title and abstract of the corresponding original object and encoding the resulting text\. Since these HGNN methods do not support direct input of hyperedge features, their hyperedge features are generated through feature aggregation within the hypergraph message\-passing layers\. We train and evaluate each HGNN on the corresponding VC and HEC splits of HyperAlign\-Bench\.
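
As an illustration of deriving hyperedge features from vertex features, the sketch below averages the features of each hyperedge's member vertices. Mean pooling is an assumption chosen for simplicity: each baseline uses its own learned aggregation inside its message-passing layers, and the function name and toy features are hypothetical.

```python
def hyperedge_features(vertex_features, hyperedges):
    """Mean-pool member vertex features to get one vector per hyperedge.

    vertex_features: vertex id -> list of floats (all of equal length)
    hyperedges:      list of non-empty vertex-id sets
    """
    feats = []
    for members in hyperedges:
        ordered = sorted(members)
        dim = len(vertex_features[ordered[0]])
        mean = [
            sum(vertex_features[v][d] for v in ordered) / len(ordered)
            for d in range(dim)
        ]
        feats.append(mean)
    return feats


# Hypothetical 2-dimensional vertex features.
vertex_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
hyperedges = [{0, 1}, {0, 1, 2}]
edge_feats = hyperedge_features(vertex_features, hyperedges)
print(edge_feats)
```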

##### PLM\-based methods\.

For the PLM\-based baseline, we use HyperBERT Bazaga et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib126)\)\. Unlike pure text\-only methods, HyperBERT combines pre\-trained language representations with hypergraph\-aware structural modeling, and therefore serves as a representative baseline for integrating language semantics with hypergraph learning\. To keep the comparison focused on the modeling framework rather than raw semantic inputs, we use the same text feature construction as above and follow the official implementation under the same benchmark protocol\.

##### General LLMs\.

For general LLM baselines, we include Llama2\-7B Touvron et al\. \([2023](https://arxiv.org/html/2605.21858#bib.bib127)\), Llama3\-8B Grattafiori et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib128)\), Qwen3\-8B Yang et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib129)\), and GPT\-5\-mini OpenAI \([2025](https://arxiv.org/html/2605.21858#bib.bib136)\)\. These models do not directly accept structured graph features as input, so we evaluate them only in the zero\-shot setting using the task prompt and textual description alone\. This comparison tests whether a strong general\-purpose LLM can recover hypergraph semantics from natural\-language context without explicit structural alignment\.

##### Graph\-LLMs\.

For graph\-LLM baselines, we consider GraphGPT Tang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib130)\), LLaGA Chen et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib131)\), TEA\-GLM Wang et al\. \([2024](https://arxiv.org/html/2605.21858#bib.bib132)\), GraphPrompter Lv et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib137)\), PromptGFM Zhu et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib133)\), UniGraph He et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib138)\), and GOFA Kong et al\. \([2025](https://arxiv.org/html/2605.21858#bib.bib134)\)\. These methods are originally designed for text\-attributed ordinary graphs, and therefore cannot directly consume the native vertex\-hyperedge incidence structure in HyperAlign\-Bench\. To make them applicable, we convert each hypergraph task into the closest compatible ordinary\-graph task, while keeping the original labels and train/validation/test splits unchanged\.

For VC, we use the clique expansion\. The original hypergraph vertices are retained as graph nodes, and each hyperedge is replaced by pairwise edges among all vertices incident to it\. Thus, an edge in the converted graph indicates that two vertices co\-occur in at least one hyperedge\. For HEC, the prediction target is a hyperedge\. We therefore first construct the dual hypergraph, where each original hyperedge becomes a dual vertex, and each original vertex induces a dual hyperedge connecting all original hyperedges that contain this vertex\. We then apply the same clique expansion to this dual hypergraph\. Equivalently, in the resulting ordinary graph, each node corresponds to an original hyperedge, and two nodes are connected if the corresponding original hyperedges share at least one original vertex\.
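
The two conversions described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `clique_expansion` and `dual_hypergraph`, the representation of hyperedges as Python sets, and the toy hypergraph are choices made for this example and are not taken from the paper's code.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Undirected edges (u, v) with u < v for vertices sharing a hyperedge."""
    edges = set()
    for members in hyperedges:
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges


def dual_hypergraph(hyperedges):
    """Each original vertex induces a dual hyperedge over hyperedge ids.

    Dual vertices are the indices of the original hyperedges.
    """
    dual = {}
    for e_id, members in enumerate(hyperedges):
        for v in members:
            dual.setdefault(v, set()).add(e_id)
    return [dual[v] for v in sorted(dual)]


# Hypothetical toy hypergraph with three hyperedges.
hyperedges = [{0, 1, 2}, {2, 3}, {4, 5}]

# VC adaptation: vertices stay nodes, co-membership becomes an edge.
vc_graph = clique_expansion(hyperedges)

# HEC adaptation: nodes are hyperedge ids, linked when they share a vertex.
hec_graph = clique_expansion(dual_hypergraph(hyperedges))
print(sorted(vc_graph), sorted(hec_graph))
```

In the toy example, hyperedges 0 and 1 share vertex 2 and become adjacent in the HEC graph, while hyperedge 2 shares no vertex with the others and remains an isolated node.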

After the above adaptation process, the resulting text\-attributed graphs are then fed into each graph\-LLM following its official training and inference setting\. This protocol evaluates whether graph\-language alignment methods developed for pairwise graphs can transfer to hypergraph tasks after standard graph adaptation\.

### Appendix E Supplementary Experiments

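Table 8: Model performance under different numbers of training epochs\.

| Epoch | In\-domain VC | In\-domain HEC | Zero\-shot Avg VC | Zero\-shot Avg HEC |
| --- | --- | --- | --- | --- |
| 1 | 76\.6 | 78\.0 | 68\.3 | 61\.6 |
| 2 | 76\.9 | 78\.2 | 73\.5 | 65\.7 |
| 3 | 77\.0 | 78\.4 | 63\.1 | 58\.8 |

##### Effect of training epochs\.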

Table [8](https://arxiv.org/html/2605.21858#A5.T8) studies the effect of training epochs\. Increasing the training length from one epoch to two epochs improves zero\-shot transfer substantially, indicating that sufficient projector tuning is necessary for aligning high\-order structures with the frozen LLM\. A third epoch gives only marginal in\-domain improvement, but clearly hurts zero\-shot performance\. This suggests that excessive tuning may overfit the source Arxiv\-HG distribution and weaken cross\-domain generalization\. Therefore, we use two training epochs as the default setting\.

![Refer to caption](https://arxiv.org/html/2605.21858v1/x6.png) \(a\) HEC performance stratified by hyperedge degree\.

![Refer to caption](https://arxiv.org/html/2605.21858v1/x7.png) \(b\) VC performance stratified by the average degree of incident hyperedges\.

Figure 6: Dual structural stratification analysis on Arxiv\-HG\. Left: HEC performance under different hyperedge\-degree ranges\. Right: VC performance under different ranges of the average degree of hyperedges incident to the queried vertex\.

##### Extended analysis of hyperedge degree effects\.

We further analyze Hyper\-Align from two complementary structural views on Arxiv\-HG\. Fig\.[6](https://arxiv.org/html/2605.21858#A5.F6)\(a\) stratifies HEC test hyperedges by their hyperedge degree, while Fig\.[6](https://arxiv.org/html/2605.21858#A5.F6)\(b\) stratifies VC test vertices by the average degree of their incident hyperedges\. The former provides a hyperedge\-centered view, measuring how models behave when the target hyperedge itself contains different numbers of vertices\. The latter provides a vertex\-centered dual view, measuring how vertex classification changes when the queried vertex is surrounded by hyperedges with different average degrees\. Together, these two views characterize whether a model can benefit from increasingly rich high\-order associations\.

From the hyperedge\-centered view, Hyper\-Align consistently achieves the best performance across all degree ranges\. When the hyperedge degree is small, especially in the degree\-2 case, Hyper\-Align already outperforms graph\-based baselines, showing that our hypergraph formulation is naturally compatible with pairwise relations and can model such two\-way associations effectively\. As the hyperedge degree increases, the advantage of Hyper\-Align over Hyper\-Align\-clique becomes more evident\. This suggests that clique expansion can capture part of the relational signal, but cannot fully preserve the native grouping semantics of hyperedges\.

A consistent trend also appears from the vertex-centered view. When a vertex is mainly connected to low-degree hyperedges, the local structure is closer to an ordinary pairwise or low-order setting, where Hyper-Align remains competitive. As the average degree of incident hyperedges increases, Hyper-Align maintains the strongest performance and shows a larger advantage over the clique-based variant. This indicates that the benefit of native hypergraph modeling is not limited to classifying hyperedges themselves; it also improves vertex understanding when the surrounding context contains richer group-level associations.

Overall, the two stratified analyses lead to the same conclusion: Hyper\-Align is effective in both low\-order and high\-order regimes, while its advantage becomes more pronounced as local high\-order connectivity becomes richer\. These results further support our claim that preserving vertex\-hyperedge incidence structure is crucial for modeling high\-order associations, and that reducing hypergraphs to pairwise graphs loses important structural semantics from both the hyperedge\-centered and vertex\-centered perspectives\.

### Appendix F Pairwise-Indistinguishable Hypergraph Diagnostic

To further examine whether Hyper-Align preserves hypergraph-native high-order associations, we construct a controlled diagnostic task, termed *Pairwise-Indistinguishable Hypergraph Diagnostic*. This experiment is not intended to replace downstream evaluation on real datasets. Instead, it isolates a basic structural capability: when two hypergraphs induce exactly the same pairwise graph after clique expansion, can a model still distinguish them according to their original hyperedge grouping?

This diagnostic is directly motivated by the central distinction between ordinary graphs and hypergraphs\. In an ordinary graph, the basic structural unit is a pairwise edge\. In a hypergraph, however, the semantic focus is not merely whether two vertices are connected, but whether a set of vertices is associated as a whole through the same hyperedge\. Therefore, clique expansion may preserve pairwise adjacency while destroying the native grouping semantics of hyperedges\. The following diagnostic explicitly tests whether Hyper\-Align can exploit such grouping information that is lost under clique expansion\.

##### Construction\.

We first construct a pair of hypergraphs that induce the same pairwise graph after clique expansion but have different native hyperedge groupings\. Let the vertex set be

$$V=\{1,2,3,4,5,6\}. \tag{15}$$

We define two hypergraphs:

$$\mathcal{H}_{A}=\big\{\{1,2,3\},\{1,4,5\},\{2,4,6\},\{3,5,6\}\big\}, \tag{16}$$

and

$$\mathcal{H}_{B}=\big\{\{1,2,4\},\{1,3,5\},\{2,3,6\},\{4,5,6\}\big\}. \tag{17}$$

These two hypergraphs have the same basic statistics: both contain 6 vertices and 4 hyperedges, all hyperedges have degree 3, and every vertex has degree 2. More importantly, after applying clique expansion to each hyperedge, the two hypergraphs induce exactly the same pairwise graph, with the same pairwise edge multiplicities. The resulting pairwise edge set is

$$\begin{aligned}E_{\mathrm{clique}}=\{&(1,2),(1,3),(1,4),(1,5),(2,3),(2,4),(2,6),\\&(3,5),(3,6),(4,5),(4,6),(5,6)\}.\end{aligned} \tag{18}$$

Thus, any method that only relies on the clique-expanded pairwise graph cannot distinguish $\mathcal{H}_{A}$ from $\mathcal{H}_{B}$ using pairwise adjacency alone.

However, the two hypergraphs have different native hyperedge groupings. For example, $\{1,2,3\}$ is a real hyperedge in $\mathcal{H}_{A}$, but it is not a hyperedge in $\mathcal{H}_{B}$. In $\mathcal{H}_{B}$, the three pairwise relations $(1,2)$, $(1,3)$, and $(2,3)$ all exist after clique expansion, but the three vertices are not jointly connected by the same hyperedge. This construction therefore directly tests whether a model can distinguish “three pairwise relations exist” from “three vertices jointly belong to one hyperedge.”
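The equivalence claimed above is easy to check mechanically. The following is a minimal sketch, not the authors' code: the helper names `clique_expand` and `vertex_degrees` are introduced here for illustration only.

```python
from collections import Counter
from itertools import combinations

# Core construction from Eqs. (16) and (17).
H_A = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
H_B = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}, {4, 5, 6}]


def clique_expand(hyperedges):
    """Return the multiset of pairwise edges induced by clique expansion."""
    edges = Counter()
    for e in hyperedges:
        for u, v in combinations(sorted(e), 2):
            edges[(u, v)] += 1
    return edges


def vertex_degrees(hyperedges):
    """Count how many hyperedges each vertex belongs to."""
    return Counter(v for e in hyperedges for v in e)


G_A, G_B = clique_expand(H_A), clique_expand(H_B)
shared = set(map(frozenset, H_A)) & set(map(frozenset, H_B))

print(G_A == G_B)        # True: identical pairwise edge multisets
print(sorted(G_A))       # the 12 edges listed in Eq. (18)
print(len(shared))       # 0: the two hypergraphs share no hyperedge
```

In this sketch every pairwise edge has multiplicity 1 in both expansions, and the pairs $(1,6)$, $(2,5)$, and $(3,4)$ are absent from both.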

##### Task definition\.

Based on the matched hypergraph pairs above, we define a binary task named *Same-Hyperedge Membership*. Given a vertex-centered hypergraph, a center vertex $c$, and two candidate vertices $u,v$, the model is asked to predict whether the three vertices jointly belong to a single hyperedge:

$$y(\mathcal{H},c,u,v)=\mathbf{1}\left[\exists e\in E,\ \{c,u,v\}\subseteq e\right]. \tag{19}$$

The answer is Yes if such a hyperedge exists, and No otherwise.

For each query triple, we generate a matched pair of samples from $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$. Since the two hypergraphs have the same clique-expanded pairwise graph, a pairwise-only method receives indistinguishable structural inputs for the two samples. However, because their native hyperedge groupings differ, the gold labels of the two samples are opposite. For instance, for the query $(1,2,3)$, the label is Yes in $\mathcal{H}_{A}$, since $\{1,2,3\}$ is a hyperedge, but the label is No in $\mathcal{H}_{B}$, since the three vertices are only pairwise connected after clique expansion and do not jointly belong to one hyperedge.
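The labeling rule in Eq. (19) can be sketched in a few lines. This is an illustrative reading of the definition on the core construction only; the function names `same_hyperedge_label` and `answer` are hypothetical.

```python
from itertools import combinations

H_A = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
H_B = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}, {4, 5, 6}]


def same_hyperedge_label(hyperedges, c, u, v):
    """Eq. (19): True iff some hyperedge contains c, u, and v together."""
    triple = {c, u, v}
    return any(triple <= set(e) for e in hyperedges)


def answer(hyperedges, c, u, v):
    """Map the binary label to the Yes/No answer string."""
    return "Yes" if same_hyperedge_label(hyperedges, c, u, v) else "No"


# Triples over the core vertices whose labels differ between H_A and H_B.
opposite = [
    t for t in combinations(range(1, 7), 3)
    if same_hyperedge_label(H_A, *t) != same_hyperedge_label(H_B, *t)
]

print(answer(H_A, 1, 2, 3), answer(H_B, 1, 2, 3))  # Yes No
print(len(opposite))                                # 8
```

On the six core vertices, the triples with opposite labels are exactly the eight hyperedges of $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$; every other triple contains a pair that is not adjacent in either clique expansion.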

##### Input format\.

To avoid directly exposing the answer through textual hyperedge lists, we do not provide the natural-language Details section in this diagnostic. For each sample, we construct the query-centered hypergraph context using HIDT and map it into soft hypergraph tokens through HIP. These hypergraph tokens are inserted into the <hypergraph\> placeholder in the prompt:

> Given a vertex\-centered hypergraph:<hypergraph\>, where hyperedges represent native high\-order group memberships among vertices\. The hypergraph tokens mark one center vertex and two candidate vertices; no textual hyperedge list is provided\. Question: Do the center vertex and the two candidate vertices jointly occur in a single hyperedge? Directly answer Yes or No\.

Under this setting, the model cannot read a textual list of hyperedges\. Instead, it must rely on the inserted hypergraph tokens\. We use HIDT rather than the full HIDT\-O sequence because this diagnostic focuses on local same\-hyperedge membership, which should be directly captured by the fine\-grained incidence details in HIDT\.

##### Dataset variants\.

We construct two dataset variants. Clean D20 adds 20 distractor vertices and 20 distractor hyperedges to the core construction. The same distractors are added to both samples in each matched pair, so the pairwise equivalence after clique expansion is preserved. We filter out any distractor hyperedge that contains the complete query triple $\{c,u,v\}$, so that the gold label is not changed. Clean D20 contains 5000 training samples and 1000 test samples, corresponding to 2500 training matched pairs and 500 test matched pairs. This version mainly serves as a clean sanity check for the data construction and evaluation pipeline.

Adversarial D50 further increases the difficulty by adding 50 distractor vertices, 50 random distractor hyperedges, and 18 query-related decoy hyperedges. For a query $(c,u,v)$, the generator adds decoy hyperedges such as $(c,u,x)$, $(c,v,y)$, and $(u,v,z)$, where $x,y,z$ are distractor vertices. These decoy hyperedges strengthen the pairwise evidence among the queried vertices while explicitly avoiding any hyperedge that contains the full query triple $\{c,u,v\}$. Thus, a negative sample may still contain strong pairwise evidence for $(c,u)$, $(c,v)$, and $(u,v)$, forcing the model to distinguish pairwise connectivity from native joint membership. Adversarial D50 also contains 5000 training samples and 1000 test samples.
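One way to picture the decoy hyperedges is the sketch below. It is purely illustrative: the function `make_decoys`, the even split of the 18 decoys over the three pair patterns, the random choice of the third vertex, and the distractor ID range are all assumptions, since the text does not specify how the generator distributes decoys.

```python
import random


def make_decoys(c, u, v, distractors, n_decoys=18, seed=0):
    """Build decoy hyperedges of the forms (c,u,x), (c,v,y), and (u,v,z).

    Each decoy joins two query vertices with one distractor vertex, so it
    adds pairwise evidence but never contains the full triple {c,u,v}.
    Cycling evenly over the three pair patterns is an assumption.
    """
    rng = random.Random(seed)
    pairs = [(c, u), (c, v), (u, v)]
    decoys = []
    for i in range(n_decoys):
        a, b = pairs[i % 3]
        x = rng.choice(distractors)  # third member is always a distractor
        decoys.append(frozenset({a, b, x}))
    return decoys


query = (1, 2, 3)
distractors = list(range(7, 57))  # 50 hypothetical distractor vertex IDs
decoys = make_decoys(*query, distractors)

print(len(decoys))                              # 18
print(any(set(query) <= d for d in decoys))     # False: no decoy holds the full triple
```

Because the same decoys would be added to both samples of a matched pair, they leave the pairwise equivalence after clique expansion intact while raising the multiplicity of the queried pairs.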

##### Clique expansion baseline and leakage control\.

To explicitly evaluate the pairwise\-only setting, we construct a*clique expansion baseline*\. For each hyperedge in the original hypergraph, we replace it with all pairwise edges among its member vertices\. If the same pair of vertices co\-occurs in multiple hyperedges, we preserve the corresponding pairwise edge multiplicity\. After this conversion, all original hyperedges with degree larger than two are removed, and only the clique\-expanded pairwise graph remains\.

For every matched pair, we verify that the two converted samples have exactly the same pairwise edge multiset after clique expansion\. Therefore, this baseline has access to pairwise adjacency and pairwise edge multiplicity, but not to the original native hyperedge grouping\. We also apply several leakage\-control procedures: randomly permuting the core vertex IDs, using neutral vertex text without label signals, excluding textual hyperedge lists from the prompt, filtering distractor hyperedges that contain the full query triple, and splitting train/test data by canonical pair signatures to avoid isomorphic duplicates across splits\.
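The matched-pair check and the ID permutation can be sketched together. This is a simplified illustration, not the released pipeline: the helper names, the two example distractor hyperedges, and the use of a seeded shuffle are assumptions.

```python
import random
from collections import Counter
from itertools import combinations

H_A = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
H_B = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}, {4, 5, 6}]


def clique_multiset(hyperedges):
    """Pairwise edge multiset after clique expansion."""
    edges = Counter()
    for e in hyperedges:
        for pair in combinations(sorted(e), 2):
            edges[pair] += 1
    return edges


def relabel(hyperedges, mapping):
    """Apply a vertex-ID permutation to every hyperedge."""
    return [frozenset(mapping[v] for v in e) for e in hyperedges]


# Randomly permute the six core vertex IDs (same permutation for both sides).
rng = random.Random(0)
core_ids = list(range(1, 7))
shuffled = core_ids[:]
rng.shuffle(shuffled)
mapping = dict(zip(core_ids, shuffled))

# Hypothetical distractor hyperedges, shared by both samples of the pair.
shared_distractors = [frozenset({7, 8, 9}), frozenset({8, 10, 11})]
sample_A = relabel(H_A, mapping) + shared_distractors
sample_B = relabel(H_B, mapping) + shared_distractors

query = {mapping[1], mapping[2], mapping[3]}
label_A = any(query <= e for e in sample_A)
label_B = any(query <= e for e in sample_B)

print(clique_multiset(sample_A) == clique_multiset(sample_B))  # True
print(label_A, label_B)                                        # True False
```

The permutation changes which concrete IDs appear in the query, but the two samples still expand to the same edge multiset and still carry opposite labels.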

##### Metrics\.

We report three metrics over matched sample pairs. Suppose there are $N$ matched pairs, where the two samples in the $i$-th pair are denoted as $(s_{i}^{A},s_{i}^{B})$, with opposite labels $y_{i}^{A}\neq y_{i}^{B}$. Let $\hat{y}_{i}^{A}$ and $\hat{y}_{i}^{B}$ be the parsed predictions. Invalid outputs are treated as incorrect and are not counted as flips.

Sample Acc. is the standard accuracy over all individual samples:

$$\mathrm{Sample\ Acc.}=\frac{1}{2N}\sum_{i=1}^{N}\left(\mathbf{1}[\hat{y}_{i}^{A}=y_{i}^{A}]+\mathbf{1}[\hat{y}_{i}^{B}=y_{i}^{B}]\right). \tag{20}$$

Pair Acc. is a stricter pair-level accuracy, which requires both samples in a matched pair to be predicted correctly:

$$\mathrm{Pair\ Acc.}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[\hat{y}_{i}^{A}=y_{i}^{A}\wedge\hat{y}_{i}^{B}=y_{i}^{B}]. \tag{21}$$

Flip Rate measures whether the model gives opposite valid predictions for the two samples in a matched pair:

$$\mathrm{Flip\ Rate}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[\hat{y}_{i}^{A},\hat{y}_{i}^{B}\in\{\mathrm{Yes},\mathrm{No}\}\wedge\hat{y}_{i}^{A}\neq\hat{y}_{i}^{B}]. \tag{22}$$

Since the gold labels in each matched pair are opposite, a correctly predicted pair is necessarily a flip, so Pair Acc. never exceeds Flip Rate. Flip Rate alone only measures whether the model distinguishes the two sides of a pair; a high Pair Acc. additionally requires the direction of this distinction to be correct.
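The three metrics can be computed directly from parsed predictions. The sketch below is one straightforward reading of Eqs. (20) to (22); the function name `diagnostic_metrics` and the tuple layout are assumptions made for illustration.

```python
VALID = {"Yes", "No"}


def diagnostic_metrics(pairs):
    """Compute (sample_acc, pair_acc, flip_rate) as fractions in [0, 1].

    pairs: list of (gold_A, pred_A, gold_B, pred_B) with gold_A != gold_B.
    An invalid prediction never equals a gold label, so it counts as
    incorrect, and it is excluded from flips by the VALID check.
    """
    n = len(pairs)
    sample_hits = pair_hits = flips = 0
    for gold_a, pred_a, gold_b, pred_b in pairs:
        ok_a = pred_a == gold_a
        ok_b = pred_b == gold_b
        sample_hits += ok_a + ok_b
        pair_hits += ok_a and ok_b
        flips += pred_a in VALID and pred_b in VALID and pred_a != pred_b
    return sample_hits / (2 * n), pair_hits / n, flips / n


toy_pairs = [
    ("Yes", "Yes", "No", "No"),    # both correct, counted as a flip
    ("Yes", "Yes", "No", "Yes"),   # one correct, same answer twice, no flip
    ("Yes", "No", "No", "Yes"),    # both wrong, but still a flip
    ("Yes", "maybe", "No", "No"),  # invalid output on side A, no flip
]
print(diagnostic_metrics(toy_pairs))  # (0.5, 0.25, 0.5)
```

The third toy pair shows why Flip Rate can exceed Pair Acc.: the model separates the two samples but in the wrong direction.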

For any deterministic pairwise\-only method, the two samples in a matched pair are indistinguishable after clique expansion, while their gold labels are opposite\. Such a method must produce the same prediction for both samples in each pair, and is therefore bounded by

$$\mathrm{Sample\ Acc.}=50.00\%,\quad\mathrm{Pair\ Acc.}=0.00\%,\quad\mathrm{Flip\ Rate}=0.00\%. \tag{23}$$
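
The values in Eq. (23) follow in one step from the metric definitions. In the short derivation below, $f$ denotes an arbitrary deterministic pairwise-only predictor and $G_{i}$ the clique-expanded graph shared by both samples of the $i$-th pair; both symbols are introduced here for illustration, and outputs are assumed to be valid Yes/No answers.

$$\hat{y}_{i}^{A}=f(G_{i})=\hat{y}_{i}^{B},\quad y_{i}^{A}\neq y_{i}^{B}\ \Longrightarrow\ \mathbf{1}[\hat{y}_{i}^{A}=y_{i}^{A}]+\mathbf{1}[\hat{y}_{i}^{B}=y_{i}^{B}]=1\quad\text{for every } i.$$

$$\mathrm{Sample\ Acc.}=\frac{1}{2N}\sum_{i=1}^{N}1=\frac{1}{2},\qquad\mathrm{Pair\ Acc.}=\frac{1}{N}\sum_{i=1}^{N}0=0,\qquad\mathrm{Flip\ Rate}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[\hat{y}_{i}^{A}\neq\hat{y}_{i}^{B}]=0.$$

If such a predictor also emits invalid outputs, Sample Acc. can only fall below 50%, so the values in Eq. (23) are the best it can reach.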

Table 9: Pairwise-indistinguishable high-order diagnostic. Each matched pair induces the same pairwise graph after clique expansion but has different native hyperedge groupings. The task asks whether the center vertex and two candidate vertices jointly occur in a single hyperedge.

| Method | Sample Acc. (%) | Pair Acc. (%) | Flip Rate (%) |
|---|---|---|---|
| Pairwise-only deterministic bound | 50.00 | 0.00 | 0.00 |
| Clique expansion baseline | 50.00 | 0.00 | 0.00 |
| Hyper-Align, Clean D20 | 100.00 | 100.00 | 100.00 |
| Hyper-Align, Adv-D50 | 84.80 | 70.40 | 71.20 |

##### Results.

Table [9](https://arxiv.org/html/2605.21858#A6.T9) reports the diagnostic results. The pairwise-only deterministic bound denotes the theoretical limit of deterministic methods that rely only on the clique-expanded pairwise graph. The clique expansion baseline is the corresponding explicit implementation.

As shown in the table, the clique expansion baseline exactly matches the pairwise\-only bound, achieving 50\.00% Sample Acc\., 0\.00% Pair Acc\., and 0\.00% Flip Rate\. This confirms that once the original hyperedge grouping is removed and only the clique\-expanded pairwise graph is retained, the model cannot distinguish the two samples in each matched pair\.

In contrast, Hyper\-Align achieves 100\.00% on all metrics in the Clean D20 setting, verifying that the construction and evaluation pipeline are correct\. On the more challenging Adversarial D50 setting, Hyper\-Align still achieves 84\.80% Sample Acc\., 70\.40% Pair Acc\., and 71\.20% Flip Rate\. These results show that Hyper\-Align can make different predictions for two hypergraphs that are identical under clique expansion but differ in their native hyperedge grouping\. Therefore, the prediction cannot be explained by pairwise adjacency alone\.
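The three Adversarial D50 percentages can be cross-checked against each other. The sketch below converts them to counts, assuming the 1000 Adversarial D50 test samples form 500 matched pairs as in Clean D20; that pair count is an assumption, and the variable names are illustrative.

```python
n_pairs = 500  # assumed: 1000 Adversarial D50 test samples = 500 matched pairs
sample_acc, pair_acc, flip_rate = 0.8480, 0.7040, 0.7120

correct_samples = round(sample_acc * 2 * n_pairs)  # correctly answered samples
both_correct = round(pair_acc * n_pairs)           # pairs with both samples right
flipped = round(flip_rate * n_pairs)               # pairs with opposite valid answers

# A flipped pair is either fully correct or fully wrong, because gold labels
# are opposite within every matched pair.
wrong_direction_flips = flipped - both_correct
one_correct = correct_samples - 2 * both_correct
none_correct = n_pairs - both_correct - one_correct

print(correct_samples, both_correct, flipped)            # 848 352 356
print(one_correct, none_correct, wrong_direction_flips)  # 144 4 4
```

Under this assumption the reported numbers are mutually consistent: 352 pairs are fully correct, 144 pairs have exactly one correct sample, and 4 pairs are separated in the wrong direction.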

This diagnostic isolates the difference between pairwise adjacency and native hyperedge grouping. Since the two samples in each matched pair induce the same pairwise graph after clique expansion, no pairwise-only representation can tell them apart. The failure of the clique expansion baseline empirically confirms this limitation.

Meanwhile, Hyper\-Align substantially outperforms the pairwise\-only bound without receiving textual hyperedge lists\. This indicates that HIDT\-style hypergraph tokenization preserves information about which vertices jointly belong to the same hyperedge, rather than merely flattening the hypergraph into ordinary graph tokens\. Therefore, this controlled diagnostic provides complementary evidence for our main claim that preserving vertex\-hyperedge incidence structure is crucial for modeling high\-order associations\. This diagnostic should be interpreted together with the main HyperAlign\-Bench results and the hyperedge\-degree\-stratified analysis: the main experiments evaluate downstream performance, while this diagnostic isolates a structural capability that clique expansion cannot express\.

### Appendix G Discussion

The central idea of *Hypergraph as Language* is to treat a hypergraph as a language-like structural input whose native vertex-hyperedge incidence can be aligned with an LLM. Hyper-Align instantiates this perspective through HIDT-O, HIP, and the Hypergraph-as-Language protocol. By compiling query-centered high-order contexts into continuous hypergraph tokens, the framework preserves group-level association semantics while making the resulting structural representation directly consumable by a frozen LLM. This design combines the structural inductive bias of hypergraph learning with the semantic and instruction-following ability of LLMs, and supports both vertex-level and hyperedge-level tasks under a unified question-answering interface.

The current study focuses on text\-attributed hypergraphs and classification\-style evaluation, which cover common citation, co\-occurrence, and collaboration scenarios but do not exhaust all possible hypergraph reasoning tasks\. Future work can extend Hypergraph as Language to broader settings such as hyperedge retrieval, hypergraph question answering, and generative reasoning over high\-order structures\. In addition, Hyper\-Align uses a fixed token budget for each query\-centered context\. HIDT\-O mitigates this constraint by combining local incidence details with overview\-level summaries, while extremely large or dense hypergraphs may further benefit from adaptive sampling or dynamic token allocation\. These directions are orthogonal to the proposed hypergraph\-native alignment principle and can further improve the scalability and generality of Hyper\-Align\.
