Measuring Graph-to-Graph Semantic Similarity in Knowledge Graphs: An Empirical Evaluation of Knowledge Graph Embeddings

arXiv cs.AI Papers

Summary

This paper introduces and empirically evaluates methods for measuring semantic similarity between knowledge graphs using KG embeddings, proposing EmbPairSim and AvgEmbSim scoring functions that outperform baselines like Sentence-BERT on WikiText-2 and CC-News datasets.

arXiv:2606.29180v1 Announce Type: new Abstract: A Knowledge Graph (KG) represents facts as structured triples and is widely used to organize relational knowledge across diverse domains. Just as textual information ranges from words and sentences to complete documents, KG information can be interpreted at multiple levels, from entities, relations, and triples to subgraphs and entire KGs. However, existing KG embedding methods mainly focus on entities, relations, and triples, leaving graph-level semantics largely unaddressed. Conventional graph-level methods, which typically compare graphs based on structural patterns, are also insufficient because structural similarity alone cannot guarantee semantic similarity between KGs. To evaluate how well different methods capture such graph-level semantic information, we study graph-to-graph semantic similarity, which determines whether a pair of KGs represents semantically corresponding underlying information. To obtain reliable ground-truth correspondences, we construct a semantic matching dataset by modifying text documents, extracting KGs from both original and modified documents, and transferring their known correspondences to KG pairs. We compare text-based, structure-based, and KG embedding-based approaches on each dataset. For the KG embedding-based approach, we introduce two scoring functions: *EmbPairSim*, which uses maximal pairwise entity similarity, and *AvgEmbSim*, which uses a frequency-weighted centroid. Experiments on WikiText-2 and CC-News show that *EmbPairSim* achieves up to 5.3 pp higher MRR than Sentence-BERT while using substantially fewer parameters. These results suggest that KGE representations can serve as compact and effective signals for graph-to-graph semantic similarity in KGs. Our code is available at https://github.com/SeungRyeolBaek/KG-to-KG-Semantic-Similarity.

Cached at: 06/30/26, 05:32 AM

# Measuring Graph-to-Graph Semantic Similarity in Knowledge Graphs: An Empirical Evaluation of Knowledge Graph Embeddings
Source: [https://arxiv.org/html/2606.29180](https://arxiv.org/html/2606.29180)
SeungRyeol Baek and Wooseok Sim, Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea; Hogun Park, Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea

Knowledge Graph, Knowledge Graph Embedding, Semantic Similarity, Knowledge Graph Similarity

## 1. Introduction

A *Knowledge Graph* (KG) represents a fact as a triple $(h, r, t)$ that connects a *head entity* $h$ to a *tail entity* $t$ via a typed *relation* $r$, e.g., (Paris, located_in, France). Because they organize complex relational knowledge in a structured form, KGs have become a widely adopted knowledge representation across diverse domains. Examples include encyclopedic knowledge in Wikidata (Vrandečić and Krötzsch, [2014](https://arxiv.org/html/2606.29180#bib.bib27)), biomedical knowledge in BioKG (Zhang et al., [2025](https://arxiv.org/html/2606.29180#bib.bib26)), legal knowledge in LegalKG (Filtz, [2017](https://arxiv.org/html/2606.29180#bib.bib24)), scholarly knowledge in ORKG (Jaradeh et al., [2019](https://arxiv.org/html/2606.29180#bib.bib23)), event-centric knowledge in EventKG (Guan et al., [2023](https://arxiv.org/html/2606.29180#bib.bib22)), and visual relationships in scene graphs (Krishna et al., [2017](https://arxiv.org/html/2606.29180#bib.bib25)).

Information stored in KGs can be analyzed at multiple levels, ranging from entities and relations to triples, subgraphs, and entire KGs. Entities and relations denote individual units and the types of connections between them. Triples represent specific relational facts. Subgraphs, formed by multiple triples, can capture broader semantic units, such as events, research contributions, or scene regions. An entire KG can represent a complete information source, such as encyclopedic, biomedical, scholarly, visual, or document-derived knowledge. Appendix [A](https://arxiv.org/html/2606.29180#A1) illustrates this multi-level view across representative KG domains. This view is analogous to textual information, which can be studied at different levels ranging from individual words and sentences to complete documents.

Although semantic representations of KGs have been extensively studied, most existing approaches focus on local components such as entities, relations, and triples rather than subgraphs or entire KGs. Triple-centered language-model approaches such as KG-BERT (Yao et al., [2019](https://arxiv.org/html/2606.29180#bib.bib1)), KEPLER (Wang et al., [2021](https://arxiv.org/html/2606.29180#bib.bib3)), and RLKB (Fan et al., [2017](https://arxiv.org/html/2606.29180#bib.bib56)) verbalize individual triples and encode them with Transformers (Vaswani et al., [2017](https://arxiv.org/html/2606.29180#bib.bib2)), but they rely primarily on triple-level semantics rather than on the graph structure of an entire KG. Knowledge Graph Embedding (KGE) models, including transductive approaches (Bordes et al., [2013](https://arxiv.org/html/2606.29180#bib.bib37); Yang et al., [2015](https://arxiv.org/html/2606.29180#bib.bib38); Trouillon et al., [2016](https://arxiv.org/html/2606.29180#bib.bib39); Sun et al., [2019](https://arxiv.org/html/2606.29180#bib.bib40)) and inductive approaches (Lee et al., [2023](https://arxiv.org/html/2606.29180#bib.bib41)), incorporate both semantic and structural information. Nevertheless, these methods mainly model local KG components, such as entities, relations, and triples, rather than the semantic information represented by an entire KG.

Addressing KGs at the graph level may appear to align with conventional graph-level tasks. However, conventional graph-level tasks usually characterize graphs based on structural patterns, such as topology, connectivity, and substructures. This structural perspective is useful, but it is not sufficient for KGs, because KGs are constructed by organizing semantic information. In particular, structural similarity alone cannot guarantee that two KGs represent similar information.

![Figure 1. Overview of our semantic matching task.](https://arxiv.org/html/2606.29180v1/framework/framework.png)

Figure 1. Overview of our *semantic matching* task.

Together, these limitations indicate that graph-level semantic information in KGs remains insufficiently addressed. Therefore, we aim to evaluate how well different types of methods can capture the semantic information represented by an entire KG. A natural evaluation setting for this purpose is graph-to-graph semantic similarity, where semantically corresponding KGs should be identified among other candidate graphs. Specifically, we formulate the task as determining whether two KGs represent the same underlying information at the graph level, rather than whether they merely share similar structures. Figure [1](https://arxiv.org/html/2606.29180#S1.F1) exemplifies the task. Given a query KG $G_q$ (top), we rank candidate graphs $\{G_i\}$ so that the one encoding the *same real-world situation* is ranked first. An effective measure therefore requires semantic awareness to bridge lexical variation, structural sensitivity to account for graph-level organization, and scalability to handle document-sized KGs.

To evaluate this task, however, we need reliable ground-truth correspondences between KGs. Such correspondences are difficult to obtain from existing KGs: these are usually available only in already-processed graph form, which makes it difficult to trace their original sources or to decide whether two independently constructed KGs should be regarded as semantically similar. To obtain controllable semantic correspondences, we instead start from text documents. By applying meaning-preserving modifications, such as lexical substitution and paraphrasing, each original document can be paired with a modified document that expresses the same underlying information. After converting both the original and modified documents into KGs, the known document-level correspondence can be transferred to the resulting KG pair. This allows us to construct evaluation data in which each query KG has a clear ground-truth counterpart.

Based on this idea, we construct a *semantic matching* dataset: (i) documents from WikiText-2 and CC-News are parsed into KGs with an LLM pipeline; (ii) each document is paraphrased at six lexical/structural intensities; (iii) KGs are re-extracted from the paraphrases; and (iv) every query graph must retrieve its true counterpart among hundreds of distractors. We report Hits@5, MRR, and NDCG across the graph pairs.

Using each dataset, we empirically compare text-based, structure-based, and KG embedding-based approaches for graph-to-graph semantic similarity. For the KG embedding-based approach, we introduce two complementary scoring functions that lift off-the-shelf KGEs to the graph level: (1) *EmbPairSim* computes the maximal pairwise cosine similarity between entities (and optionally relations) in two graphs, preserving fine-grained correspondences; (2) *AvgEmbSim* forms a single frequency-weighted centroid vector per graph, enabling more efficient retrieval.

For comparison, we use Sentence-BERT (SBERT) (Reimers and Gurevych, [2019](https://arxiv.org/html/2606.29180#bib.bib48)) as the text-based approach and graph kernel methods (Sugiyama and Borgwardt, [2015](https://arxiv.org/html/2606.29180#bib.bib35); Shervashidze et al., [2011](https://arxiv.org/html/2606.29180#bib.bib36)) as the structure-based approach. For SBERT, KG triples are verbalized before being used as input. Across our datasets, *EmbPairSim* achieves up to 5.3 pp higher MRR than Sentence-BERT while using substantially fewer parameters. These results suggest that KGE representations can serve as compact and effective signals for graph-to-graph semantic similarity in KGs.

Our key contributions are as follows:

- Task & dataset. We release what is, to the best of our knowledge, the first benchmark specifically targeting document-derived KG-to-KG semantic retrieval under controlled paraphrasing conditions.
- Lightweight KGE aggregation. We propose *EmbPairSim* and *AvgEmbSim*, two scoring functions that require *no* retraining and run in sub-second time.
- Empirical findings. *EmbPairSim* achieves up to 5.3 pp higher MRR than Sentence-BERT while using an order of magnitude fewer parameters; an ablation study shows that frequency weighting generally helps, that mean centering is important for INGRAM-based similarity, and that relation embeddings can introduce noise.

## 2. Related Work

### 2.1. Language-based Representations

Word embeddings such as Word2Vec (Mikolov et al., [2013a](https://arxiv.org/html/2606.29180#bib.bib47)) place words that occur in similar contexts close together in a continuous vector space. Sentence-level models such as Sentence-BERT (SBERT) (Reimers and Gurevych, [2019](https://arxiv.org/html/2606.29180#bib.bib48)) extend this to entire sentences or documents by capturing contextual information across tokens. These embeddings support semantic similarity, retrieval, and clustering. In our work, we use SBERT to measure semantic similarity between verbalized KGs; this approach serves as a language-based baseline for assessing graph-level similarity.

### 2.2. Structure-based Approaches

Graph-based methods compare graph structures by identifying matching substructures or edit operations. Early approaches such as Graph Edit Distance (GED) (Abu-Aisheh et al., [2015](https://arxiv.org/html/2606.29180#bib.bib45)) and Ullmann's subgraph matching (Ullmann, [1976](https://arxiv.org/html/2606.29180#bib.bib46)) provide exact structural similarity but are computationally expensive and do not scale well. Graph kernels address this by mapping graphs to high-dimensional feature spaces based on statistical patterns, enabling efficient similarity computation. Examples include the Vertex and Edge Histogram Kernels (Sugiyama and Borgwardt, [2015](https://arxiv.org/html/2606.29180#bib.bib35)), which use label distributions, and the Weisfeiler-Lehman (WL) kernel (Shervashidze et al., [2011](https://arxiv.org/html/2606.29180#bib.bib36)), which iteratively relabels nodes to capture multi-level structural information. These kernels balance structural detail and scalability, making them practical for large graphs. In our work, we use graph kernels as structure-based baselines to evaluate how well surface-level structural features approximate semantic similarity between KGs.

### 2.3. Knowledge Graph Embedding (KGE) Methods

Knowledge Graph Embedding (KGE) methods map entities and relations into continuous vector spaces while preserving structural properties. They support tasks such as link prediction, entity classification, and graph completion. TransE (Bordes et al., [2013](https://arxiv.org/html/2606.29180#bib.bib37)) models relations as translations in embedding space, while DistMult (Yang et al., [2015](https://arxiv.org/html/2606.29180#bib.bib38)) uses a multiplicative scoring function based on element-wise interactions. ComplEx (Trouillon et al., [2016](https://arxiv.org/html/2606.29180#bib.bib39)) extends this by using complex-valued embeddings for richer relational patterns, and RotatE (Sun et al., [2019](https://arxiv.org/html/2606.29180#bib.bib40)) models relations as rotations in complex space. These models are transductive: every entity and relation must be seen during training. In contrast, INGRAM (Lee et al., [2023](https://arxiv.org/html/2606.29180#bib.bib41)) follows an inductive paradigm: a graph neural network derives embeddings for previously unseen entities and relations directly from their local graph structure. It relies exclusively on structural context and does not retain any entity or relation vectors produced during training. These KGE models are analogous to word embeddings but additionally encode graph-specific structure. We use them as structure-aware baselines to assess graph-level semantic similarity.

## 3. Preliminaries

### 3.1. Notations

A Knowledge Graph (KG) is defined as a tuple $G = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ is the set of entities (with size $|\mathcal{E}|$), $\mathcal{R}$ is the set of relations (with size $|\mathcal{R}|$), and $\mathcal{T} \subseteq \{(h, r, t) \mid h, t \in \mathcal{E},\ r \in \mathcal{R}\}$ is the set of triples, whose total count is $|\mathcal{T}|$. Each triple $(h, r, t)$ represents a directed relation from head entity $h$ to tail entity $t$ via relation $r$. Thus, a KG can be viewed as a directed labeled graph with entities as nodes and triples as edges.
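To make the notation concrete, the following minimal Python sketch stores a KG as its triple set $\mathcal{T}$ and derives $\mathcal{E}$ and $\mathcal{R}$ from it. The `KnowledgeGraph` class is a hypothetical illustration, not part of the released code, and it assumes every entity occurs in at least one triple (isolated entities would have to be stored separately).

```python
from dataclasses import dataclass, field

# A triple (h, r, t): head entity, relation, tail entity.
Triple = tuple[str, str, str]


@dataclass
class KnowledgeGraph:
    """Hypothetical container for G = (E, R, T); only T is stored."""

    triples: set[Triple] = field(default_factory=set)

    @property
    def entities(self) -> set[str]:
        # E: every head or tail that occurs in some triple (assumes no isolated entities)
        return {h for h, _, _ in self.triples} | {t for _, _, t in self.triples}

    @property
    def relations(self) -> set[str]:
        # R: every relation label used by some triple
        return {r for _, r, _ in self.triples}


g = KnowledgeGraph({("Paris", "located_in", "France"),
                    ("France", "member_of", "European Union")})
print(len(g.entities), len(g.relations), len(g.triples))  # 3 2 2
```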

### 3.2. Knowledge Graph as a Document

Textual data exhibits a hierarchical structure: words form sentences, sentences combine into paragraphs, and paragraphs compose a document. Although a knowledge graph arranges entities and relations in a non-linear network without a fixed reading order, we can view it through the same hierarchy: entities and relations act as words, each triple as a sentence, and subgraphs (or the entire graph) as paragraphs or even a full document. Viewing KGs this way allows us to apply document-level operations, such as similarity measurement, while capturing both structure and semantics.

### 3.3. Graph Kernel-based Approach

As a baseline for measuring similarity between KGs, we use graph kernel methods. Given two KGs, $G$ and $G'$, their similarity is computed as:

$$S_{\text{kernel}} = K(G, G'). \tag{1}$$

This approach relies purely on structural features. We use two kernels in our main experiments: the Vertex Histogram Kernel (Sugiyama and Borgwardt, [2015](https://arxiv.org/html/2606.29180#bib.bib35)) and the Weisfeiler-Lehman (WL) kernel (Shervashidze et al., [2011](https://arxiv.org/html/2606.29180#bib.bib36)). The Vertex Histogram Kernel compares distributions of discrete entity identifiers without using learned semantic embeddings. The WL kernel further incorporates local graph topology by iteratively updating node labels, providing richer structural similarity than simple histograms.
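As an illustration of Eq. (1), here is a small pure-Python sketch of both kernels as linear kernels (dot products) over label-count feature vectors, with each node labeled by its entity identifier. It is a simplification under stated assumptions, not the implementation used in the experiments: edge direction and relation labels are ignored, no kernel normalization is applied, and WL labels are kept as nested tuples rather than compressed to integers, which keeps them comparable across graphs without a shared relabeling table. All function names are hypothetical.

```python
from collections import Counter


def linear_kernel(phi_a: Counter, phi_b: Counter) -> int:
    # K(G, G') = <phi(G), phi(G')>, summed over the labels both graphs share
    return sum(phi_a[label] * phi_b[label] for label in phi_a.keys() & phi_b.keys())


def vertex_histogram(triples) -> Counter:
    # Each node's label is its entity identifier, so every entity contributes a count of 1
    nodes = {h for h, _, _ in triples} | {t for _, _, t in triples}
    return Counter(nodes)


def wl_features(triples, iterations: int = 2) -> Counter:
    # Undirected adjacency lists (assumption: direction and relation labels ignored)
    neighbours: dict[str, list[str]] = {}
    for h, _, t in triples:
        neighbours.setdefault(h, []).append(t)
        neighbours.setdefault(t, []).append(h)
    labels = {v: v for v in neighbours}  # iteration 0: entity identifiers
    features = Counter(labels.values())
    for _ in range(iterations):
        # New label = (old label, sorted multiset of neighbour labels)
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in neighbours[v])))
                  for v in neighbours}
        features.update(labels.values())
    return features


g1 = {("Paris", "located_in", "France"), ("France", "member_of", "EU")}
g2 = {("Paris", "capital_of", "France"), ("Berlin", "located_in", "Germany")}
print(linear_kernel(vertex_histogram(g1), vertex_histogram(g2)))  # 2 (Paris, France)
print(linear_kernel(wl_features(g1), wl_features(g2)))            # 3
```

The WL score exceeds the histogram score only where neighbourhoods also agree (here, Paris is adjacent to France in both graphs), which is the extra topological signal the WL kernel adds.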

## 4. Similarity Evaluation Setup

##### Problem formulation.

To objectively evaluate the graph-to-graph semantic similarity task, we formulate a semantic matching problem between two KG sets. Let $\mathcal{G} = \{G_1, \ldots, G_m\}$ be a set of candidate KGs and let $\mathcal{G}' = \{G'_1, \ldots, G'_n\}$ be a set of query KGs, where $m \geq n$. This formulation assumes that each $G'_i$ represents the same underlying information as $G_i$, while being distinct from every other candidate graph $G_j$ ($j \neq i$). Under this assumption, the goal is to identify, for each query KG $G'_i \in \mathcal{G}'$, the candidate KG $G_i \in \mathcal{G}$ that semantically matches it.

##### Construction of aligned KG pairs.

To obtain controllable semantic correspondences, we first construct semantically corresponding document pairs and then convert each document into a KG. Specifically, given a document set $\mathcal{D} = \{D_1, \ldots, D_m\}$, we first select a subset $\mathcal{D}_n = \{D_1, \ldots, D_n\} \subset \mathcal{D}$ ($m \geq n$). We then generate modified documents $\mathcal{D}' = \{D'_1, \ldots, D'_n\}$ using a document modification function that preserves the meaning of the original document. Each modified document $D'_i$ preserves the underlying information of its original document $D_i$ while remaining distinct from the other documents $D_j$ ($j \neq i$).

We then convert both the original and the modified documents into KGs using the same Large Language Model (LLM) (OpenAI, [2023](https://arxiv.org/html/2606.29180#bib.bib33)) pipeline: we extract a KG $G_i$ from each original document $D_i$ and a KG $G'_i$ from each modified document $D'_i$, forming $\mathcal{G} = \{G_1, \ldots, G_m\}$ and $\mathcal{G}' = \{G'_1, \ldots, G'_n\}$, respectively. Since $G_i$ and $G'_i$ are extracted from semantically corresponding documents, the pair $(G'_i, G_i)$ provides a ground-truth semantic match. As a result, we obtain KG sets with controllable semantic correspondences that are suitable for the semantic matching problem.

##### Evaluation protocol.

Given the constructed KG sets $\mathcal{G}$ and $\mathcal{G}'$, we compute pairwise similarity scores between each query KG $G'_i \in \mathcal{G}'$ and every candidate KG $G_j \in \mathcal{G}$. These scores form a similarity matrix whose $(i, j)$-th entry is the predicted semantic similarity between $G'_i$ and $G_j$. For each query KG $G'_i$, candidate KGs are ranked by their similarity scores, and the ranking is evaluated by checking how highly the ground-truth counterpart $G_i$ is ranked among all candidates. We report standard retrieval metrics, including Hits, NDCG, and MRR.
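A minimal NumPy sketch of this protocol follows, assuming queries and candidates are ordered so that the true counterpart of query $G'_i$ is candidate $G_i$ (the same index). `score` stands for any graph-level similarity function (EmbPairSim, AvgEmbSim, a kernel, or SBERT); the helper names are hypothetical, and ties are resolved optimistically because the section does not specify tie handling.

```python
import numpy as np


def similarity_matrix(queries, candidates, score) -> np.ndarray:
    # Entry (i, j) is the predicted similarity between query G'_i and candidate G_j
    return np.array([[score(q, c) for c in candidates] for q in queries], dtype=float)


def ground_truth_ranks(sim: np.ndarray) -> np.ndarray:
    """1-based rank of the true candidate (column i) for each query row i; needs n <= m."""
    n = sim.shape[0]
    true_scores = sim[np.arange(n), np.arange(n)]
    # Rank = 1 + number of candidates scored strictly higher than the true counterpart
    return 1 + (sim > true_scores[:, None]).sum(axis=1)


sim = np.array([[0.9, 0.2, 0.5],   # query 0: true candidate 0 is ranked first
                [0.7, 0.6, 0.1]])  # query 1: candidate 0 outscores true candidate 1
print(ground_truth_ranks(sim))     # [1 2]
```

The resulting ranks are all that Hits@k, MRR, and NDCG need when each query has a single correct candidate.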

## 5. Methods

### 5.1. Embedding-Based Pairwise Similarity

We propose *Embedding-based Pairwise Similarity (EmbPairSim)* to measure semantic similarity between two KGs, $G = (\mathcal{E}, \mathcal{R}, \mathcal{T})$ and $G' = (\mathcal{E}', \mathcal{R}', \mathcal{T}')$. Instead of exact label matching, which is often too rigid, we use embeddings (vector representations of entities and relations) to compare elements based on their semantic meaning.

#### 5.1.1. Embedding Generation

We generate embeddings for all entities and relations using an embedding function $\text{Embed}(\cdot)$. For entities, this yields:

$$\mathbf{e}_i = \text{Embed}(e_i), \quad \text{for } e_i \in \mathcal{E}, \tag{2}$$

$$\mathbf{e}'_j = \text{Embed}(e'_j), \quad \text{for } e'_j \in \mathcal{E}'. \tag{3}$$

Each $\mathbf{e}_i$ and $\mathbf{e}'_j$ encodes the semantic meaning of the corresponding entity in $G$ and $G'$, respectively. The embedding function $\text{Embed}(\cdot)$ can be instantiated either by a transductive KGE model, in which entity embeddings for both $G$ and $G'$ are learned jointly in a shared embedding space by training on the union of their triples, or by an inductive KGE model that derives embeddings from graph structure using pretrained parameters.

#### 5.1.2. Mean Centering

To align embeddings from both graphs, we apply *mean centering*: we compute their joint mean vector and subtract it from each entity and relation embedding before the similarity calculation.

$$\mu_E = \frac{1}{|\mathcal{E}| + |\mathcal{E}'|} \left( \sum_{i=1}^{|\mathcal{E}|} \mathbf{e}_i + \sum_{j=1}^{|\mathcal{E}'|} \mathbf{e}'_j \right), \qquad \mathbf{e}_i \leftarrow \mathbf{e}_i - \mu_E, \quad \mathbf{e}'_j \leftarrow \mathbf{e}'_j - \mu_E. \tag{4}$$

We use a joint mean vector from both graphs to preserve the relative positions of embeddings across the two KGs while spreading them around the origin, thereby increasing the contrast of the cosine similarity distributions during comparison.
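Eq. (4) amounts to subtracting one shared mean from both embedding sets. A minimal NumPy sketch, assuming embeddings are stored as columns as in Eq. (5) (the function name is hypothetical):

```python
import numpy as np


def joint_mean_center(E: np.ndarray, E_prime: np.ndarray):
    """E: (d, |E|), E_prime: (d, |E'|); columns are entity embeddings."""
    # mu_E: mean over all |E| + |E'| vectors from both graphs
    mu = np.hstack([E, E_prime]).mean(axis=1, keepdims=True)
    return E - mu, E_prime - mu


E = np.array([[1.0, 3.0],
              [0.0, 0.0]])           # entities (1, 0) and (3, 0)
E_prime = np.array([[2.0],
                    [3.0]])          # entity (2, 3)
Ec, Epc = joint_mean_center(E, E_prime)
print(Ec.T.tolist(), Epc.T.tolist())  # [[-1.0, -1.0], [1.0, -1.0]] [[0.0, 2.0]]
```

Because the same vector is subtracted from both graphs, all differences between embeddings (and hence the cross-graph geometry) are unchanged; only the origin moves, which is what changes the cosine values.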

#### 5.1.3. Stacking Embedding Vectors into Matrices

We stack these embedding vectors column-wise to obtain matrices:

$$\mathbf{E} = \left[\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{|\mathcal{E}|}\right], \qquad \mathbf{E}' = \left[\mathbf{e}'_1, \mathbf{e}'_2, \ldots, \mathbf{e}'_{|\mathcal{E}'|}\right], \tag{5}$$

where $\mathbf{E} \in \mathbb{R}^{d \times |\mathcal{E}|}$ and $\mathbf{E}' \in \mathbb{R}^{d \times |\mathcal{E}'|}$ contain the entity embeddings of $G$ and $G'$, respectively. These matrices are used in the subsequent pairwise similarity computation.

#### 5.1.4. Pairwise Similarity Calculation

We compute pairwise *cosine similarities* by first normalizing each column of the embedding matrices $\mathbf{E}$ and $\mathbf{E}'$ and then taking the dot product between the normalized matrices:

$$\mathbf{Sim}^{\mathbf{E}} = \hat{\mathbf{E}}^{\top} \hat{\mathbf{E}}', \quad \hat{\mathbf{E}} = \mathbf{E}\, \operatorname{diag}(\mathbf{E}^{\top}\mathbf{E})^{-\frac{1}{2}}, \quad \hat{\mathbf{E}}' = \mathbf{E}'\, \operatorname{diag}(\mathbf{E}'^{\top}\mathbf{E}')^{-\frac{1}{2}}, \tag{6}$$

where $\operatorname{diag}(\cdot)$ extracts the diagonal elements of a square matrix and forms a diagonal matrix from them. Each entry of $\mathbf{Sim}^{\mathbf{E}} \in \mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}'|}$ is the cosine similarity between an entity from $G$ and one from $G'$.
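In NumPy, the $\operatorname{diag}(\mathbf{E}^{\top}\mathbf{E})^{-1/2}$ factor in Eq. (6) is simply division of each column by its L2 norm. The sketch below assumes the column layout of Eq. (5); the function name is hypothetical, and the small `eps` floor is our addition (not in the paper) to avoid dividing by zero if a centered embedding happens to be the zero vector.

```python
import numpy as np


def cosine_matrix(E: np.ndarray, E_prime: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return Sim^E with shape (|E|, |E'|) for column-stacked embeddings E and E_prime."""
    # E_hat = E diag(E^T E)^{-1/2}: divide each column by its L2 norm
    E_hat = E / np.maximum(np.linalg.norm(E, axis=0, keepdims=True), eps)
    E_prime_hat = E_prime / np.maximum(np.linalg.norm(E_prime, axis=0, keepdims=True), eps)
    return E_hat.T @ E_prime_hat  # entry (i, j) = cos(e_i, e'_j)


E = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # columns (1, 0) and (0, 1)
E_prime = np.array([[2.0],
                    [2.0]])  # column (2, 2)
print(cosine_matrix(E, E_prime).ravel().round(4).tolist())  # [0.7071, 0.7071]
```

Computing the full matrix with one matrix product avoids an explicit double loop, but it still materializes $|\mathcal{E}| \times |\mathcal{E}'|$ entries, which is the quadratic cost that motivates AvgEmbSim.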

#### 5.1.5. Handling Size Differences

When comparing KGs of different sizes, we measure how well the entities in the smaller KG are covered by entities in the larger KG. If similarity were instead aggregated from the larger KG to the smaller one, multiple entities in the larger KG would inevitably be matched to the same entity in the smaller KG, so additional information present only in the larger KG would lower the similarity score. For each entity in the smaller KG, we therefore find its most similar entity in the larger KG by taking the maximum similarity value from $\mathbf{Sim}^{\mathbf{E}}$. Each row of $\mathbf{Sim}^{\mathbf{E}}$ corresponds to an entity in $G$ and each column to one in $G'$, so taking the maximum per row or per column yields, for every entity of one graph, its closest match in the other. Multiple entities can still share the same nearest matching entity.

$$S_V = \begin{cases} \left\{ \max_{i \in \mathcal{E}} \mathbf{Sim}^{\mathbf{E}}_{i,j} \;\middle|\; j \in \mathcal{E}' \right\}, & \text{if } |\mathcal{E}| > |\mathcal{E}'|, \\[4pt] \left\{ \max_{j \in \mathcal{E}'} \mathbf{Sim}^{\mathbf{E}}_{i,j} \;\middle|\; i \in \mathcal{E} \right\}, & \text{otherwise}. \end{cases} \tag{7}$$

Here, $\mathbf{Sim}^{\mathbf{E}}_{i,j}$ denotes the similarity between the $i$-th entity in $G$ and the $j$-th entity in $G'$. $S_V$ collects, for each entity in the smaller KG, the highest similarity score to its best match in the larger KG (one score per entity, so $|S_V| = \min(|\mathcal{E}|, |\mathcal{E}'|)$).
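The case split in Eq. (7) is a maximum over the rows or the columns of $\mathbf{Sim}^{\mathbf{E}}$, depending on which graph is smaller. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def best_match_scores(sim: np.ndarray) -> np.ndarray:
    """S_V from Eq. (7); sim has rows indexed by entities of G and columns by G'."""
    n_rows, n_cols = sim.shape
    if n_rows > n_cols:
        # G is larger: for each entity j of G', its best match among the entities of G
        return sim.max(axis=0)
    # otherwise: for each entity i of G, its best match among the entities of G'
    return sim.max(axis=1)


sim = np.array([[0.9, 0.1],
                [0.2, 0.3],
                [0.4, 0.8]])  # |E| = 3 > |E'| = 2
print(best_match_scores(sim).tolist())  # [0.9, 0.8]
```

Swapping the roles of the two graphs (transposing the matrix) yields the same scores whenever their sizes differ, so the result does not depend on which KG is passed first.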

#### 5.1.6. Computing the Final Similarity Score

Finally, the overall similarity between $G$ and $G'$ is defined as the fraction of scores $s$ in $S_V$ that exceed a threshold $t$:

$$S_{\text{pair}} = \frac{\left|\{ s \in S_V \mid s > t \}\right|}{|S_V|}. \tag{8}$$

A higher $S_{\text{pair}}$ indicates greater semantic similarity between the two KGs.
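Putting Eqs. (4) to (8) together, the following is a minimal end-to-end sketch of entity-only EmbPairSim in NumPy. It is not the released implementation: it omits the optional relation embeddings, the `eps` floor is our addition, and the default threshold `t=0.5` is a placeholder, since this section does not state the value of $t$ used in the experiments.

```python
import numpy as np


def emb_pair_sim(E: np.ndarray, E_prime: np.ndarray, t: float = 0.5, eps: float = 1e-12) -> float:
    """Entity-only EmbPairSim; E is (d, |E|), E_prime is (d, |E'|), columns are embeddings."""
    # Eq. (4): subtract the joint mean of both graphs
    mu = np.hstack([E, E_prime]).mean(axis=1, keepdims=True)
    E, E_prime = E - mu, E_prime - mu
    # Eq. (6): column-normalise, then all pairwise cosines, shape (|E|, |E'|)
    E_hat = E / np.maximum(np.linalg.norm(E, axis=0, keepdims=True), eps)
    Ep_hat = E_prime / np.maximum(np.linalg.norm(E_prime, axis=0, keepdims=True), eps)
    sim = E_hat.T @ Ep_hat
    # Eq. (7): best match for every entity of the smaller graph
    s_v = sim.max(axis=0) if sim.shape[0] > sim.shape[1] else sim.max(axis=1)
    # Eq. (8): fraction of best-match scores above the threshold t
    return float((s_v > t).mean())


E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # G: entities (1,0,0) and (0,1,0)
E_prime = np.array([[1.0, 0.0],
                    [0.0, 0.0],
                    [0.0, 1.0]])  # G': entities (1,0,0) and (0,0,1)
print(emb_pair_sim(E, E_prime))  # 0.5: one of G's two entities has a match above t
print(emb_pair_sim(E, E))        # 1.0
```

Because $S_{\text{pair}}$ counts matches rather than averaging raw cosines, a few very poor matches cannot drag the score arbitrarily low; the threshold $t$ controls how strict "covered" is.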

### 5.2. Averaged Embedding-Based Similarity

EmbPairSim preserves the full information in the KG embeddings, but it must compute and aggregate similarities for all entity pairs, which becomes costly as the graphs grow. Moreover, it does not produce a single representation of a KG. To obtain a single graph-level representation and avoid this quadratic overhead, we propose *Averaged Embedding-based Similarity (AvgEmbSim)*, which represents each graph by the frequency-weighted average of its entity embeddings. We explicitly evaluate whether this summary vector is an efficient and adequate proxy for the whole graph, and we compare weighted with plain averages to test whether entity frequency matters in KGs as term frequency does in text.

#### 5.2.1. Frequency-Weighted Averaging

To reflect the importance of each entity, we weight its embedding by its frequency, i.e., the number of triples in the KG that contain the entity, analogous to word frequency in NLP. The frequency-weighted average embeddings are:

$$\bar{\mathbf{e}} = \frac{\sum_{e \in \mathcal{E}} \text{freq}(e)\, \text{Embed}(e)}{\sum_{e \in \mathcal{E}} \text{freq}(e)}, \qquad \bar{\mathbf{e}}' = \frac{\sum_{e' \in \mathcal{E}'} \text{freq}(e')\, \text{Embed}(e')}{\sum_{e' \in \mathcal{E}'} \text{freq}(e')}, \tag{9}$$

where $\text{freq}(e)$ denotes the frequency of entity $e$.
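The sketch below computes $\text{freq}(e)$ as the number of triples containing $e$ and then the weighted centroid of Eq. (9). Two choices are our assumptions: a triple whose head and tail are the same entity counts once for that entity, and `embed` is a plain dict standing in for $\text{Embed}(\cdot)$. The function names and toy embeddings are hypothetical.

```python
from collections import Counter

import numpy as np


def entity_frequencies(triples) -> Counter:
    """freq(e): number of triples that contain e as head and/or tail."""
    freq = Counter()
    for h, _, t in triples:
        freq.update({h, t})  # a set, so a self-loop (h == t) counts once for that triple
    return freq


def weighted_centroid(triples, embed: dict) -> np.ndarray:
    """Frequency-weighted average of entity embeddings, Eq. (9)."""
    freq = entity_frequencies(triples)
    entities = sorted(freq)
    weights = np.array([freq[e] for e in entities], dtype=float)
    vectors = np.stack([embed[e] for e in entities])  # (|E|, d), one row per entity
    return weights @ vectors / weights.sum()


triples = {("Paris", "located_in", "France"), ("Lyon", "located_in", "France")}
embed = {"Paris": np.array([1.0, 0.0]), "Lyon": np.array([0.0, 1.0]),
         "France": np.array([1.0, 1.0])}  # hypothetical toy embeddings
print(entity_frequencies(triples)["France"])  # 2
print(weighted_centroid(triples, embed))      # [0.75 0.75]
```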

#### 5.2.2. Similarity Calculation

Finally, we compute the cosine similarity between the averaged entity embeddings of $G$ and $G'$:

$$S_{\text{avg}} = \frac{\bar{\mathbf{e}} \cdot \bar{\mathbf{e}}'}{\|\bar{\mathbf{e}}\|\, \|\bar{\mathbf{e}}'\|}. \tag{10}$$
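A minimal NumPy sketch of Eqs. (9) and (10), with embeddings as columns (as in Eq. (5)) and one frequency weight per column. Passing all-ones weights gives the plain (unweighted) average, which is the comparison described in Section 5.2; the function name and toy numbers are hypothetical.

```python
import numpy as np


def avg_emb_sim(E: np.ndarray, w: np.ndarray, E_prime: np.ndarray, w_prime: np.ndarray) -> float:
    """AvgEmbSim; E: (d, |E|) column embeddings with weights w: (|E|,), likewise for G'."""
    e_bar = E @ w / w.sum()                          # Eq. (9) for G
    e_bar_prime = E_prime @ w_prime / w_prime.sum()  # Eq. (9) for G'
    # Eq. (10): cosine similarity between the two centroids
    return float(e_bar @ e_bar_prime / (np.linalg.norm(e_bar) * np.linalg.norm(e_bar_prime)))


E = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # G: entities (1, 0) and (0, 1)
E_prime = np.array([[1.0],
                    [0.0]])  # G': single entity (1, 0)
weighted = avg_emb_sim(E, np.array([3.0, 1.0]), E_prime, np.array([1.0]))
uniform = avg_emb_sim(E, np.ones(2), E_prime, np.ones(1))
print(round(weighted, 4), round(uniform, 4))  # 0.9487 0.7071
```

Since each centroid depends on one graph only, it can be computed once per KG; each query-candidate comparison then costs a single $d$-dimensional cosine instead of $|\mathcal{E}| \times |\mathcal{E}'|$ of them.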

## 6. Experiments

### 6.1. Experimental Settings

#### 6.1.1. Datasets & KG Extraction

We use two datasets: WikiText-2 (Merity et al., [2017](https://arxiv.org/html/2606.29180#bib.bib43)) (645 Wikipedia documents) and CC-News (Mackenzie et al., [2020](https://arxiv.org/html/2606.29180#bib.bib9)) (550 news articles). We treat each dataset as a separate document collection and construct $\mathcal{D}$, $\mathcal{D}'$, $\mathcal{G}$, and $\mathcal{G}'$ independently from each dataset. KGs are extracted using the LLMGraphTransformer from LangChain (Chase and the LangChain contributors, [2025](https://arxiv.org/html/2606.29180#bib.bib32)) with GPT-3.5-turbo (OpenAI, [2023](https://arxiv.org/html/2606.29180#bib.bib33)), yielding the graph sets $\mathcal{G}$ and $\mathcal{G}'$. We use the default prompt and extraction pipeline provided by the LLMGraphTransformer without additional prompt engineering or task-specific modifications. All steps are performed independently for WikiText-2 and CC-News.

#### 6.1.2. Document Modification

We use three modification methods. (1) Synonym Replacement: words are randomly selected and replaced with synonyms from WordNet (Miller, [1995](https://arxiv.org/html/2606.29180#bib.bib20)). (2) Context Replacement: BERT (Devlin et al., [2019](https://arxiv.org/html/2606.29180#bib.bib44)) is used to select contextually important words, which are then replaced with synonyms from WordNet (Miller, [1995](https://arxiv.org/html/2606.29180#bib.bib20)). (3) DIPPER Paraphraser: diverse paraphrases are generated with the DIPPER model (Krishna et al., [2023](https://arxiv.org/html/2606.29180#bib.bib34)). We label the modified document sets by method: *Synonym*, *Context*, and *DIPPER*. For *Synonym* and *Context*, modification strengths of 30% and 60% indicate the fraction of tokens changed. For *DIPPER*, we test 60/0 and 60/20, where the format L/O denotes lexical diversity (L) and order diversity (O). These variations test the impact of different paraphrasing strategies on KG extraction and similarity. For each dataset, we randomly sample 200 documents and generate modified versions to form $\mathcal{D}'$.
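To make the modification strength concrete, here is a sketch of the Synonym Replacement setting in plain Python. It assumes whitespace tokenization and substitutes a toy synonym dictionary for WordNet (a real pipeline could query NLTK's WordNet interface instead); `synonym_replace` and its arguments are hypothetical. When fewer tokens have synonyms than the target fraction requires, the sketch changes only as many as it can.

```python
import random


def synonym_replace(text: str, synonyms: dict[str, list[str]], strength: float, seed: int = 0) -> str:
    """Replace about `strength` (e.g. 0.3 or 0.6) of all tokens with a synonym."""
    tokens = text.split()
    target = round(strength * len(tokens))  # number of tokens to change
    # Only tokens that have at least one synonym can be replaced
    candidates = [i for i, tok in enumerate(tokens) if synonyms.get(tok.lower())]
    rng = random.Random(seed)
    for i in rng.sample(candidates, min(target, len(candidates))):
        tokens[i] = rng.choice(synonyms[tokens[i].lower()])
    return " ".join(tokens)


toy_synonyms = {"city": ["town"], "large": ["big"], "old": ["ancient"]}  # stands in for WordNet
text = "the city is large and old"
print(synonym_replace(text, toy_synonyms, strength=0.6))  # the town is big and ancient
```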

#### 6.1.3. Evaluation Metrics

We assess similarity performance using Hits@5, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). Hits@5 measures how often the correct KG appears among the top 5 candidates, MRR averages the reciprocal rank of the correct match (so it rewards placing the match first), and NDCG evaluates ranking quality by discounting relevance logarithmically with rank position. Together, these metrics indicate how well the similarity scores reflect true semantic correspondence.
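A minimal sketch of the three metrics is given below. It assumes that each query KG has exactly one correct counterpart among the candidates (binary relevance), in which case the ideal DCG is 1 and NDCG reduces to 1/log2(rank + 1); the paper's evaluation code may use a different NDCG variant.

```python
import math


def ranking_metrics(ranks: list[int], k: int = 5) -> dict[str, float]:
    """Hits@k, MRR, and NDCG from the 1-based rank of the correct KG per query.

    Assumes one relevant candidate per query, so NDCG = 1 / log2(rank + 1).
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k) / n
    mrr = sum(1.0 / r for r in ranks) / n
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks) / n
    return {f"hits@{k}": hits, "mrr": mrr, "ndcg": ndcg}


# Three queries whose correct KG is ranked 1st, 2nd, and 10th.
print(ranking_metrics([1, 2, 10]))
```

For ranks 1, 2, and 10, Hits@5 is 2/3, MRR is (1 + 1/2 + 1/10)/3 ≈ 0.533, and NDCG ≈ 0.640, which shows how MRR penalizes a low-ranked match more sharply than NDCG.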

#### 6.1.4. Baselines

We compare text-based, graph kernel-based, and KG embedding-based baselines to assess how well each approach captures semantic similarity. For the embedding-based baselines, including Word2Vec, FastText, and the transductive KGE models, we mainly follow the standard or default hyperparameter settings of the original papers or official implementations. In particular, all transductive KGE models are evaluated under the same hyperparameter settings, including embedding dimension, margin, batch size, and learning rate.

- Text-based Similarity. Each KG is verbalized into OpenIE-style sentences (e.g., ⟨h, r, t⟩ → “h r t.”) with NLTK (Bird et al., [2009](https://arxiv.org/html/2606.29180#bib.bib10)), and graph similarity is computed from sentence embeddings produced by the pretrained Sentence-BERT (SBERT) checkpoint all-mpnet-base-v2 (Reimers and Gurevych, [2021](https://arxiv.org/html/2606.29180#bib.bib55)). In addition, for Table [5](https://arxiv.org/html/2606.29180#S6.T5) we adopt two word embedding models, Word2Vec (Mikolov et al., [2013a](https://arxiv.org/html/2606.29180#bib.bib47)) and FastText (Joulin et al., [2017](https://arxiv.org/html/2606.29180#bib.bib6)). In the non-pretrained setting, we train both models on the OpenIE sentences for 10 epochs with 200-dimensional vectors, a window size of 5, 5 negative samples, `min_count = 2`, and uniform initialization. In the pretrained setting, we use the 300-dimensional Google News Word2Vec embeddings (Mikolov et al., [2013b](https://arxiv.org/html/2606.29180#bib.bib7)) and the FastText embeddings trained on the Wikipedia 2017, UMBC WebBase, and News Crawl corpora (Joulin et al., [2017](https://arxiv.org/html/2606.29180#bib.bib6)).
- Graph Kernel-based Similarity. We use the Vertex Histogram (VH) and Weisfeiler-Lehman (WL) kernels (Sugiyama and Borgwardt, [2015](https://arxiv.org/html/2606.29180#bib.bib35); Shervashidze et al., [2011](https://arxiv.org/html/2606.29180#bib.bib36)) implemented in GraKeL (Siglidis et al., [2020](https://arxiv.org/html/2606.29180#bib.bib49)) to compare graphs based on their node and edge label distributions.
- Transductive KG Embedding-based Similarity. We use TransE (Bordes et al., [2013](https://arxiv.org/html/2606.29180#bib.bib37)), DistMult (Yang et al., [2015](https://arxiv.org/html/2606.29180#bib.bib38)), ComplEx (Trouillon et al., [2016](https://arxiv.org/html/2606.29180#bib.bib39)), and RotatE (Sun et al., [2019](https://arxiv.org/html/2606.29180#bib.bib40)), each trained on the union of all graphs being compared. For ComplEx and RotatE, we compute cosine similarity on the real-valued entity embedding vectors returned by the PyG implementation. We use an embedding dimension of 32, a margin of 2, a batch size of 128, and a learning rate of 0.01 with early stopping. Embeddings are initialized as in the original papers: uniform initialization for TransE and RotatE, and Xavier-uniform initialization for DistMult and ComplEx. The threshold for EmbPairSim is determined via grid search and set to $t = 0.8$.
- Inductive KG Embedding-based Similarity. We use INGRAM (Lee et al., [2023](https://arxiv.org/html/2606.29180#bib.bib41)) as an inductive KGE model, adopting the model pretrained on NELL-995 that is provided with the official code of Lee et al. ([2023](https://arxiv.org/html/2606.29180#bib.bib41)), with an embedding dimension of 32. The threshold for EmbPairSim is selected by grid search and set to $t = 0.95$. Unlike transductive models, INGRAM derives context solely from structural patterns and does not rely on embeddings of individual entities seen during training. Consequently, its performance indicates how much contextual information can be captured from structural patterns alone.
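To make the role of the threshold $t$ concrete, the sketch below ranks candidate KGs by a thresholded maximal pairwise cosine similarity between entity embeddings. It is a hypothetical simplification, not the paper's EmbPairSim: we assume the score is the fraction of query entities whose best match in the candidate KG reaches $t$, whereas the actual definition (Eqs. (6) to (8) in Section 5) may aggregate differently, and mean-centering is omitted. The function name `emb_pair_sim_sketch` and the toy vectors are illustrative stand-ins for trained RotatE or INGRAM embeddings.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def emb_pair_sim_sketch(src: list[Vector], tgt: list[Vector], t: float = 0.8) -> float:
    """Hypothetical EmbPairSim-style score between two KGs.

    For each source entity, take its maximal cosine similarity to any target
    entity and count it as matched when that maximum reaches the threshold t.
    """
    if not src or not tgt:
        return 0.0
    matched = sum(1 for u in src if max(cosine(u, v) for v in tgt) >= t)
    return matched / len(src)


# Toy 2-d "entity embeddings" for one query KG and two candidate KGs.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"G1": [[0.9, 0.1], [0.1, 0.9]], "G2": [[1.0, 0.1], [-1.0, 0.0]]}
ranking = sorted(candidates, key=lambda g: emb_pair_sim_sketch(query, candidates[g]), reverse=True)
print(ranking)  # ['G1', 'G2']
```

Raising `t` makes matching stricter: at `t = 0.995`, neither query entity in this toy example matches G1. Embeddings whose cosines cluster near 1 therefore need a higher threshold to separate candidates, which is consistent with the different thresholds chosen for RotatE and INGRAM.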

Table 1. Overall performance (Hits@5, MRR, and NDCG) on CC News and WikiText for text-, kernel-, and KGE-based methods under the Synonym, Context, and DIPPER paraphrasing strengths. RotatE is used as the transductive KGE model and INGRAM as the inductive KGE model. Emb size denotes the total parameter footprint used during similarity computation, calculated as *(number of stored embedding vectors) × (embedding dimension)* across all original KGs in the corresponding dataset; since VH Kernel and WL Kernel do not generate embedding vectors, their Emb size is marked "N/A". In the Seen column, ○ indicates that the method directly accesses or is trained on the evaluation KG triples, whereas × indicates that the evaluation KG triples are unseen during training.

CC News:

| Metric | Method | Seen | Emb size | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|---|---|
| Hits@5 | VH Kernel | ○ | N/A | 0.970 | 0.965 | 0.955 | 0.950 | 0.850 | 0.870 |
| | WL Kernel | ○ | N/A | 0.920 | 0.930 | 0.930 | 0.900 | 0.805 | 0.815 |
| | SBERT | × | 422,400 | 0.970 | 0.965 | 0.960 | 0.965 | 0.920 | 0.945 |
| | EmbPairSim (RotatE) | ○ | 188,928 | 0.975 | 0.980 | 0.970 | 0.970 | 0.875 | 0.915 |
| | EmbPairSim (INGRAM) | × | 188,928 | 0.975 | 0.980 | 0.965 | 0.970 | 0.875 | 0.915 |
| | AvgEmbSim (RotatE) | ○ | 17,600 | 0.800 | 0.710 | 0.800 | 0.695 | 0.510 | 0.495 |
| | AvgEmbSim (INGRAM) | × | 17,600 | 0.790 | 0.705 | 0.745 | 0.585 | 0.410 | 0.445 |
| MRR | VH Kernel | ○ | N/A | 0.942 | 0.940 | 0.926 | 0.890 | 0.786 | 0.804 |
| | WL Kernel | ○ | N/A | 0.881 | 0.876 | 0.885 | 0.821 | 0.717 | 0.729 |
| | SBERT | × | 422,400 | 0.925 | 0.911 | 0.920 | 0.924 | 0.840 | 0.853 |
| | EmbPairSim (RotatE) | ○ | 188,928 | 0.942 | 0.953 | 0.928 | 0.928 | 0.830 | 0.834 |
| | EmbPairSim (INGRAM) | × | 188,928 | 0.942 | 0.946 | 0.928 | 0.889 | 0.827 | 0.833 |
| | AvgEmbSim (RotatE) | ○ | 17,600 | 0.751 | 0.655 | 0.739 | 0.625 | 0.438 | 0.437 |
| | AvgEmbSim (INGRAM) | × | 17,600 | 0.731 | 0.650 | 0.703 | 0.526 | 0.362 | 0.383 |
| NDCG | VH Kernel | ○ | N/A | 0.953 | 0.952 | 0.940 | 0.912 | 0.824 | 0.835 |
| | WL Kernel | ○ | N/A | 0.905 | 0.903 | 0.908 | 0.858 | 0.771 | 0.777 |
| | SBERT | × | 422,400 | 0.943 | 0.931 | 0.938 | 0.942 | 0.876 | 0.887 |
| | EmbPairSim (RotatE) | ○ | 188,928 | 0.954 | 0.963 | 0.943 | 0.949 | 0.861 | 0.861 |
| | EmbPairSim (INGRAM) | × | 188,928 | 0.954 | 0.957 | 0.943 | 0.913 | 0.857 | 0.859 |
| | AvgEmbSim (RotatE) | ○ | 17,600 | 0.794 | 0.708 | 0.784 | 0.686 | 0.519 | 0.518 |
| | AvgEmbSim (INGRAM) | × | 17,600 | 0.782 | 0.715 | 0.759 | 0.611 | 0.469 | 0.483 |

WikiText:

| Metric | Method | Seen | Emb size | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|---|---|
| Hits@5 | VH Kernel | ○ | N/A | 0.990 | 0.960 | 0.975 | 0.955 | 0.935 | 0.965 |
| | WL Kernel | ○ | N/A | 0.980 | 0.965 | 0.960 | 0.940 | 0.905 | 0.945 |
| | SBERT | × | 495,360 | 0.955 | 0.955 | 0.985 | 0.965 | 0.940 | 0.940 |
| | EmbPairSim (RotatE) | ○ | 779,008 | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| | EmbPairSim (INGRAM) | × | 779,008 | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| | AvgEmbSim (RotatE) | ○ | 20,640 | 0.750 | 0.400 | 0.750 | 0.585 | 0.750 | 0.357 |
| | AvgEmbSim (INGRAM) | × | 20,640 | 0.675 | 0.565 | 0.755 | 0.670 | 0.400 | 0.365 |
| MRR | VH Kernel | ○ | N/A | 0.970 | 0.904 | 0.947 | 0.913 | 0.858 | 0.867 |
| | WL Kernel | ○ | N/A | 0.960 | 0.914 | 0.931 | 0.902 | 0.812 | 0.869 |
| | SBERT | × | 495,360 | 0.935 | 0.909 | 0.963 | 0.924 | 0.909 | 0.901 |
| | EmbPairSim (RotatE) | ○ | 779,008 | 0.988 | 0.952 | 0.975 | 0.945 | 0.887 | 0.897 |
| | EmbPairSim (INGRAM) | × | 779,008 | 0.988 | 0.945 | 0.973 | 0.945 | 0.879 | 0.889 |
| | AvgEmbSim (RotatE) | ○ | 20,640 | 0.698 | 0.314 | 0.748 | 0.543 | 0.492 | 0.314 |
| | AvgEmbSim (INGRAM) | × | 20,640 | 0.620 | 0.461 | 0.667 | 0.601 | 0.345 | 0.300 |
| NDCG | VH Kernel | ○ | N/A | 0.978 | 0.923 | 0.960 | 0.930 | 0.886 | 0.896 |
| | WL Kernel | ○ | N/A | 0.969 | 0.930 | 0.946 | 0.921 | 0.850 | 0.895 |
| | SBERT | × | 495,360 | 0.949 | 0.928 | 0.971 | 0.941 | 0.928 | 0.923 |
| | EmbPairSim (RotatE) | ○ | 779,008 | 0.991 | 0.960 | 0.981 | 0.954 | 0.907 | 0.918 |
| | EmbPairSim (INGRAM) | × | 779,008 | 0.991 | 0.954 | 0.980 | 0.954 | 0.900 | 0.912 |
| | AvgEmbSim (RotatE) | ○ | 20,640 | 0.748 | 0.410 | 0.835 | 0.606 | 0.566 | 0.398 |
| | AvgEmbSim (INGRAM) | × | 20,640 | 0.689 | 0.559 | 0.729 | 0.672 | 0.451 | 0.411 |

### 6.2. Results

#### 6.2.1. Overall Performance Comparisons

The comparative results for the models outlined in Section [6.1.4](https://arxiv.org/html/2606.29180#S6.SS1.SSS4) are presented in Table [1](https://arxiv.org/html/2606.29180#S6.T1). As detailed in Section [6.2.2](https://arxiv.org/html/2606.29180#S6.SS2.SSS2), RotatE achieves the best performance among all transductive embedding models in both EmbPairSim and AvgEmbSim. The Emb size column shows that low-dimensional KG embeddings can be far more compact than language-model embeddings: AvgEmbSim stores 17,600 (CC News) and 20,640 (WikiText) values, versus 422,400 and 495,360 for SBERT. EmbPairSim, however, stores one vector per entity, so its pairwise entity matching scales poorly on entity-dense sets: its footprint is smaller than SBERT's on CC News (188,928) but larger on WikiText (779,008). AvgEmbSim, which uses a single embedding per graph, is therefore much more efficient than both SBERT and EmbPairSim.
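The Emb size entries in Table 1 follow from simple arithmetic, as the sketch below shows. It assumes that SBERT stores one 768-dimensional vector (the output size of all-mpnet-base-v2) per original KG and AvgEmbSim one 32-dimensional centroid per KG, with one original KG per document (550 for CC News, 645 for WikiText). The EmbPairSim entity counts (5,904 and 24,344) are not stated in the text; they are back-computed by dividing the table entries by 32.

```python
SBERT_DIM = 768  # embedding size of all-mpnet-base-v2
KGE_DIM = 32     # RotatE / INGRAM embedding dimension used in the experiments

NUM_KGS = {"CC News": 550, "WikiText": 645}                  # assumed: one KG per document
NUM_ENTITY_VECTORS = {"CC News": 5_904, "WikiText": 24_344}  # back-computed from Table 1


def emb_size(num_vectors: int, dim: int) -> int:
    """Footprint = (number of stored embedding vectors) x (embedding dimension)."""
    return num_vectors * dim


footprints = {
    name: {
        "SBERT": emb_size(NUM_KGS[name], SBERT_DIM),
        "EmbPairSim": emb_size(NUM_ENTITY_VECTORS[name], KGE_DIM),
        "AvgEmbSim": emb_size(NUM_KGS[name], KGE_DIM),
    }
    for name in NUM_KGS
}
for name, sizes in footprints.items():
    print(name, sizes)
```

Because EmbPairSim's footprint scales with the number of entities while SBERT's and AvgEmbSim's scale with the number of KGs, the entity-dense WikiText is where EmbPairSim overtakes SBERT.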

##### The Role of Semantic Information in Enhancing Similarity Detection

EmbPairSim (RotatE) matches or outperforms both graph kernels in every setting of Table [1](https://arxiv.org/html/2606.29180#S6.T1), highlighting the advantage of semantic embeddings over purely statistical structural methods. This suggests that incorporating semantic information consistently improves graph similarity detection.

##### Comparative Performance of EmbPairSim and SBERT

EmbPairSim achieves the highest overall performance: with RotatE, it surpasses SBERT on every metric in all Synonym and Context settings on both datasets. This suggests that KG embeddings can capture semantic meaning more effectively than text-based embeddings when modifications are lexical and local. Under the DIPPER paraphrasing setting, however, SBERT achieves higher MRR and NDCG on both datasets, indicating a limitation of EmbPairSim when handling substantial lexical and syntactic changes.

##### Effectiveness of Structural Information in Inductive Models

EmbPairSim and AvgEmbSim perform broadly similarly with RotatE and INGRAM back-ends. Despite INGRAM's fully inductive setting, its performance is comparable to that of the transductive model, suggesting that structural information alone captures much of the semantic meaning in KGs. However, although INGRAM achieves almost the same EmbPairSim performance as RotatE, it requires a substantially higher similarity threshold (0.95 versus 0.80 for RotatE), suggesting that INGRAM produces entity embeddings with generally higher cosine similarities and less dispersion in the embedding space than RotatE.

##### Performance Gap between Vertex Histogram and Weisfeiler\-Lehman Kernels

The Vertex Histogram (VH) kernel slightly outperforms the Weisfeiler-Lehman (WL) kernel. This appears to stem from WL's relabeling stage, which merges entity and relation labels and thereby blurs their distinction. Supporting this, we observed that WL yields identical results whether it uses the Vertex or the Edge Histogram as its base kernel. Similarly, EmbPairSim, which is based solely on entity embeddings, outperforms variants that mix entity and relation information (Table [4](https://arxiv.org/html/2606.29180#S6.T4)), highlighting the importance of preserving entity-level signals in semantic similarity tasks.
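The effect of WL relabeling can be illustrated with the toy pure-Python sketch below; it is not GraKeL's implementation, which compresses labels and applies its own normalization. Here, VH-style features count entity labels, while one WL-style iteration rewrites each entity label together with its (relation, neighbor) context, so relation labels become part of every entity label. The helper names are hypothetical. Two graphs that share all entities but differ in a single relation then look identical to VH and fully disjoint to WL.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]  # (head, relation, tail)


def histogram_kernel(h1: Counter, h2: Counter) -> float:
    """Normalized dot product of two label histograms (cosine of count vectors)."""
    dot = sum(c * h2[label] for label, c in h1.items())
    n1 = math.sqrt(sum(c * c for c in h1.values()))
    n2 = math.sqrt(sum(c * c for c in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def vertex_histogram(triples: list[Triple]) -> Counter:
    """VH-style features: one count per distinct entity (node) label."""
    return Counter({h for h, _, _ in triples} | {t for _, _, t in triples})


def wl_relabel(triples: list[Triple]) -> Counter:
    """One WL-style iteration: each entity label is extended with its sorted
    (relation, neighbor) context, mixing entity and relation labels."""
    context: dict[str, list[str]] = {}
    for h, r, t in triples:
        context.setdefault(h, []).append(f"{r}->{t}")
        context.setdefault(t, []).append(f"{r}<-{h}")
    return Counter(f"{e}|{','.join(sorted(ctx))}" for e, ctx in context.items())


g1 = [("Paris", "capital_of", "France")]
g2 = [("Paris", "located_in", "France")]
print(histogram_kernel(vertex_histogram(g1), vertex_histogram(g2)))  # about 1.0
print(histogram_kernel(wl_relabel(g1), wl_relabel(g2)))              # 0.0
```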

##### Limitations under Structural Paraphrasing

While EmbPairSim generally outperforms the text-based baseline, both KGE-based scoring functions show clear limitations under substantial lexical and structural variation. The most notable performance drop is observed with the DIPPER Paraphraser, which introduces both large-scale vocabulary changes and global sentence reordering. Unlike the localized word substitutions of the Synonym and Context settings, these changes disrupt surface patterns while preserving the underlying semantics. We attribute this degradation to the fact that both EmbPairSim and AvgEmbSim represent KGs through local statistics (matched entities or a frequency-weighted centroid) without capturing the graph's relational structure or topology at a higher level. This reliance on simple aggregation limits their ability to detect semantic alignment when meaning is preserved but surface forms are rearranged, leaving them vulnerable to paraphrastic variations that obscure local signals while maintaining global intent. Modeling higher-order relational structure and graph-level semantics may therefore provide a more robust way to handle such transformations.

![Figure 2: NDCG of EmbPairSim and AvgEmbSim with different KGE models](https://arxiv.org/html/2606.29180v1/analysis/kge_model/figure.png)

Figure 2. NDCG scores of EmbPairSim and AvgEmbSim on WikiText using different transductive KGE models across all modification settings. For each setting, the highest NDCG score is marked with a black outline and a star.

#### 6.2.2. Selecting KGE Models for EmbPairSim and AvgEmbSim

We evaluate four transductive KG embedding models (TransE, DistMult, ComplEx, and RotatE) as back-ends for EmbPairSim and AvgEmbSim, and summarize their NDCG performance across paraphrasing settings in Figure [2](https://arxiv.org/html/2606.29180#S6.F2). RotatE consistently delivers the highest scores, plausibly because representing relations as rotations in complex space preserves angular relationships, which aligns with the cosine similarity at the core of both scoring functions. TransE performs worst; its purely translational assumption appears to lack the fine-grained relational separation required for reliable similarity estimation. DistMult and ComplEx behave similarly under EmbPairSim, likely because of their shared bilinear form, yet ComplEx slightly outperforms DistMult in AvgEmbSim, possibly reflecting its capacity to model asymmetric relations. Overall, models with multiplicative or angular inductive biases (RotatE, ComplEx) produce more effective embeddings for both pairwise and averaged similarity calculations.
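For reference, the sketch below writes out the TransE and RotatE distance functions as defined in their original papers, using Python complex numbers for RotatE. It is a toy illustration rather than the PyG implementation used in the experiments (norm choice and margin handling may differ there), and it shows only the property the paragraph appeals to: a RotatE relation rotates each coordinate by a phase and preserves its modulus, whereas TransE adds a translation vector.

```python
import cmath
import math


def transe_distance(h: list[float], r: list[float], t: list[float]) -> float:
    """TransE: the relation is a translation; distance ||h + r - t||_2."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rotate_distance(h: list[complex], phases: list[float], t: list[complex]) -> float:
    """RotatE: the relation is an element-wise rotation r_i = exp(i * theta_i),
    so |r_i| = 1 and each head coordinate keeps its modulus;
    distance ||h * r - t||_2 over complex coordinates."""
    r = [cmath.exp(1j * theta) for theta in phases]
    return math.sqrt(sum(abs(hi * ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


h = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]  # rotate coordinate 1 by 90 degrees, coordinate 2 by 180
t = [0 + 1j, 0 - 1j]             # exactly the rotated head
print(rotate_distance(h, phases, t))                        # close to 0
print(transe_distance([0.0, 0.0], [1.0, 0.0], [1.0, 0.0]))  # 0.0
```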

Table 2. Ablation study on AvgEmbSim using the RotatE model on WikiText-2, comparing settings with (w/ freq) and without (w/o freq) entity frequency information.

| Metric | Option | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|
| Hits@5 | w/ freq | 0.750 | 0.400 | 0.750 | 0.585 | 0.750 | 0.357 |
| | w/o freq | 0.675 | 0.595 | 0.655 | 0.560 | 0.385 | 0.320 |
| MRR | w/ freq | 0.698 | 0.314 | 0.748 | 0.543 | 0.492 | 0.314 |
| | w/o freq | 0.600 | 0.524 | 0.609 | 0.470 | 0.329 | 0.268 |
| NDCG | w/ freq | 0.748 | 0.410 | 0.835 | 0.606 | 0.566 | 0.398 |
| | w/o freq | 0.666 | 0.597 | 0.672 | 0.556 | 0.429 | 0.378 |

Table 3. Ablation study on EmbPairSim using the INGRAM model on WikiText-2, comparing settings with (w/ cent) and without (w/o cent) mean-centering.

| Metric | Option | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|
| Hits@5 | w/ cent | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| | w/o cent | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| MRR | w/ cent | 0.988 | 0.945 | 0.973 | 0.945 | 0.879 | 0.889 |
| | w/o cent | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| NDCG | w/ cent | 0.991 | 0.954 | 0.980 | 0.954 | 0.900 | 0.912 |
| | w/o cent | 0.108 | 0.108 | 0.109 | 0.108 | 0.108 | 0.109 |

#### 6.2.3. Ablation Study

##### Frequency of Elements Influences the Semantic Representation of a Knowledge Graph

As part of the ablation study, we weight each entity embedding by its occurrence frequency instead of computing a simple average. Table [2](https://arxiv.org/html/2606.29180#S6.T2) shows that this frequency-weighted approach improves performance in five of the six settings; the exception is Synonym 60, where the unweighted average scores higher on all three metrics. These findings suggest that entity frequency plays an important role in capturing the overall semantic information of a KG, much as word frequency helps characterize text documents.
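A minimal sketch of the frequency-weighted centroid, and of the unweighted ablation, is given below. It assumes that an entity's frequency is the number of times it appears as head or tail across the KG's triples, and that two KGs are compared by the cosine similarity of their centroids; the paper's exact weighting is defined in Section 5 and may differ. The function names and the embedding table `emb` are hypothetical stand-ins for trained KGE vectors.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(triples: list[Triple], emb: dict[str, list[float]], use_freq: bool = True) -> list[float]:
    """Mean entity embedding of a KG, weighted by how often each entity occurs
    (use_freq=True) or with every distinct entity weighted equally (use_freq=False)."""
    freq = Counter(x for h, _, t in triples for x in (h, t))
    if not use_freq:
        freq = Counter(set(freq))  # each distinct entity counted once
    total = sum(freq.values())
    dim = len(next(iter(emb.values())))
    return [sum(c * emb[e][k] for e, c in freq.items()) / total for k in range(dim)]


def avg_emb_sim_sketch(g1: list[Triple], g2: list[Triple],
                       emb: dict[str, list[float]], use_freq: bool = True) -> float:
    return cosine(centroid(g1, emb, use_freq), centroid(g2, emb, use_freq))


emb = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
kg = [("A", "r", "B"), ("A", "r", "C")]   # A occurs twice, B and C once
print(centroid(kg, emb))                  # [0.75, 0.5]
print(centroid(kg, emb, use_freq=False))  # about [0.667, 0.667]
```

Weighting pulls the centroid toward the frequently mentioned entity A, which is the intuition behind treating frequent entities as more central to a KG's meaning.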

##### Mean\-Centering Enhances Entity\-Level Similarity

Overall, mean-centering has little impact when EmbPairSim uses transductive KGE models. However, Table [3](https://arxiv.org/html/2606.29180#S6.T3) shows that without mean-centering, INGRAM-based EmbPairSim fails to distinguish graphs at all (Hits@5 of 0.000 in every setting). This suggests that INGRAM produces entity vectors compressed into a narrow subspace. This behavior may arise because INGRAM is pretrained on a single large KG: such pretraining captures the structural patterns of the source graph, but when applied to the much smaller KGs in our experiments, it can map many entities into a narrow region of the embedding space. Mean-centering alleviates this collapse by removing the global bias in the embedding space, making relative differences between entity vectors more visible to cosine-based matching.
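The collapse described above, and why mean-centering helps, can be reproduced with a few toy vectors. The sketch assumes that the mean is computed over all entity embeddings being compared (the text does not specify the exact pool); the vectors are invented to mimic embeddings that share a large common offset, not actual INGRAM outputs.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_center(vectors: list[list[float]]) -> list[list[float]]:
    """Subtract the per-dimension mean of all vectors from each vector."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


# Toy entity embeddings squeezed into a narrow region around (10, 10).
raw = [[10.0, 10.1], [10.1, 10.0], [10.0, 10.0], [10.1, 10.1]]
centered = mean_center(raw)
print(cosine(raw[0], raw[1]))            # nearly 1: indistinguishable
print(cosine(centered[0], centered[1]))  # about -1.0: clearly different
```

Before centering, the shared offset dominates every vector, so any threshold near 1 either matches everything or nothing; after centering, cosine compares only the residual differences.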

Table 4. Comparison of EmbPairSim (pair) and AvgEmbSim (avg) on WikiText-2 (RotatE embeddings). w/o rel: entity-only; w/ rel: entity + relation embeddings.

| Metric | Option | Method | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|---|
| Hits@5 | w/o rel | pair | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| | w/ rel | pair | 0.995 | 0.955 | 0.990 | 0.955 | 0.910 | 0.925 |
| MRR | w/o rel | pair | 0.988 | 0.952 | 0.975 | 0.945 | 0.887 | 0.897 |
| | w/ rel | pair | 0.988 | 0.921 | 0.973 | 0.905 | 0.811 | 0.816 |
| NDCG | w/o rel | pair | 0.991 | 0.960 | 0.981 | 0.954 | 0.907 | 0.918 |
| | w/ rel | pair | 0.991 | 0.936 | 0.980 | 0.924 | 0.850 | 0.856 |
| Hits@5 | w/o rel | avg | 0.750 | 0.400 | 0.750 | 0.585 | 0.750 | 0.357 |
| | w/ rel | avg | 0.250 | 0.180 | 0.330 | 0.190 | 0.145 | 0.095 |
| MRR | w/o rel | avg | 0.698 | 0.314 | 0.748 | 0.543 | 0.492 | 0.314 |
| | w/ rel | avg | 0.229 | 0.170 | 0.282 | 0.180 | 0.122 | 0.096 |
| NDCG | w/o rel | avg | 0.748 | 0.410 | 0.835 | 0.606 | 0.566 | 0.398 |
| | w/ rel | avg | 0.345 | 0.290 | 0.392 | 0.298 | 0.247 | 0.219 |

#### 6.2.4. Relation Embedding Considerations

Extending the preceding experiments, which use entity embeddings only, we observe that including relation embeddings degrades similarity performance rather than improving it. For EmbPairSim, we compute pairwise cosine similarities between relation embeddings in the same way as for entities (Eq. ([6](https://arxiv.org/html/2606.29180#S5.E6))) and include the result in the final score (Eq. ([8](https://arxiv.org/html/2606.29180#S5.E8))). For AvgEmbSim, we calculate the frequency-weighted mean vector of the relations (Eq. ([9](https://arxiv.org/html/2606.29180#S5.E9))) and add its cosine similarity to that of the entity vectors (Eq. ([10](https://arxiv.org/html/2606.29180#S5.E10))). As shown in Table [4](https://arxiv.org/html/2606.29180#S6.T4), incorporating relation embeddings never improves performance: it ties the entity-only variant in a few EmbPairSim settings and lowers the scores everywhere else, most sharply for AvgEmbSim. This suggests that relation vectors may introduce noise rather than helpful signal, likely because the semantic information carried by relations is already implicitly captured during entity embedding training.
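The relation-augmented AvgEmbSim variant can be sketched as the sum of two centroid cosines, following the description above. Assumptions: frequencies are raw occurrence counts across triples, and the two cosines are added with equal weight; Eq. (10) in Section 5 is the authoritative form. The helper names and toy embeddings are hypothetical. The example shows how the relation term can pull down the score of two KGs whose entities agree but whose relation labels differ.

```python
import math
from collections import Counter

Triple = tuple[str, str, str]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(labels: list[str], emb: dict[str, list[float]]) -> list[float]:
    """Frequency-weighted mean of the embeddings of the given labels."""
    freq = Counter(labels)
    total = sum(freq.values())
    dim = len(next(iter(emb.values())))
    return [sum(c * emb[x][k] for x, c in freq.items()) / total for k in range(dim)]


def entity_labels(triples: list[Triple]) -> list[str]:
    return [x for h, _, t in triples for x in (h, t)]


def relation_labels(triples: list[Triple]) -> list[str]:
    return [r for _, r, _ in triples]


def avg_emb_sim_with_rel(g1: list[Triple], g2: list[Triple],
                         ent_emb: dict[str, list[float]], rel_emb: dict[str, list[float]]) -> float:
    ent = cosine(centroid(entity_labels(g1), ent_emb), centroid(entity_labels(g2), ent_emb))
    rel = cosine(centroid(relation_labels(g1), rel_emb), centroid(relation_labels(g2), rel_emb))
    return ent + rel  # entity term plus relation term, equal weights (assumed)


ent_emb = {"Paris": [1.0, 0.2], "France": [0.3, 1.0]}
rel_emb = {"capital_of": [1.0, 0.0], "located_in": [0.0, 1.0]}
g1 = [("Paris", "capital_of", "France")]
g2 = [("Paris", "located_in", "France")]
print(avg_emb_sim_with_rel(g1, g1, ent_emb, rel_emb))  # about 2.0 (identical KGs)
print(avg_emb_sim_with_rel(g1, g2, ent_emb, rel_emb))  # about 1.0: entity term 1, relation term 0
```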

Table 5. EmbPairSim performance on WikiText-2 using Word2Vec and FastText embeddings. In the Pretrained column, ○ indicates that externally pretrained embeddings are used, whereas × indicates that the embeddings are trained only on the WikiText-2 corpus.

| Metric | Model | Pretrained | Synonym 30 | Synonym 60 | Context 30 | Context 60 | DIPPER 60/0 | DIPPER 60/20 |
|---|---|---|---|---|---|---|---|---|
| Hits@5 | FastText | × | 0.990 | 0.940 | 0.985 | 0.955 | 0.915 | 0.955 |
| | Word2Vec | × | 0.680 | 0.445 | 0.635 | 0.495 | 0.300 | 0.285 |
| | RotatE | × | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| MRR | FastText | × | 0.972 | 0.865 | 0.957 | 0.903 | 0.778 | 0.793 |
| | Word2Vec | × | 0.399 | 0.258 | 0.375 | 0.344 | 0.170 | 0.186 |
| | RotatE | × | 0.988 | 0.952 | 0.975 | 0.945 | 0.887 | 0.897 |
| NDCG | FastText | × | 0.979 | 0.893 | 0.968 | 0.922 | 0.823 | 0.839 |
| | Word2Vec | × | 0.531 | 0.399 | 0.510 | 0.477 | 0.321 | 0.334 |
| | RotatE | × | 0.991 | 0.960 | 0.981 | 0.954 | 0.907 | 0.918 |
| Hits@5 | FastText | ○ | 0.995 | 0.975 | 0.975 | 0.965 | 0.950 | 0.965 |
| | Word2Vec | ○ | 0.755 | 0.530 | 0.695 | 0.570 | 0.575 | 0.590 |
| | INGRAM | ○ | 0.995 | 0.980 | 0.990 | 0.970 | 0.955 | 0.975 |
| MRR | FastText | ○ | 0.974 | 0.925 | 0.959 | 0.921 | 0.847 | 0.870 |
| | Word2Vec | ○ | 0.592 | 0.381 | 0.523 | 0.450 | 0.434 | 0.413 |
| | INGRAM | ○ | 0.988 | 0.945 | 0.973 | 0.945 | 0.879 | 0.889 |
| NDCG | FastText | ○ | 0.980 | 0.940 | 0.968 | 0.936 | 0.876 | 0.897 |
| | Word2Vec | ○ | 0.680 | 0.500 | 0.626 | 0.557 | 0.541 | 0.528 |
| | INGRAM | ○ | 0.991 | 0.954 | 0.980 | 0.954 | 0.900 | 0.912 |

#### 6.2.5. Comparing KG Embeddings with Word Embeddings

Because KG embeddings act as distributional representations, EmbPairSim can likewise be applied with entities represented by word embeddings. Table [5](https://arxiv.org/html/2606.29180#S6.T5) contrasts EmbPairSim when entities are represented by KG embeddings versus word embeddings. The KGE-based embeddings match or exceed both word embedding models in every setting, in both the non-pretrained (RotatE) and pretrained (INGRAM) configurations, indicating that they capture KG-to-KG semantic similarity more effectively. This suggests that encoding entities explicitly through relational structure is better suited to comparing KGs than relying on word embeddings learned from sequential textual contexts. A likely reason is that KGE directly exploits graph-structured relations among entities, whereas word embeddings capture relational signals only indirectly through sequential co-occurrence.

Table 6. Runtime comparison of graph-to-graph similarity methods.

| Method | Runtime (s) |
|---|---|
| VH Kernel | 1.569 |
| WL Kernel | 3.770 |
| SBERT | 12.374 |
| EmbPairSim (RotatE) | 0.381 |
| AvgEmbSim (RotatE) | 0.228 |

#### 6.2.6. Runtime Efficiency

Table [6](https://arxiv.org/html/2606.29180#S6.T6) reports the runtime of each graph-to-graph similarity method on WikiText-2 under the synonym replacement setting. Among the baselines, VH Kernel and WL Kernel require 1.569 and 3.770 seconds, respectively, while SBERT requires 12.374 seconds due to sentence-level encoding over verbalized KG triples. In contrast, the KGE-based scoring functions show substantially lower runtime. EmbPairSim with RotatE completes the similarity computation in 0.381 seconds, and AvgEmbSim with RotatE further reduces the runtime to 0.228 seconds. This result supports the efficiency of the proposed KGE-based scoring functions, especially AvgEmbSim, which represents each KG with a single frequency-weighted centroid instead of computing pairwise entity similarities. Together with the performance results in Table [1](https://arxiv.org/html/2606.29180#S6.T1), these runtime results indicate that KGE-based graph-to-graph similarity can provide an efficient alternative to text-based and kernel-based methods.

## 7. Conclusion

We presented a systematic approach to computing semantic similarity between KGs using SBERT, graph kernels, and KGE-based methods. To leverage KGE more effectively, we proposed two complementary scoring functions, EmbPairSim and AvgEmbSim. Experimental results show that KGE can capture semantic information, supporting the view that a KG can be interpreted hierarchically like a document, with entities and relations acting as words, and triples and subgraphs forming higher-level meaning. Future work should explore end-to-end graph neural architectures that better encode structural patterns and relational dependencies for richer graph-level semantics.

###### Acknowledgements\.

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation \(IITP\) and the National Research Foundation of Korea \(NRF\), both funded by the Ministry of Science and ICT \(MSIT\), under Grant Nos\. RS\-2025\-24803185, RS\-2019\-II190421, and IITP\-2025\-RS\-2020\-II201821\.

## References

- Z. Abu-Aisheh, R. Raveaux, J. Ramel, and P. Martineau (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In ICPRAM.
- S. Bird, E. Klein, and E. Loper (2009). Natural language processing with Python.
- A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013). Translating embeddings for modeling multi-relational data. In NIPS.
- H. Chase and the LangChain contributors (2025). LangChain: a framework for building applications with large language models (0.3.26).
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL.
- M. Fan, Q. Zhou, T. F. Zheng, and R. Grishman (2017). Distributed representation learning for knowledge graphs with entity descriptions. Pattern Recognition Letters.
- E. Filtz (2017). Building and processing a knowledge-graph for legal data. In The Semantic Web.
- S. Guan, X. Cheng, L. Bai, F. Zhang, Z. Li, Y. Zeng, X. Jin, and J. Guo (2023). What is event knowledge graph: a survey. IEEE Transactions on Knowledge and Data Engineering.
- M. Y. Jaradeh, A. Oelen, K. E. Farfar, M. Prinz, J. D'Souza, G. Kismihók, M. Stocker, and S. Auer (2019). Open research knowledge graph: next generation infrastructure for semantic scholarly knowledge. In K-CAP.
- A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov (2017). Bag of tricks for efficient text classification. In EACL.
- K. Krishna, Y. Song, M. Iyyer, R. Barzilay, and D. Khashabi (2023). Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. In NeurIPS.
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and F. Li (2017). Visual genome: connecting language and vision using crowdsourced dense image annotations. IJCV.
- J. Lee, C. Chung, and J. J. Whang (2023). InGram: inductive knowledge graph embedding via relation graphs. In ICML.
- J. Mackenzie, R. Benham, M. Petri, J. R. Trippas, J. S. Culpepper, and A. Moffat (2020). CC-news-en: a large English news corpus. In CIKM.
- S. Merity, C. Xiong, J. Bradbury, and R. Socher (2017). Pointer sentinel mixture models. In ICLR.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013a). Efficient estimation of word representations in vector space. In ICLR Workshop.
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013b). Distributed representations of words and phrases and their compositionality. In NIPS.
- G. A. Miller (1995). WordNet: a lexical database for English. Communications of the ACM.
- OpenAI (2023). GPT-3.5-turbo API.
- N. Reimers and I. Gurevych (2019). Sentence-BERT: sentence embeddings using Siamese BERT-networks. In EMNLP-IJCNLP.
- N. Reimers and I. Gurevych (2021). all-mpnet-base-v2: a Sentence-BERT model. Note: [https://huggingface.co/sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt (2011). Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research.
- G. Siglidis, G. Nikolentzos, Y. Limnios, C. Giatsidis, and M. Vazirgiannis (2020). GraKeL: a graph kernel library in Python. Journal of Machine Learning Research.
- M. Sugiyama and K. Borgwardt (2015). Halting in random walk kernels. In NeurIPS.
- Z. Sun, Z. Deng, J. Nie, and J. Tang (2019). RotatE: knowledge graph embedding by relational rotation in complex space. In ICLR.
- T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016). Complex embeddings for simple link prediction. In ICML.
- J. R. Ullmann (1976). An algorithm for subgraph isomorphism. Journal of the ACM.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017). Attention is all you need. In NeurIPS. [Link](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)
- D. Vrandečić and M. Krötzsch (2014). Wikidata: a free collaborative knowledgebase. Communications of the ACM 57(10), pp. 78–85.
- X. Wang, T. Gao, Z. Zhu, Z. Zhang, Z. Liu, J. Li, and J. Tang (2021). KEPLER: a unified model for knowledge embedding and pre-trained language representation. In ACL.
- B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2015). Embedding entities and relations for learning and inference in knowledge bases. In ICLR.
- L. Yao, C. Mao, and Y. Luo (2019). KG-BERT: BERT for knowledge graph completion. In EMNLP.
- Y. Zhang, X. Sui, F. Pan, K. Yu, K. Li, S. Tian, and J. Zhang (2025). A comprehensive large scale biomedical knowledge graph for AI powered data driven biomedical research. Nature Machine Intelligence.

## Appendix A. Examples of KG Information at Different Granularities

- Scholarly KGs such as ORKG (Jaradeh et al., [2019](https://arxiv.org/html/2606.29180#bib.bib23)): entities and relations represent scholarly units and their connections, and triples encode structured scholarly statements. Contribution-centered subgraphs represent individual contributions and are used to compare related literature, while the entire KG represents a scholarly knowledge base that supports research comparison and thematic review across papers or research areas.
- Event-centric KGs such as EventKG (Guan et al., [2023](https://arxiv.org/html/2606.29180#bib.bib22)): entities and relations represent events, participants, locations, times, and their temporal or semantic connections, and triples encode individual event facts. Event-centered subgraphs represent event contexts and support event exploration and timeline generation, while the entire KG represents a temporal event knowledge base used for event-centric analysis.
- Scene graphs (Krishna et al., [2017](https://arxiv.org/html/2606.29180#bib.bib25)): entities and relations represent visual objects, attributes, and interactions, and triples encode local visual relationships. Region-level subgraphs represent image regions and support region-level visual understanding, while the entire graph represents image-level visual content and supports image-level visual understanding.
- Encyclopedic KGs such as Wikidata (Vrandečić and Krötzsch, [2014](https://arxiv.org/html/2606.29180#bib.bib27)): entities and relations represent encyclopedic units and their properties, and triples correspond to item-property-value statements. Item- or topic-centered subgraphs represent item-level knowledge and are used to retrieve item information, statements, and provenance, while the entire KG represents an encyclopedic knowledge base used for structured search over entities and attributes across domains.
- Biomedical KGs such as BioKG (Zhang et al., [2025](https://arxiv.org/html/2606.29180#bib.bib26)): entities and relations represent biomedical units and their associations, and triples encode individual biomedical facts. Disease-, drug-, or pathway-centered subgraphs represent biomedical contexts and are used to trace relations among diseases, drugs, genes, and pathways for knowledge discovery, while the entire KG represents a biomedical knowledge base used for causal inference, drug repurposing, and drug target identification.
- Legal KGs such as LegalKG (Filtz, [2017](https://arxiv.org/html/2606.29180#bib.bib24)): entities and relations represent legal units and their dependencies, and triples encode individual legal statements. Case- or regulation-centered subgraphs represent legal contexts and are used to trace dependencies between legal provisions and judicial decisions, while the entire KG represents a legal knowledge base used to search, connect, and process legal norms and court decisions.
