GATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotation
Summary
This paper introduces GATHER, a convergence-centric retrieval method for zero-shot cell-type annotation using knowledge graphs, which improves accuracy and reduces LLM costs compared to existing KG-RAG baselines.
View Cached Full Text
Cached at: 05/08/26, 07:54 AM
# GATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotation
Source: [https://arxiv.org/html/2605.06403](https://arxiv.org/html/2605.06403)
\(2026\)
###### Abstract\.
Zero\-shot single\-cell cell\-type annotation aims to determine a cell’s type from a given set of expressed genes without any training\. Existing knowledge\-graph\-based RAG approaches retrieve evidence by expanding from source entities and relying on iterative LLM reasoning\. However, in this setting each query contains tens to hundreds of genes, where no single gene is decisive and the label emerges only from their collective co\-occurrence\. Such hyper\-entity queries fundamentally challenge local, entity\-wise exploration strategies, which reason from individual genes, leading to poor scalability and substantial LLM cost\. We proposeGATHER\(Graph\-Aware Traversal with Hyper\-Entity Retrieval\), a convergence\-centric retriever tailored to hyper\-entity queries\. It performs global multi\-source graph traversal and identifies topological convergence points—nodes jointly reachable from many input genes\. These convergence nodes act as high\-information hyper\-entities that capture entity synergy\. By incorporating node\- and path\-importance scoring, GATHER selects informative evidence entirely without LLM involvement during retrieval\. Instantiated on a self\-constructed cell\-centric biological knowledge graph \(VCKG\), GATHER outperforms strong KG\-RAG baselines \(ToG, ToG\-2, RoG, PoG\) on two datasets \(Immune and Lung\), achieving the highest exact\-match accuracy \(27\.45% and 59\.64%\) with only a single LLM call per sample, compared to 2–61 calls for KG\-RAG baselines\. Our results demonstrate that convergence nodes compress multi\-entity signals into compact, high\-information evidence that conveys more per item than multi\-hop paths, providing an efficient global alternative to local entity\-wise reasoning\.
Hyper\-Entity Retrieval; Knowledge Graph RAG; Convergence\-Centric Retrieval; Cell Type Annotation
††journalyear:2026††copyright:cc††conference:Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval; July 20–24, 2026; Melbourne, VIC, Australia††booktitle:Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval \(SIGIR ’26\), July 20–24, 2026, Melbourne, VIC, Australia††isbn:979\-8\-4007\-2599\-9/2026/07††doi:10\.1145/3805712\.3809935††ccs:Information systems Retrieval models and ranking††ccs:Computing methodologies Knowledge representation and reasoning††ccs:Applied computing Bioinformatics## 1\.Introduction
Single\-cell cell\-type annotation\(Zhang et al\.,[2019](https://arxiv.org/html/2605.06403#bib.bib23); Aran et al\.,[2019](https://arxiv.org/html/2605.06403#bib.bib3); Pasquini et al\.,[2021](https://arxiv.org/html/2605.06403#bib.bib17)\)aims to assign a cell type based on a cell’s gene expression profile and is fundamental to computational biology, enabling downstream analyses such as cell\-type discovery and disease mechanism study\. Given a cell’s gene expression profile, the task relies on the*joint expression pattern*of many genes rather than any single marker\. Thus, the prediction signal emerges from the*global interaction*among tens to hundreds of genes\.
Supervised foundation models such as scGPT\(Cui et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib8)\), scBERT\(Yang et al\.,[2022](https://arxiv.org/html/2605.06403#bib.bib22)\), and Geneformer\(Theodoris et al\.,[2023](https://arxiv.org/html/2605.06403#bib.bib21)\)achieve strong accuracy but operate as black boxes, limiting interpretability\. In training\-free settings, large language models \(LLMs\)\(Brown et al\.,[2020](https://arxiv.org/html/2605.06403#bib.bib5); Achiam et al\.,[2023](https://arxiv.org/html/2605.06403#bib.bib2)\)offer explainable reasoning but suffer from imprecise domain knowledge\(Hou and Ji,[2024](https://arxiv.org/html/2605.06403#bib.bib11); Zhao et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib24)\)\. Retrieval\-Augmented Generation \(RAG\)\(Lewis et al\.,[2020](https://arxiv.org/html/2605.06403#bib.bib13); Fan et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib10)\)has therefore emerged as a promising paradigm to ground LLMs with structured knowledge\.
However, applying RAG to cell\-type annotation introduces a key challenge: each query consists of tens to hundreds of genes, and effectively leveraging external knowledge graphs over such a large set of entities becomes non\-trivial\. The central question is how to integrate structured knowledge while jointly considering all source entities\. Existing knowledge graph\-based RAG methods\(Pan et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib16); Zhu et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib25)\), such as ToG\(Sun et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib19)\)and PoG\(Tan et al\.,[2025](https://arxiv.org/html/2605.06403#bib.bib20)\), primarily adopt a*local expansion*paradigm\. They start from each source entity independently, explore neighboring nodes, and aggregate retrieved paths as separate evidence\. While effective for few\-entity queries, this strategy becomes problematic in hyper\-entity settings\. First, independent expansion fragments the collective signal and fails to explicitly model interactions among many entities\. Second, the search and LLM\-interaction costs scale rapidly with the number of source entities, expansion breadth, and depth\.
To address these limitations, we proposeGATHER\(Graph\-AwareTraversal withHyper\-EntityRetrieval\), which shifts from local expansion to a global convergence paradigm\. Instead of reasoning from each entity separately, GATHER identifies important*convergence points*—nodes jointly reachable from many source entities—which serve as high\-information*hyper\-entities*\. These nodes naturally capture the structural interactions among genes and act as consolidated evidence\. Figure[1](https://arxiv.org/html/2605.06403#S1.F1)contrasts this convergent retrieval pattern with divergent per\-entity expansion\.
We obtain such hyper\-entities through a three\-stage process\. First, we perform multi\-source graph traversal to propagate signals from all input entities simultaneously\. Second, we rank candidate convergence nodes using rank\- and topology\-aware scores, selecting the most informative hyper\-entities\. The selected nodes are then passed to the LLM for final reasoning\.
We instantiate GATHER on a self\-constructed cell\-centric biological knowledge graph and apply it to zero\-shot cell\-type annotation\. Experiments on two datasets \(Immune and Lung\) show that GATHER achieves the highest exact\-match accuracy \(27\.45%and59\.64%\) with only one LLM call per sample, outperforming all KG\-RAG baselines while using2–61×\\timesfewer LLM calls than KG\-RAG baselines\. These results demonstrate that convergence nodes compress multi\-entity signals into compact, high\-information evidence that conveys more per item than multi\-hop paths, making convergence\-centric retrieval an effective and efficient principle for hyper\-entity reasoning\. Code is available at[https://github\.com/SUAT\-AIRI/GATHER](https://github.com/SUAT-AIRI/GATHER)\.
Figure 1\.Divergent vs\. convergent retrieval for hyper\-entity queries\.\(a\)Divergent: per\-entity LLM\-guided expansion\.\(b\)Convergent \(GATHER\): multi\-source traversal identifying topological convergence points\.A comparison between divergent per\-entity graph expansion and GATHER’s convergent multi\-source traversal for hyper\-entity queries\.
## 2\.Method
### 2\.1\.Task Formulation
We consider zero\-shot cell\-type annotation under a RAG framework\. We use zero\-shot in the training\-free sense: no model is trained or fine\-tuned on labeled cells from the evaluation datasets\. The method still relies on curated prior knowledge in VCKG and ranked gene lists, and is therefore best understood as knowledge\-driven, training\-free inference\. Given a cell’s gene expression profile, we construct a cell sentence𝒮=\{g1,…,gn\}\\mathcal\{S\}=\\\{g\_\{1\},\\ldots,g\_\{n\}\\\}following Cell2Sentence\(Rizvi et al\.,[2025](https://arxiv.org/html/2605.06403#bib.bib18)\), where genes are ranked by discriminative power\. We normalize gene symbols against VCKG Gene node symbols and synonyms and map them to canonical Gene nodes\. Uninformative housekeeping genes \(e\.g\.,RPL\*,MT\-\*\) and symbols without matching nodes are filtered, yielding the grounded gene set𝒮~\\tilde\{\\mathcal\{S\}\}\. The objective is to predict a cell typec∗∈𝒞c^\{\*\}\\in\\mathcal\{C\}, where𝒞\\mathcal\{C\}is defined by the Cell Ontology\.
In the RAG paradigm, prediction consists of two stages: \(i\) retrieving relevant knowledge from a knowledge graph, and \(ii\) LLM\-based reasoning over the retrieved evidence\.
As discussed in the Introduction, hyper\-entity queries differ fundamentally from few\-entity settings: the correct cell type emerges from the*joint support*of many genes\. Accordingly, we reformulate retrieval as a*multi\-source convergence problem*: instead of expanding from each entity independently, the goal is to identify graph nodes jointly supported by multiple genes in𝒮~\\tilde\{\\mathcal\{S\}\}\.
### 2\.2\.GATHER: Convergence\-Centric Retrieval
Building upon this reformulation, we proposeGATHER, a retrieval algorithm designed for hyper\-entity queries, as shown in Figure[2](https://arxiv.org/html/2605.06403#S2.F2)\. Rather than performing local expansion from each gene, GATHER identifies*topological convergence points*— nodes that receive strong structural support from many source genes\. These nodes act as*hyper\-entities*, serving as consolidated evidence that captures global interactions among genes\. Here, a hyper\-entity is not a new biological entity type; it denotes a retrieved graph node whose relevance is defined by joint support from a set of source entities rather than by an individual source alone\.
GATHER obtains such hyper\-entities in three compact stages: \(1\) multi\-source traversal, which propagates from grounded genes through the graph and records shared reachability patterns; \(2\) gene weighting, which combines gene rank with graph specificity; and \(3\) convergence scoring, which aggregates hop\-binned support to select the final evidence nodes\.
#### 2\.2\.1\.Stage 1: Multi\-Source Graph Traversal
For eachg∈𝒮~g\\in\\tilde\{\\mathcal\{S\}\}, we traverse the knowledge graph up tokkhops in a relation\-agnostic manner \(all edge types, both directions\)\. A semantic\-type constraint prevents consecutive nodes of identical type, avoiding degenerate chains\.
Crucially, traversals from all genes proceed simultaneously\. For each discovered target nodett\(candidate cell\-type node\), we record its hop\-binned support:
\(1\)Sh\(t\)=\{g∈𝒮~∣greachestin exactlyhhops\}\.S\_\{h\}\(t\)=\\\{g\\in\\tilde\{\\mathcal\{S\}\}\\mid g\\text\{ reaches \}t\\text\{ in exactly \}h\\text\{ hops\}\\\}\.
Nodes jointly reachable from many genes through short paths naturally emerge as candidate convergence points\.
#### 2\.2\.2\.Stage 2: Context\-Aware Gene Weighting
Stage 1 identifies which genes can support each candidate target, but raw support counts treat all genes equally\. This is undesirable because top\-ranked genes in the cell sentence are more discriminative, whereas broadly connected genes may reach many targets and provide less specific evidence\. We therefore assign each genegga combined weight with two components\.
Rank\-based importance:
\(2\)wgrank=1log2\(rank\(g\)\+2\)\.w\_\{g\}^\{\\text\{rank\}\}=\\frac\{1\}\{\\log\_\{2\}\(\\text\{rank\}\(g\)\+2\)\}\.
Graph specificity \(IDF\-style\):
\(3\)wgIDF=log\(\|𝒯\|df\(g\)\+1\),w\_\{g\}^\{\\text\{IDF\}\}=\\log\\left\(\\frac\{\|\\mathcal\{T\}\|\}\{\\text\{df\}\(g\)\}\+1\\right\),
wheredf\(g\)\\text\{df\}\(g\)is the number of candidate targets reachable fromgg, and𝒯\\mathcal\{T\}is the union of all discovered targets\.
The rank term favors salient genes in the cell sentence, whereas the IDF term downweights genes that reach many candidate targets\. Together, they make the subsequent convergence score depend on selective, high\-rank support rather than raw reachability\.
#### 2\.2\.3\.Stage 3: Topology\-Aware Convergence Scoring
We rank each candidate nodettby aggregating weighted support:
\(4\)Score\(t\)=∑h=1kαh∑g∈Sh\(t\)wgrank⋅wgIDF,\\text\{Score\}\(t\)=\\sum\_\{h=1\}^\{k\}\\alpha\_\{h\}\\sum\_\{g\\in S\_\{h\}\(t\)\}w\_\{g\}^\{\\text\{rank\}\}\\cdot w\_\{g\}^\{\\text\{IDF\}\},
whereαh\\alpha\_\{h\}are hop\-decay weights favoring shorter paths\.
This scoring function directly operationalizes the convergence principle: nodes jointly supported by many informative and specific genes through short paths receive higher scores\.
Figure 2\.The three stages of GATHER: \(1\) multi\-source graph traversal from the grounded gene set𝒮~\\tilde\{\\mathcal\{S\}\}, whereggdenotes a source gene andtta candidate target; \(2\) context\-aware gene weighting \(node size encodeswgrankw\_\{g\}^\{\\text\{rank\}\}; color depth encodeswgIDFw\_\{g\}^\{\\text\{IDF\}\}\); and \(3\) topology\-aware convergence scoring, where the top\-KKcandidates ranked byScore\(t\)\\mathrm\{Score\}\(t\)are selected as hyper\-entities\.The three algorithmic stages of GATHER: multi\-source graph traversal, context\-aware gene weighting, and topology\-aware convergence scoring\.The top\-KKconvergence nodes, together with their supporting gene coalitions, form a compressed evidence context that is passed to a single LLM call for final reasoning\.
Compared to entity\-wise local expansion, GATHER provides: \(1\)Global coordination:explicitly modeling joint structural support across genes; \(2\)Context compression:distillingNNsource genes intoK≪NK\\ll Nconvergence nodes; \(3\)LLM cost reduction:requiring zero LLM calls during retrieval\. Thus, retrieval shifts from fragmented local exploration to structured global aggregation\.
The retrieval cost is governed by the number of grounded genes, the traversal horizon, and the local graph fan\-out\. In the worst case, a naive traversal grows asO\(\|𝒮~\|dk\)O\(\|\\tilde\{\\mathcal\{S\}\}\|d^\{k\}\)for average fan\-outdd, but GATHER uses a shallow fixed horizon and ranks only candidate cell\-type targets\. The IDF term is therefore traversal\-specific:df\(g\)\\text\{df\}\(g\)is computed under the traversal horizonkkused for a given run, rather than as a global gene frequency or a per\-sample tuned quantity\.
### 2\.3\.VCKG as a Cell\-Centric Knowledge Graph for Hyper\-Entity Retrieval
GATHER requires a knowledge graph where genes serve as entry points, functional and cellular semantics enable multi\-hop paths to cell types, and cell\-type nodes are grounded in a formal ontology\. No existing public KG satisfies all three requirements: general\-purpose biomedical KGs \(e\.g\., PrimeKG\(Chandak et al\.,[2023](https://arxiv.org/html/2605.06403#bib.bib6)\)\) lack a cell\-centric schema, while domain\-specific resources \(e\.g\., CellMarker\(Zhang et al\.,[2019](https://arxiv.org/html/2605.06403#bib.bib23)\)\) are flat databases without the graph topology needed for multi\-hop convergence\.
We therefore constructVCKG, a cell\-centric biological knowledge graph that integrates 20\+ databases and 7 domain ontologies through a four\-step pipeline: \(1\)*data collection*from sources spanning genes \(NCBI Gene, HGNC, UniProt\), functions \(GO\(Ashburner et al\.,[2000](https://arxiv.org/html/2605.06403#bib.bib4)\), Reactome\), cells \(Cell Ontology, CellMarker 2\.0\(Hu et al\.,[2023](https://arxiv.org/html/2605.06403#bib.bib12)\)\), anatomy \(UBERON\), and diseases \(DO, MONDO, HPO\); \(2\)*ontology normalization*, mapping every entity to its canonical identifier to resolve synonyms and cross\-database conflicts; \(3\)*relation standardization*via the Relation Ontology; and \(4\)*graph assembly*into a Neo4j property graph\. To reduce cross\-source ambiguity, gene symbols and aliases are normalized against HGNC and NCBI Gene identifiers, protein\-level references are linked through UniProt accessions, and cell types are grounded to Cell Ontology IDs\. When multiple sources describe the same entity, we merge by canonical identifier and preserve source\-specific fields, such asfield\_sources,source, or PMID metadata, when available\. Because direct marker annotations are incomplete, VCKG supports both directIS\_MARKER\_FORevidence and indirect functional or ontological paths, allowing GATHER to exploit convergence beyond one\-hop marker lookup\.
Table 1\.Key statistics of VCKG\.The key design principle is*gene\-centric structural connectivity*: 43,000\+ gene nodes act as hubs linking to functional concepts \(GO terms, pathways\) and 2,500\+ Cell Ontology cell\-type nodes\. A gene connects to a cell type either directly viaIS\_MARKER\_FOR\(1\-hop\) or indirectly through shared annotations, e\.g\.,Gene→PARTICIPATES\_INBP←CAPABLE\_OFCellType\\texttt\{Gene\}\\xrightarrow\{\\texttt\{PARTICIPATES\\\_IN\}\}\\texttt\{BP\}\\xleftarrow\{\\texttt\{CAPABLE\\\_OF\}\}\\texttt\{CellType\}\(2\-hop\), enabling convergence signals from both direct marker evidence and indirect functional overlap\. Table[1](https://arxiv.org/html/2605.06403#S2.T1)summarizes the key statistics\.
## 3\.Experiments
### 3\.1\.Experimental Setup
##### Datasets\.
We evaluate on two single\-cell datasets whose labels adopt standardized Cell Ontology nomenclature, enabling direct alignment with VCKG’s ontology\-grounded cell type nodes: \(1\)Immune Human\(Domínguez Conde et al\.,[2022](https://arxiv.org/html/2605.06403#bib.bib9)\): a subset from the cross\-tissue immune cell atlas \(donors A29 and A31, following the Cell2Sentence\(Rizvi et al\.,[2025](https://arxiv.org/html/2605.06403#bib.bib18)\)split\), comprising 2,962 test cells across 34 fine\-grained types; \(2\)Tabula Sapiens Lung\(Consortium\* et al\.,[2022](https://arxiv.org/html/2605.06403#bib.bib7)\): the lung tissue partition from the Tabula Sapiens multi\-organ atlas, comprising 2,416 test cells across 30 types\. Each cell is converted to a cell sentence ofn=50n\{=\}50ranked genes \(source entities\), and the number of retained convergence nodes is set toK=10K\{=\}10\. In all main experiments, we set the traversal horizon tok=2k\{=\}2to balance coverage and specificity, while the GATHER formulation supports other traversal horizons\.
##### Baselines\.
We compare GATHER against representative baselines spanning direct LLM prompting and diverse KG\-based RAG paradigms\.LLMperforms direct prompting on gene lists without external knowledge\. Among KG\-based methods, existing approaches predominantly adopt a*local, entity\-wise expansion*paradigm, where each gene is explored independently and evidence is aggregated afterward\.RoG\(Luo et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib14)\)follows a template\-based strategy by predicting relation paths and performing per\-entity traversal\.ToG\(Sun et al\.,[2024](https://arxiv.org/html/2605.06403#bib.bib19)\)and its enhanced variantToG\-2\(Ma et al\.,[2025](https://arxiv.org/html/2605.06403#bib.bib15)\)perform iterative, LLM\-guided entity\-level exploration with dynamic pruning\.PoG\(Tan et al\.,[2025](https://arxiv.org/html/2605.06403#bib.bib20)\)conducts multi\-stage path\-centric search with LLM\-based refinement\. All methods use the same LLM backbone \(GPT\-4o\-mini\) and VCKG as the knowledge source for fair comparison\.
##### Metrics\.
We report exact\-match accuracy, average LLM calls per sample, and evidence quantity \(average number of evidence items retrieved per sample\)\. To account for the hierarchical nature of cell types, we additionally evaluate against the Cell Ontology DAG: Ancestor Match \(Anc\.\) credits predictions that lie on the same root\-to\-leaf path as the true label \(i\.e\., one is an ancestor or descendant of the other\)\. We focus on exact and ontology\-aware matching metrics, as they directly reflect whether retrieval succeeds in identifying the correct semantic region in the Cell Ontology, rather than smoothing errors through label averaging\.
Table 2\.Main results on cell\-type annotation\. Exact: exact\-match accuracy \(%\); Anc\.: ancestor match \(%\), where a prediction is credited if it is an ancestor or descendant of the true label in the Cell Ontology; Calls: average LLM calls per sample; Evid\.: average evidence items per sample \(convergence nodes for GATHER; relation paths for RoG; reasoning triples for ToG/ToG\-2; retrieved paths for PoG\)\. All methods use GPT\-4o\-mini and VCKG\.
### 3\.2\.Main Results
##### Overall Performance and Efficiency\.
From Table[2](https://arxiv.org/html/2605.06403#S3.T2), GATHER achie\-ves the highest exact\-match accuracy on both datasets:27\.45%on Immune \(vs\. 20\.80% for PoG\) and59\.64%on Lung \(vs\. 56\.37% for RoG\)\. Notably, these gains are obtained with only 1 LLM call per sample, whereas KG\-RAG baselines require between 2\.0 and 60\.5 LLM calls\. Although increased LLM interaction generally correlates with improved accuracy \(e\.g\., ToG\-2: 13\.3 calls→\\to18\.13% on Immune; ToG: 56\.2 calls→\\to20\.50%\), GATHER surpasses all baselines without additional LLM reasoning steps\. Beyond LLM efficiency, GATHER also achieves*evidence compression*\. Although GATHER retrieves 9\.3–10\.0 evidence items, comparable in count to baselines such as PoG \(6\.8–7\.8 paths\), the information granularity differs fundamentally: each baseline evidence item is a*multi\-hop path*consisting of several nodes and edges \(e\.g\., a 2\-hop path contains 3 nodes and 2 relations\), whereas each GATHER evidence item is a single*convergence node*that aggregates structural support from multiple source genes\. Thus, in terms of total information volume fed to the LLM, GATHER uses substantially less knowledge than path\-based methods while achieving the highest exact\-match accuracy\. This demonstrates that convergence modeling produces inherently more informative evidence per item, and that retrieval quality—not quantity—drives performance\.
##### Cross\-Dataset Analysis\.
Performance patterns differ between the two datasets\. Lung yields higher absolute accuracy for all methods \(e\.g\., LLM baseline: 54\.88% on Lung vs\. 14\.01% on Immune\), reflecting its relatively coarser and more transcriptionally distinct cell types\. Immune, containing fine\-grained T\-cell subtypes, is more sensitive to multi\-gene interactions\. Despite this increased difficulty, GATHER maintains the best exact\-match accuracy in both regimes\. On Lung, it also achieves the highest ancestor match \(60\.18%\), indicating correct placement within the ontology hierarchy\. On Immune, while the ancestor match \(33\.09%\) is below ToG \(36\.61%\), the higher exact accuracy suggests that GATHER favors precise subtype predictions over conservative coarse\-grained guesses\. These observations support that convergence modeling is particularly beneficial in fine\-grained hyper\-entity settings, where the label cannot be resolved from any single gene and must instead emerge from multi\-gene structural agreement\.
##### Ablation and Scaling Analysis\.
The path ablation further clarifies the effective signal\. When explicit traversal paths are added, accuracy decreases from 27\.45% to 26\.54% on Immune and from 59\.64% to 58\.65% on Lung\. This suggests that the LLM primarily benefits from*which genes converge on a candidate and at what distance*, rather than verbose path descriptions, reinforcing that topological convergence is the core signal\.
Gene\-scaling results \(Table[3](https://arxiv.org/html/2605.06403#S3.T3)\) show that exact\-match accuracy generally improves as the number of input genes grows\. On Lung, accuracy rises from 55\.46% \(10 genes\) to 59\.64% \(50 genes\); on Immune, it improves from 24\.85% \(10 genes\) to 27\.72% \(40 genes\) and remains comparable at 27\.45% with 50 genes\. This trend suggests that additional genes strengthen the convergence signal until it saturates rather than dilute it\. In contrast, traditional entity\-wise methods do not exhibit comparable scaling gains, as their retrieval treats each gene independently\. Finally, varying the number of retained convergence nodes \(K∈\{5,10,15\}K\\in\\\{5,10,15\\\}\) yields similar performance \(Immune: 26\.87%/27\.45%/27\.04%; Lung: 59\.48%/59\.64%/59\.81%\), withK=10K\{=\}10achieving a good balance across both datasets, indicating that convergence signals are stably concentrated among a small set of top candidates\.
Table 3\.Effect of input gene count on GATHER \(GPT\-4o\-mini\)\. Genes: number of input genes from the cell sentence; Grounded: genes successfully mapped to VCKG nodes; Exact: exact\-match accuracy \(%\)\.
### 3\.3\.Discussion and Limitations
##### Scope of Comparison\.
This work focuses on training\-free KG\-RAG retrieval rather than supervised cell\-type classification\. Non\-KG methods such as linear classifiers, PCA\-based workflows, set\-based encoders, and graph neural networks are important complementary baselines, but they typically require labeled cells, feature learning, or task\-specific training\. Comparing these paradigms under a unified data and supervision protocol is an important direction for future work\.
##### Dependence on VCKG\.
GATHER assumes that the underlying graph contains meaningful convergence targets and that cell\-type nodes are sufficiently grounded in ontology and marker evidence\. Its performance may degrade when the true cell type is absent from the graph, when marker coverage is sparse, or when input genes cannot be reliably grounded\. Applying the method to a general biomedical KG such as PrimeKG would require additional schema alignment, target typing, and cell\-type grounding, because such KGs are not organized around cell\-type retrieval\.
##### Evidence and Failure Modes\.
Convergence nodes should be interpreted as retrieval evidence, not as direct proof of a biological mechanism\. On fine\-grained immune subtypes, GATHER improves exact match, but can still make overly specific errors, as reflected by its lower ancestor match than ToG\. Future work should report top\-KKretrieval ceilings, per\-class confusion patterns, expert validation of retrieved convergence nodes, and robustness to missing genes or noisy graph edges\.
##### Parameter Sensitivity\.
Our main experiments usen=50n\{=\}50input genes and a two\-hop traversal horizon as a compact default setting\. The gene\-count analysis suggests that convergence signals strengthen once sufficient source genes are available, but the best traversal depth and relation constraints may vary across knowledge graphs and annotation granularity\. Future work should study when additional graph context improves coverage or introduces overly broad or noisy convergence targets\.
## 4\.Conclusion
We introducedGATHER, a convergence\-centric retrieval framework for hyper\-entity queries in KG\-RAG, where answers emerge from the collective support of many entities\. Instead of independent entity\-wise expansion, GATHER detects topological convergence points to model multi\-source structural synergy during retrieval\. Across two benchmarks, GATHER achieves the best exact\-match accuracy \(27\.45% on Immune and 59\.64% on Lung\) while requiring only a single LLM call per sample, reducing LLM usage by 2–61×\\timescompared to KG\-RAG baselines\. These results demonstrate that convergence nodes compress collective entity signals into compact, high\-information evidence, and that improving retrieval quality, rather than increasing evidence volume or iterative LLM reasoning, is key to effective hyper\-entity modeling\. More broadly, our findings highlight evidence informativeness as a key factor in zero\-shot biomedical reasoning with collective entity signals\.
###### Acknowledgements\.
This work was supported by the project of Shenzhen Application Research and Development Special Fund Support \(Grant No\. XLQSQ20250427092505008\), the National Key Research and Development Program of China \(Grant No\. 2024YFF0908200\), the Natural Science Foundation of Guangdong Province of China \(Grant Nos\. 2024A1515030166 and 2025B1515020032\), the Shenzhen Science and Technology Innovation Program \(Grant No\. KQTD20190929172835\-662\), and the Innovation Team Project of Guangdong Province \(Grant No\. 2024KCXTD017\)\.
## References
- \(1\)
- Achiam et al\.\(2023\)Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al\.2023\.GPT\-4 Technical Report\.arXiv preprint arXiv:2303\.08774\.[doi:10\.48550/arXiv\.2303\.08774](https://doi.org/10.48550/arXiv.2303.08774)
- Aran et al\.\(2019\)Dvir Aran, Agnieszka P Looney, Leqian Liu, Esther Wu, Valerie Fong, Austin Hsu, Suzanna Chak, Ram P Naikawadi, Paul J Wolters, Adam R Abate, et al\.2019\.Reference\-based analysis of lung single\-cell sequencing reveals a transitional profibrotic macrophage\.*Nature immunology*20, 2 \(2019\), 163–172\.
- Ashburner et al\.\(2000\)Michael Ashburner, Catherine A Ball, Judith A Blake, David Botstein, Heather Butler, J Michael Cherry, Allan P Davis, Kara Dolinski, Selina S Dwight, Janan T Eppig, et al\.2000\.Gene ontology: tool for the unification of biology\.*Nature genetics*25, 1 \(2000\), 25–29\.
- Brown et al\.\(2020\)Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al\.2020\.Language models are few\-shot learners\.*Advances in neural information processing systems*33 \(2020\), 1877–1901\.
- Chandak et al\.\(2023\)Payal Chandak, Kexin Huang, and Marinka Zitnik\. 2023\.Building a knowledge graph to enable precision medicine\.*Scientific Data*10, 1 \(2023\), 67\.
- Consortium\* et al\.\(2022\)The Tabula Sapiens Consortium\*, Robert C Jones, Jim Karkanias, Mark A Krasnow, Angela Oliveira Pisco, Stephen R Quake, Julia Salzman, Nir Yosef, Bryan Bulthaup, Phillip Brown, et al\.2022\.The Tabula Sapiens: A multiple\-organ, single\-cell transcriptomic atlas of humans\.*Science*376, 6594 \(2022\), eabl4896\.
- Cui et al\.\(2024\)Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang\. 2024\.scGPT: toward building a foundation model for single\-cell multi\-omics using generative AI\.*Nature methods*21, 8 \(2024\), 1470–1480\.
- Domínguez Conde et al\.\(2022\)C Domínguez Conde, Chao Xu, Louie B Jarvis, Daniel B Rainbow, Sara B Wells, Tamir Gomes, SK Howlett, O Suchanek, K Polanski, HW King, et al\.2022\.Cross\-tissue immune cell analysis reveals tissue\-specific features in humans\.*Science*376, 6594 \(2022\), eabl5197\.
- Fan et al\.\(2024\)Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat\-Seng Chua, and Qing Li\. 2024\.A survey on rag meeting llms: Towards retrieval\-augmented large language models\. In*Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining*\. Association for Computing Machinery, New York, NY, USA, 6491–6501\.[doi:10\.1145/3637528\.3671470](https://doi.org/10.1145/3637528.3671470)
- Hou and Ji \(2024\)Wenpin Hou and Zhicheng Ji\. 2024\.Assessing GPT\-4 for cell type annotation in single\-cell RNA\-seq analysis\.*Nature methods*21, 8 \(2024\), 1462–1465\.
- Hu et al\.\(2023\)Congxue Hu, Tengyue Li, Yingqi Xu, Xinxin Zhang, Feng Li, Jing Bai, Jing Chen, Wenqi Jiang, Kaiyue Yang, Qi Ou, et al\.2023\.CellMarker 2\.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA\-seq data\.*Nucleic acids research*51, D1 \(2023\), D870–D876\.
- Lewis et al\.\(2020\)Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen\-tau Yih, Tim Rocktäschel, et al\.2020\.Retrieval\-augmented generation for knowledge\-intensive nlp tasks\.*Advances in neural information processing systems*33 \(2020\), 9459–9474\.
- Luo et al\.\(2024\)Linhao Luo, Yuan\-Fang Li, Gholamreza Haffari, and Shirui Pan\. 2024\.Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning\. In*International Conference on Learning Representations*\. OpenReview\.net, Vienna, Austria, 14400–14423\.
- Ma et al\.\(2025\)Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, and Jian Guo\. 2025\.Think\-on\-Graph 2\.0: Deep and Faithful Large Language Model Reasoning with Knowledge\-guided Retrieval Augmented Generation\. In*The Thirteenth International Conference on Learning Representations*\. OpenReview\.net, Singapore, 52782–52806\.
- Pan et al\.\(2024\)Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu\. 2024\.Unifying large language models and knowledge graphs: A roadmap\.*IEEE Transactions on Knowledge and Data Engineering*36, 7 \(2024\), 3580–3599\.
- Pasquini et al\.\(2021\)Giovanni Pasquini, Jesus Eduardo Rojo Arias, Patrick Schäfer, and Volker Busskamp\. 2021\.Automated methods for cell type annotation on scRNA\-seq data\.*Computational and Structural Biotechnology Journal*19 \(2021\), 961–969\.
- Rizvi et al\.\(2025\)Syed Asad Rizvi, Daniel Levine, Aakash Patel, Shiyang Zhang, Eric Wang, Curtis Jamison Perry, Nicole Mayerli Constante, Sizhuang He, David Zhang, Cerise Tang, et al\.2025\.Scaling large language models for next\-generation single\-cell analysis\.2025–04 pages\.
- Sun et al\.\(2024\)Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Saizhuo Wang, Chen Lin, Yeyun Gong, Lionel Ni, Heung\-Yeung Shum, and Jian Guo\. 2024\.Think\-on\-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph\. In*The Twelfth International Conference on Learning Representations*\. OpenReview\.net, Vienna, Austria, 3868–3898\.
- Tan et al\.\(2025\)Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, and Wenjie Zhang\. 2025\.Paths\-over\-graph: Knowledge graph empowered large language model reasoning\. In*Proceedings of the ACM on Web Conference 2025*\(Sydney NSW, Australia\)*\(WWW ’25\)*\. Association for Computing Machinery, New York, NY, USA, 3505–3522\.[doi:10\.1145/3696410\.3714892](https://doi.org/10.1145/3696410.3714892)
- Theodoris et al\.\(2023\)Christina V Theodoris, Ling Xiao, Anant Chopra, Mark D Chaffin, Zeina R Al Sayed, Matthew C Hill, Helene Mantineo, Elizabeth M Brydon, Zexian Zeng, X Shirley Liu, et al\.2023\.Transfer learning enables predictions in network biology\.*Nature*618, 7965 \(2023\), 616–624\.
- Yang et al\.\(2022\)Fan Yang, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou Huang, Hui Lu, and Jianhua Yao\. 2022\.scBERT as a large\-scale pretrained deep language model for cell type annotation of single\-cell RNA\-seq data\.*Nature Machine Intelligence*4, 10 \(2022\), 852–866\.
- Zhang et al\.\(2019\)Xinxin Zhang, Yujia Lan, Jinyuan Xu, Fei Quan, Erjie Zhao, Chunyu Deng, Tao Luo, Liwen Xu, Gaoming Liao, Min Yan, et al\.2019\.CellMarker: a manually curated resource of cell markers in human and mouse\.*Nucleic acids research*47, D1 \(2019\), D721–D728\.
- Zhao et al\.\(2024\)Suyuan Zhao, Jiahuan Zhang, Yushuai Wu, Yizhen Luo, and Zaiqing Nie\. 2024\.LangCell: Language\-Cell Pre\-training for Cell Identity Understanding\. In*International Conference on Machine Learning**\(Proceedings of Machine Learning Research, Vol\. 235\)*\. PMLR, Vienna, Austria, 61159–61185\.
- Zhu et al\.\(2024\)Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, and Ningyu Zhang\. 2024\.Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities\.*World Wide Web*27, 5 \(2024\), 58\.Similar Articles
A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation
This paper proposes HyGRAG, a hierarchical graph RAG framework that integrates contextual and relational information for multi-hop reasoning, achieving a 9.7% average accuracy improvement over existing methods.
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation
ContextRAG introduces an extraction-free method for constructing hierarchical graph indices for retrieval-augmented generation, using Residual-Quantization K-Means and Formal Concept Analysis to reduce LLM calls and tokens by orders of magnitude while maintaining competitive F1 scores on multi-hop questions.
Discovering types for entity disambiguation
OpenAI researchers present a novel approach to entity disambiguation using type discovery, where a system predicts entity types from a pre-chosen category set to resolve ambiguous references. The method achieves state-of-the-art results on entity disambiguation datasets and enables efficient O(N) runtime entity ranking through type-based weighting.
Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study
This paper proposes a zero-shot multi-label topic classification framework enhanced with per-article knowledge graphs, comparing four base variants and their graph-augmented counterparts across fifteen LLMs and eight datasets. The study finds that keyword-enhanced classification performs best, and graph augmentation improves small models but degrades performance in larger ones.
RAGA: Reading-And-Graph-building-Agent for Autonomous Knowledge Graph Construction and Retrieval-Augmented Generation
RAGA is an LLM-driven autonomous agent that constructs knowledge graphs via a read-search-verify-construct cognitive loop and integrates hybrid symbolic-vector retrieval for retrieval-augmented generation, with experimental gains on scientific QA datasets.