Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment
Summary
MetaKGEnrich is a fully automated pipeline that uses graph metrics to detect knowledge gaps in LLM applications, retrieves web evidence to fill them, and improves answer quality on 80-87% of questions across three benchmark datasets.
# Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment

Source: [https://arxiv.org/abs/2605.16676](https://arxiv.org/abs/2605.16676) [View PDF](https://arxiv.org/pdf/2605.16676)

> Abstract: Metacognition, the ability to monitor one's own knowledge state, spot gaps, and autonomously fill them, remains largely absent from modern AI. Here, we present MetaKGEnrich, a fully automated pipeline that endows large language model (LLM) applications with self-directed knowledge repair. The system (i) builds knowledge graphs from a seed query, (ii) detects sparse regions via seven graph metrics, (iii) has GPT-4o generate targeted questions, (iv) retrieves web evidence with Tavily and ingests it into Neo4j, and (v) re-answers the query with GraphRAG for GPT-4 to evaluate improvement. Tested on 30 queries from each of three widely-used datasets (Google Research Natural Questions, MS MARCO, and HotpotQA), MetaKGEnrich improved answer quality in 80% of HotpotQA questions, 87% of Google Research Natural Questions questions, and 83% of MS MARCO questions, while preserving well-supported regions. This proof of concept demonstrates how topological self-diagnosis plus targeted retrieval can advance AI toward humanlike metacognitive learning.

## Submission history

From: Brendan Conway-Smith

**[v1]** Fri, 15 May 2026 22:32:07 UTC (762 KB)
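
Step (ii) of the abstract, detecting sparse regions from graph metrics, can be illustrated with a minimal sketch. The abstract does not name the seven metrics the authors use, so the two shown here (node degree and local clustering coefficient) are assumptions chosen for illustration. The toy graph, the `max_degree` threshold, and all function names are also hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as undirected (entity, entity) edges.
EDGES = [
    ("Paris", "France"),
    ("Paris", "Seine"),
    ("France", "Seine"),
    ("France", "Europe"),
    ("Louvre", "Paris"),
    ("Mona Lisa", "Louvre"),
]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbour pairs exist, so report zero clustering
    # Count each unordered neighbour pair once (a < b) if it is an edge.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))


def sparse_nodes(edges, max_degree=1):
    """Nodes whose degree is at most max_degree (an assumed gap heuristic)."""
    adj = build_adjacency(edges)
    return sorted(n for n in adj if len(adj[n]) <= max_degree)


if __name__ == "__main__":
    print(sparse_nodes(EDGES))  # ['Europe', 'Mona Lisa']
```

In this sketch the weakly connected entities (`Europe`, `Mona Lisa`) are the candidates for which a pipeline of this kind would generate follow-up questions, while densely connected entities such as `Seine` (clustering coefficient 1.0) would be left untouched.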
Similar Articles
SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation
This paper introduces SAGE, a framework for scalable automated robustness augmentation of LLM knowledge evaluation benchmarks. It uses fine-tuned smaller models with reinforcement learning to generate and verify question variants at a lower cost than existing methods.
MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models
This paper introduces MHGraphBench, a knowledge-graph-grounded benchmark for evaluating large language models on mental health knowledge, including entity recognition, relation judgment, and multi-hop reasoning. Experiments across 15 LLMs reveal a gap between recognition and judgment capabilities.
BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation
BLINKG is a benchmark designed to evaluate the mapping capabilities of Large Language Models (LLMs) in constructing Knowledge Graphs from heterogeneous data sources. It provides a standardized framework to assess how effectively LLMs establish correspondences between data schemas and ontology concepts.
ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
ExpGraph is a model-agnostic framework that enables LLM agents to reuse past experiences via a self-evolving graph of skills and failures, improving task performance by 12–21% without retraining the executor.
PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
PersonalAI 2.0 introduces a framework that enhances LLM-based systems by integrating external knowledge graphs with dynamic multistage query processing and adaptive planning mechanisms, achieving reductions in hallucination rates and improved precision across multiple benchmarks.