HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs
Summary
HypoAgent is an agentic framework for interactive abductive hypothesis generation over knowledge graphs. It integrates three agents that ground evolving user intents and diagnose failed hypotheses at the fragment level, and it achieves state-of-the-art semantic similarity on commonsense and biomedical knowledge graphs.
Cached at: 06/01/26, 09:27 AM
# HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs
Source: [https://arxiv.org/html/2605.31370](https://arxiv.org/html/2605.31370)
Yisen Gao¹, Yixi Cai², Tianshi Zheng¹, Jiaxin Bai³, Yangqiu Song¹
¹The Hong Kong University of Science and Technology, ²Beihang University, ³Hong Kong Baptist University
ygaodi@connect.ust.hk
###### Abstract
Abductive reasoning over knowledge graphs aims to generate logical hypotheses that explain observed entities or facts. Existing controllable hypothesis generation methods allow users to guide this process with explicit conditions, but they remain limited in interactive settings: they struggle to ground evolving natural-language intents across multi-turn dialogues and provide little fine-grained diagnosis when generated hypotheses fail. To address these limitations, we propose HypoAgent, an Agentic framework for interactive abductive Hypothesis Generation over knowledge graphs. HypoAgent integrates three agents: an Intent Recognition Agent that grounds user utterances and dialogue history into executable KG conditions, a Hypothesis Generation Agent that performs controllable hypothesis generation according to the extracted user intention, and a Root Cause Analysis Agent that diagnoses unreliable hypothesis fragments and leverages KG neighborhood probing to identify supported refinements. Experiments on commonsense and biomedical domain-specific knowledge graphs demonstrate that HypoAgent achieves state-of-the-art semantic similarity under single-turn, multi-turn, and unconditional settings. Our code is available at https://github.com/HKUST-KnowComp/HypoAgent.
## 1 Introduction
Figure 1: Two challenges in interactive abductive reasoning over knowledge graphs: (a) evolving user intent; (b) fine-grained failure diagnosis.

Abductive reasoning (Paul, [1993](https://arxiv.org/html/2605.31370#bib.bib26)) is one of the three classical modes of reasoning, together with deductive (Johnson-Laird, [1999](https://arxiv.org/html/2605.31370#bib.bib53)) and inductive reasoning (Hayes et al., [2010](https://arxiv.org/html/2605.31370#bib.bib54)). It seeks the most plausible explanation for observed phenomena by reasoning from evidence to explanatory hypotheses. It is widely used in clinical diagnosis (Pukancová and Homola, [2015](https://arxiv.org/html/2605.31370#bib.bib7); Martini, [2023](https://arxiv.org/html/2605.31370#bib.bib6)), where physicians infer latent diseases or mechanisms from symptoms and patient profiles, and in scientific discovery (Engelschalt et al., [2023](https://arxiv.org/html/2605.31370#bib.bib9); Wackerly, [2021](https://arxiv.org/html/2605.31370#bib.bib10); Duede and Evans, [2021](https://arxiv.org/html/2605.31370#bib.bib11); Upmeier zu Belzen et al., [2021](https://arxiv.org/html/2605.31370#bib.bib12)), where researchers propose and revise hypotheses to explain empirical observations.
Knowledge graphs (KGs) provide a natural substrate for abductive reasoning by representing domain knowledge as interconnected entities and relations. Over a KG, abductive reasoning (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14)) aims to recover a first-order logic hypothesis whose answer set explains a given set of observed entities. For example, given several diseases, a system may generate hypotheses about their shared mechanisms, treatments, or susceptible populations. Such logical hypotheses are interpretable because their answer sets and supporting paths can be inspected over the KG. However, abductive reasoning over KGs is challenging, as a single observation set may admit many plausible hypotheses, especially in large and densely connected KGs. Recent work (Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)) addresses this issue through controllable hypothesis generation, where users guide generation with explicit conditions, such as target entities, relations, or logical patterns, to obtain hypotheses that match their interests and intents.
However, existing abductive reasoning methods (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14); Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55), [2026b](https://arxiv.org/html/2605.31370#bib.bib56)) face two key challenges when used for interactive hypothesis generation.
First, they lack effective mechanisms for grounding evolving and potentially ambiguous natural-language intents in multi-turn interactions. In practice, users often begin with a broad exploration of the observations and gradually refine their needs through follow-up feedback. As shown in Fig. [1(a)](https://arxiv.org/html/2605.31370#S1.F1.sf1), given the biomedical observation {rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis}, a user may first ask what these diseases are related to, then request a mechanistic explanation, and later shift toward treatment strategies or drugs. Although each of these turns can be mapped to executable KG conditions, the intended control conditions are not static: they change with the dialogue context, and the interaction history can provide essential clues for interpreting the current utterance. Grounding user intent therefore requires jointly considering the current utterance and the accumulated dialogue history. Existing single-turn controllable generators, however, typically require users to specify control signals explicitly. While they can handle fixed conditions such as target relations, entities, or logical patterns, they cannot interpret dynamic, context-dependent natural-language intents, such as a request to “explore more” about a previously discussed mechanism.
Second, they lack fine-grained diagnosis of abductive reasoning failures. In scientific applications, a failed hypothesis may still contain useful fragments (Okes, [2019](https://arxiv.org/html/2605.31370#bib.bib60)). As shown in Fig. [1(b)](https://arxiv.org/html/2605.31370#S1.F1.sf2), given the observed drugs {Infliximab, Adalimumab, Golimumab}, an abductive system may generate a hypothesis stating that these entities are drugs that treat rheumatoid arthritis and target IL-17. Executing this hypothesis may instead retrieve {Secukinumab, Ixekizumab}, indicating that the full hypothesis does not recover the original observation. However, this mismatch alone does not reveal the root cause of the failure. A fragment-level analysis can show that the treatment relation to rheumatoid arthritis is supported, while the IL-17 target fragment is unsupported for the observed drugs. Further neighborhood evidence may indicate that the observed drugs instead share an inflammatory bowel disease treatment relation. The useful correction is thus not to discard the entire hypothesis, but to replace the faulty fragment with a KG-supported relation, yielding a refined hypothesis involving treatment of both rheumatoid arthritis and inflammatory bowel disease. Existing methods (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14); Gao et al., [2026b](https://arxiv.org/html/2605.31370#bib.bib56)) usually evaluate hypotheses only through final answer-set similarity, so they cannot identify which logical fragments are correct, which fragment causes the error, or how to repair the hypothesis in a targeted way.
#### Our approach.
To address these challenges, we propose HypoAgent, an Agentic framework for interactive and explainable abductive Hypothesis Generation over knowledge graphs. HypoAgent consists of three collaborative agents. The Intent Recognition Agent grounds each user utterance, together with the dialogue history, into executable KG conditions, enabling the system to capture evolving intents from ambiguous natural-language feedback. The Hypothesis Generation Agent invokes a controllable hypothesis generator based on the grounded conditions and maintains turn-level memory, including hypotheses, logical forms, interpretations, and evaluation results. The Root Cause Analysis Agent diagnoses failed hypotheses by decomposing them into executable fragments, checking their KG support, and probing observed-entity neighborhoods for missing relation–entity anchors. It further proposes targeted condition corrections, allowing the system to repair unreliable fragments rather than regenerate hypotheses from scratch.
Our main contributions are summarized as follows:
- We identify two key challenges faced by existing abductive KG reasoning methods in interactive multi-turn settings: grounding evolving natural-language user intents across dialogue turns, and performing fine-grained root cause analysis for failed or partially correct hypotheses.
- We propose HypoAgent, an agentic framework that integrates intent recognition, history-aware controllable hypothesis generation, and KG-grounded root cause analysis. This design enables users to interactively refine abductive hypotheses through natural-language feedback.
- We conduct experiments on DBpedia50, PharmKG8k, and BioKG, covering both commonsense and biomedical domain-specific knowledge graphs. Results show that HypoAgent achieves state-of-the-art semantic similarity under unconditional, single-turn, and multi-turn settings.
## 2 Related Work
#### Abductive Reasoning.
Abductive reasoning aims to infer plausible explanations from observations and has been studied in a variety of general reasoning settings. In natural language inference, α-NLI (Bhagavatula et al., [2020](https://arxiv.org/html/2605.31370#bib.bib27)) introduced abductive reasoning as a commonsense reasoning task, in which systems select the most plausible explanation for an observed event sequence. Subsequent studies improve abductive inference through decoding, prompting, self-consistency, and uncommon-sense reasoning techniques (Qin et al., [2021](https://arxiv.org/html/2605.31370#bib.bib28); kadiķis2022embarrassinglysimpleperformanceprediction; Chan et al., [2023](https://arxiv.org/html/2605.31370#bib.bib30); Zhao et al., [2024](https://arxiv.org/html/2605.31370#bib.bib31)). Beyond commonsense scenarios, abductive reasoning has also been explored in formal textual reasoning benchmarks such as ProofWriter (Tafjord et al., [2021](https://arxiv.org/html/2605.31370#bib.bib32)), open-world reasoning with LLMs (Zhong et al., [2023](https://arxiv.org/html/2605.31370#bib.bib35); Del and Fishel, [2023](https://arxiv.org/html/2605.31370#bib.bib33); Thagard, [2024](https://arxiv.org/html/2605.31370#bib.bib34)), abstract logical reasoning (Liu et al., [2024](https://arxiv.org/html/2605.31370#bib.bib37); Zheng et al., [2025](https://arxiv.org/html/2605.31370#bib.bib36)), and neuro-symbolic abductive learning (Zhou, [2019](https://arxiv.org/html/2605.31370#bib.bib22); Camposampiero et al., [2024](https://arxiv.org/html/2605.31370#bib.bib23); Hu et al., [2025](https://arxiv.org/html/2605.31370#bib.bib24)).
#### Abductive Reasoning on Knowledge Graphs.
Abductive reasoning over knowledge graphs aims to generate logical hypotheses that explain a set of observed entities or facts. AbductiveKGR (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14)) first introduced this problem by formulating KG abduction as Transformer-based hypothesis generation, where a model produces candidate logical explanations for observations. More recently, CtrlHGen (Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)) introduced controllable abductive hypothesis generation by conditioning the generator on user-specified control signals, enabling generated hypotheses to better satisfy desired semantic or structural constraints. DARK (Gao et al., [2026b](https://arxiv.org/html/2605.31370#bib.bib56)) further unified deductive and abductive reasoning in knowledge graphs by capturing their bidirectional relationship: it treats hypotheses and conclusions as two sides of a sequence and uses masked diffusion to model reasoning in both directions. Together, these efforts have advanced neural graph database reasoning (Bai et al., [2025](https://arxiv.org/html/2605.31370#bib.bib21); Li et al., [2026](https://arxiv.org/html/2605.31370#bib.bib57); Xie et al., [2026](https://arxiv.org/html/2605.31370#bib.bib58); Gao et al., [2026a](https://arxiv.org/html/2605.31370#bib.bib59)) by improving hypothesis generation and controllability and by better capturing user query intentions.
## 3 Preliminary
#### Notation.
We define a knowledge graph as $\mathcal{G}=(\mathcal{E},\mathcal{R},\mathcal{T})$, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relations, and $\mathcal{T}\subseteq\mathcal{E}\times\mathcal{R}\times\mathcal{E}$ is the set of observed triples. A triple $(v_{i},r,v_{j})\in\mathcal{T}$ indicates that relation $r$ holds between entities $v_{i}$ and $v_{j}$. Following the open-world assumption (Drummond and Shearer, [2006](https://arxiv.org/html/2605.31370#bib.bib25)), unobserved triples are treated as unknown rather than false. In this work, a hypothesis $H$ is represented as a first-order logic formula over the entities, variables, and relations in $\mathcal{G}$, with existential quantifiers and logical connectives such as $\wedge$, $\vee$, and $\neg$. For simplicity, a conjunctive hypothesis can be written as:
$$H(X)=\exists Z_{1},\ldots,Z_{k}:\ a_{1}\wedge a_{2}\wedge\cdots\wedge a_{m},$$
where $X$ is the target variable, $Z_{1},\ldots,Z_{k}$ are existentially quantified intermediate variables, and each $a_{i}$ is a relational literal, i.e., either a relational predicate $r(u,v)$ or its negation $\neg r(u,v)$. Here, $r$ is a relation in $\mathcal{G}$, and $u,v$ can be entities, the target variable $X$, or existential variables from $\{Z_{1},\ldots,Z_{k}\}$.
Given a hypothesis $H$, we denote by $\mathcal{A}_{\mathcal{G}}(H)$ the answer set obtained by executing $H$ on $\mathcal{G}$. That is, $\mathcal{A}_{\mathcal{G}}(H)\subseteq\mathcal{E}$ contains the entities assigned to the target variable $X$ for which $H(X)$ holds on $\mathcal{G}$.
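To make the notation concrete, the minimal sketch below encodes a hypothesis as a list of relational literals and computes its answer set $\mathcal{A}_{\mathcal{G}}(H)$ over a toy triple set. The triples, the `?`-prefix convention for variables, and the helper names `is_var`, `ground`, and `answer_set` are illustrative assumptions rather than the authors' implementation. In this sketch a positive literal holds only if its triple is in $\mathcal{T}$ and a negated literal holds when the triple is absent, and existential variables are handled by brute-force enumeration of bindings, which only scales to toy graphs.

```python
from itertools import product

# Toy triples (head, relation, tail); illustrative only, not taken from any benchmark KG.
KG = {
    ("Infliximab", "treats", "RheumatoidArthritis"),
    ("Adalimumab", "treats", "RheumatoidArthritis"),
    ("Golimumab", "treats", "RheumatoidArthritis"),
    ("Secukinumab", "treats", "RheumatoidArthritis"),
    ("Ixekizumab", "treats", "RheumatoidArthritis"),
    ("Infliximab", "treats", "IBD"),
    ("Adalimumab", "treats", "IBD"),
    ("Golimumab", "treats", "IBD"),
    ("Secukinumab", "targets", "IL17"),
    ("Ixekizumab", "targets", "IL17"),
}


def is_var(term):
    # Variables carry a leading "?": "?X" is the target, "?Z", "?Z1", ... are existential.
    return term.startswith("?")


def ground(term, binding):
    # Replace a variable by its bound entity; entities are returned unchanged.
    return binding[term] if is_var(term) else term


def answer_set(kg, literals, target="?X"):
    """Return A_G(H): entities bound to `target` for which every literal holds.

    A literal (negated, r, u, v) encodes r(u, v), or its negation when `negated` is True.
    """
    entities = sorted({h for h, _, _ in kg} | {t for _, _, t in kg})
    variables = sorted({x for _, _, u, v in literals for x in (u, v) if is_var(x)} | {target})
    answers = set()
    for values in product(entities, repeat=len(variables)):
        binding = dict(zip(variables, values))
        # A positive literal needs its triple in the KG; a negated one needs it absent.
        if all(((ground(u, binding), r, ground(v, binding)) in kg) != negated
               for negated, r, u, v in literals):
            answers.add(binding[target])
    return answers


# H(X) = treats(X, RheumatoidArthritis) ∧ ¬targets(X, IL17)
h1 = [(False, "treats", "?X", "RheumatoidArthritis"), (True, "targets", "?X", "IL17")]
print(sorted(answer_set(KG, h1)))  # ['Adalimumab', 'Golimumab', 'Infliximab']

# H(X) = ∃Z: treats(X, Z) ∧ treats(Secukinumab, Z)
h2 = [(False, "treats", "?X", "?Z"), (False, "treats", "Secukinumab", "?Z")]
print(sorted(answer_set(KG, h2)))
# ['Adalimumab', 'Golimumab', 'Infliximab', 'Ixekizumab', 'Secukinumab']
```

A practical executor would join relation indexes instead of enumerating every binding, but the enumeration keeps the correspondence with the definition of $\mathcal{A}_{\mathcal{G}}(H)$ easy to read.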
#### Task definition.
Given a knowledge graph $\mathcal{G}$ and a set of observed entities $\mathcal{O}=\{o_{1},o_{2},\ldots,o_{n}\}\subseteq\mathcal{E}$, abductive hypothesis generation over a KG aims to infer a plausible first-order logic hypothesis $H$ that explains the observations. The quality of $H$ is measured by the similarity between its answer set $\mathcal{A}_{\mathcal{G}}(H)$ and the observed entity set $\mathcal{O}$. The objective is therefore to find an optimal hypothesis $H^{\ast}$ that maximizes this similarity:
$$H^{\ast}=\arg\max_{H}\operatorname{sim}\left(\mathcal{A}_{\mathcal{G}}(H),\mathcal{O}\right),\tag{1}$$
where the similarity function $\operatorname{sim}(\cdot,\cdot)$ can be instantiated by set-based metrics such as Jaccard similarity, the Dice coefficient, or the overlap coefficient. In controllable abductive hypothesis generation (Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)), the model is additionally provided with a user-specified control condition $\mathbf{c}$, and the goal is to generate a hypothesis $H$ that not only explains the observed entities $\mathcal{O}$ but also satisfies the control condition $\mathbf{c}$. In this paper, we extend this setting from single-turn control to multi-turn interactive hypothesis generation: at turn $t$, the control condition $\mathbf{c}_{t}$ is grounded not only in the current user utterance $u_{t}$ but also in the interaction history $\mathcal{D}_{<t}$.
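The three set-based instantiations of $\operatorname{sim}$ and the argmax in Eq. (1) can be sketched as follows, restricted to a finite pool of candidate hypotheses. The candidate answer sets are hard-coded toy values standing in for KG execution, and the empty-set conventions are our own assumption; neither is taken from the paper.

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumed convention).
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def overlap(a, b):
    # |A ∩ B| / min(|A|, |B|)
    if not a and not b:
        return 1.0
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


observed = {"Infliximab", "Adalimumab", "Golimumab"}
# Hypothetical answer sets A_G(H), hard-coded here instead of executing each H on a KG.
candidates = {
    "treats(X, RA)": {"Infliximab", "Adalimumab", "Golimumab", "Secukinumab", "Ixekizumab"},
    "treats(X, RA) ∧ targets(X, IL17)": {"Secukinumab", "Ixekizumab"},
    "treats(X, RA) ∧ treats(X, IBD)": {"Infliximab", "Adalimumab", "Golimumab"},
}

# Eq. (1) over a finite candidate pool, with sim instantiated as Jaccard similarity.
best = max(candidates, key=lambda h: jaccard(candidates[h], observed))

broad = candidates["treats(X, RA)"]
print(jaccard(broad, observed), dice(broad, observed), overlap(broad, observed))  # 0.6 0.75 1.0
print(best)  # treats(X, RA) ∧ treats(X, IBD)
```

The broad hypothesis scores a perfect overlap coefficient because its answer set is a superset of the observations, while Jaccard and Dice penalize the two extra answers; the overlap coefficient on its own can therefore look favorable for over-general hypotheses.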
## 4 Method
Figure 2: Overview of HypoAgent, which consists of an Intent Recognition Agent, a Hypothesis Generation Agent, and a Root Cause Analysis Agent for interactive abductive hypothesis generation over knowledge graphs.

### 4.1 Overview
We propose HypoAgent, a multi-agent framework for interactive abductive reasoning over knowledge graphs, as illustrated in Fig. [2](https://arxiv.org/html/2605.31370#S4.F2). Given a set of observed entities $\mathcal{O}$ and a user utterance $u_{t}$ at turn $t$, the system aims to generate a first-order logic hypothesis $H_{t}$ that explains $\mathcal{O}$ while satisfying the user's control intention. At each turn, HypoAgent operates through three sequential components. First, the Intent Recognition Agent (Section [4.2](https://arxiv.org/html/2605.31370#S4.SS2)) interprets the user's natural-language utterance in the context of the interaction history and converts it into structured control conditions. Second, the Hypothesis Generation Agent (Section [4.3](https://arxiv.org/html/2605.31370#S4.SS3)) invokes a trained lightweight hypothesis generator to produce hypotheses under the recognized conditions. Third, if the generated hypothesis is unsatisfactory, the Root Cause Analysis Agent (Section [4.4](https://arxiv.org/html/2605.31370#S4.SS4)) diagnoses the failure from both structural and semantic perspectives, refines the control conditions, and feeds them back into the next reasoning iteration.
### 4.2 Intent Recognition Agent
The Intent Recognition Agent (IRA) converts a free-form user utterance into structured control conditions that can be consumed by the hypothesis generator. Unlike single-turn instruction parsing, the IRA is history-aware: it uses both the current utterance and previous interaction states to resolve vague, implicit, or relative user intentions. Specifically, at turn $t$, the IRA receives the observed entity set $\mathcal{O}$, the current user utterance $u_{t}$, and the interaction history $\mathcal{D}_{<t}$, where $\mathcal{D}_{<t}$ stores previously generated hypotheses, their natural-language verbalizations, recognized conditions, and execution scores. This allows the IRA to interpret utterances such as “make it simpler”, “keep the entity but change the relation”, or “I want to explore more about it” relative to the previous hypotheses rather than treating them as isolated commands. Formally, the IRA performs the following transformation:
$$\mathrm{IRA}(\mathcal{O},u_{t},\mathcal{D}_{<t})\rightarrow\mathbf{c}_{t},\tag{2}$$
where $\mathbf{c}_{t}=\{(k_{i},v_{i})\}$ is a structured condition set. Following Gao et al. ([2025](https://arxiv.org/html/2605.31370#bib.bib55)), we decompose user control intentions into five condition types from both semantic and structural perspectives: each key $k_{i}$ belongs to one of relation, entity, relation number, entity number, and pattern. The relation and entity types capture semantic constraints by specifying the desired relation and entity anchor, while relation number, entity number, and pattern capture structural constraints by specifying the number of relations, the number of entity anchors, and the logical pattern of the target hypothesis. For explicit instructions, the IRA directly extracts the corresponding values from the utterance and grounds relation and entity names to the KG vocabulary. For vague or comparative instructions, the IRA infers the intended conditions from the interaction history. For example, if the user says “make it simpler”, the IRA may reduce the number of relations or entity anchors relative to the previous turn; if the user says “use another relation”, it preserves the structural conditions while replacing the semantic relation constraint.
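The condition set $\mathbf{c}_{t}$ can be pictured as a small dictionary keyed by the five condition types. The sketch below is a hypothetical, rule-based stand-in for two of the relative instructions mentioned above; in HypoAgent the IRA is an LLM agent, so these hand-written rules only illustrate the intended input and output. The key names (`relation_number`, `entity_number`), the pattern label `"2i"`, the choice to drop the old pattern when simplifying, and passing the new relation in explicitly are all our assumptions.

```python
# Hypothetical key names for the five condition types.
CONDITION_TYPES = ("relation", "entity", "relation_number", "entity_number", "pattern")


def check_conditions(conditions):
    """Reject keys outside the five condition types and return the condition set unchanged."""
    unknown = set(conditions) - set(CONDITION_TYPES)
    if unknown:
        raise ValueError(f"unknown condition types: {sorted(unknown)}")
    return conditions


def make_simpler(previous):
    """'Make it simpler': reduce the previous turn's relation/entity counts by one (floor of 1)."""
    updated = dict(previous)  # copy, so the stored history is not mutated
    for key in ("relation_number", "entity_number"):
        if key in updated:
            updated[key] = max(1, updated[key] - 1)
    # Assumption: a smaller hypothesis may not fit the old logical pattern, so drop that constraint.
    updated.pop("pattern", None)
    return check_conditions(updated)


def use_another_relation(previous, new_relation):
    """'Use another relation': keep every other condition and swap only the relation."""
    updated = dict(previous)
    updated["relation"] = new_relation
    return check_conditions(updated)


# Conditions recognized at turn t-1 ("2i" is a hypothetical pattern label).
previous = {"relation": "treats", "entity": "RheumatoidArthritis",
            "relation_number": 2, "entity_number": 2, "pattern": "2i"}
print(make_simpler(previous))
# {'relation': 'treats', 'entity': 'RheumatoidArthritis', 'relation_number': 1, 'entity_number': 1}
print(use_another_relation(previous, "targets")["relation"])  # targets
```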
### 4.3 Hypothesis Generation Agent
The Hypothesis Generation Agent (HGA) generates executable first-order logic hypotheses under the control conditions $\mathbf{c}_{t}$ recognized by the IRA. To enable efficient and controllable generation, the HGA invokes a lightweight conditional generator $p_{\theta}$. Specifically, given the observed entity set $\mathcal{O}$ and the recognized condition set $\mathbf{c}_{t}$, the generator produces a first-order logic hypothesis $H_{t}$. This process can be formulated as autoregressive generation:
$$p_{\theta}(H_{t}\mid\mathcal{O},\mathbf{c}_{t})=\prod_{\ell=1}^{L}p_{\theta}(h^{\ell}_{t}\mid h^{<\ell}_{t},\mathcal{O},\mathbf{c}_{t}),\tag{3}$$
where $h^{\ell}_{t}$ denotes the $\ell$-th token of the hypothesis $H_{t}$ and $L$ is the number of tokens in $H_{t}$. The generated logical hypothesis $H_{t}$ is then verbalized into natural language for user-facing interaction.
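Equation (3) means that the log-probability of a hypothesis is the sum of per-token conditional log-probabilities. The toy sketch below makes this concrete with a hand-written next-token table standing in for $p_{\theta}$. The tokens, probabilities, and greedy decoding rule are illustrative assumptions; conditioning on $\mathcal{O}$ and $\mathbf{c}_{t}$ is folded into the fixed table rather than modeled, and this section does not state which decoding strategy HypoAgent uses.

```python
import math

# Hypothetical next-token distributions p(h^l | h^{<l}, O, c_t), keyed by the prefix h^{<l}.
# A real generator would compute these from O and c_t; here they are fixed for illustration.
NEXT_TOKEN = {
    (): {"treats": 0.9, "targets": 0.1},
    ("treats",): {"RA": 0.8, "IBD": 0.2},
    ("treats", "RA"): {"<eos>": 0.4, "AND": 0.6},
    ("treats", "RA", "AND"): {"treats": 1.0},
    ("treats", "RA", "AND", "treats"): {"IBD": 0.7, "RA": 0.3},
    ("treats", "RA", "AND", "treats", "IBD"): {"<eos>": 1.0},
}


def log_prob(tokens):
    """log p(H) = sum over l of log p(h^l | h^{<l}), i.e. the log of the product in Eq. (3)."""
    total = 0.0
    for i, token in enumerate(tokens):
        total += math.log(NEXT_TOKEN[tuple(tokens[:i])][token])
    return total


def greedy_decode(max_len=10):
    """Append the most probable next token until <eos> is produced or max_len is reached."""
    tokens = []
    while len(tokens) < max_len:
        dist = NEXT_TOKEN[tuple(tokens)]
        token = max(dist, key=dist.get)
        tokens.append(token)
        if token == "<eos>":
            break
    return tokens


decoded = greedy_decode()
print(decoded)  # ['treats', 'RA', 'AND', 'treats', 'IBD', '<eos>']
print(round(math.exp(log_prob(decoded)), 4))  # 0.3024
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow for long hypotheses.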
We train the generator $p_{\theta}$ in two stages. In the first stage, we train it without conditions to learn general hypothesis generation from sampled observation–hypothesis pairs. In the second stage, we introduce multiple control conditions and further train the generator on observation–condition–hypothesis triples, enabling it to generate hypotheses that both explain the observed entities and satisfy the specified semantic and structural constraints.
During inference, the HGA receives $\mathbf{c}_{t}$ from the IRA and calls the trained generator $p_{\theta}$ to produce a hypothesis $H_{t}$. The generated hypothesis is executed on the KG to obtain its answer set $\mathcal{A}_{\mathcal{G}}(H_{t})$, and the quality of $H_{t}$ is evaluated by comparing $\mathcal{A}_{\mathcal{G}}(H_{t})$ with the observation set $\mathcal{O}$ using set-based metrics such as Jaccard similarity, the Dice coefficient, and the overlap coefficient. If the generated hypothesis reaches a satisfactory score, it is directly returned to the user; otherwise, the system invokes the Root Cause Analysis Agent for further diagnosis and refinement.
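The cycle of generating, executing, scoring, and falling back to diagnosis can be sketched as a small control loop. Everything here is a hypothetical illustration: the `generate`, `execute`, and `refine` callables stand in for the trained generator, KG execution, and the Root Cause Analysis Agent; Jaccard similarity is used as the score; and the threshold of 0.9 and the cap of three iterations are placeholder values, since this section does not specify what counts as a satisfactory score.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def run_turn(observed, conditions, generate, execute, refine, threshold=0.9, max_iters=3):
    """Generate, execute, and score hypotheses, refining conditions while the score is too low.

    Returns the best (hypothesis, score) pair and the per-iteration history.
    """
    history = []
    for _ in range(max_iters):
        hypothesis = generate(observed, conditions)
        score = jaccard(execute(hypothesis), observed)
        history.append((hypothesis, score))
        if score >= threshold:
            break  # satisfactory: return to the user
        conditions = refine(hypothesis, conditions, observed)  # root-cause-driven refinement
    best = max(history, key=lambda item: item[1])
    return best, history


# Stubs for illustration only.
observed = {"Infliximab", "Adalimumab", "Golimumab"}
ANSWERS = {  # toy answer sets standing in for KG execution
    "treats(X, RA) ∧ targets(X, IL17)": {"Secukinumab", "Ixekizumab"},
    "treats(X, RA) ∧ treats(X, IBD)": {"Infliximab", "Adalimumab", "Golimumab"},
}


def generate(obs, conds):
    return ("treats(X, RA) ∧ treats(X, IBD)" if "IBD" in conds["entity"]
            else "treats(X, RA) ∧ targets(X, IL17)")


def refine(hyp, conds, obs):
    return {"entity": conds["entity"] + ["IBD"]}


best, history = run_turn(observed, {"entity": ["RA"]}, generate, ANSWERS.get, refine)
print(best)                             # ('treats(X, RA) ∧ treats(X, IBD)', 1.0)
print([score for _, score in history])  # [0.0, 1.0]
```

Keeping the full history makes it possible to return the best hypothesis seen so far even when no iteration reaches the threshold.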
### 4.4 Root Cause Analysis Agent
The Root Cause Analysis Agent (RCAA) analyzes why a generated hypothesis fails and provides targeted refinement signals for the next reasoning iteration. Instead of simply requesting another sample from the generator, the RCAA examines the failure from two complementary perspectives: the internal structure of the generated hypothesis and the external evidence provided by the KG neighborhood of the observations. Specifically, it performs Hypothesis Fragment Diagnosis to identify reliable and unreliable parts of the current hypothesis, and Knowledge Neighborhood Probing to discover additional semantic evidence around the observed entities. Together, these two analyses support more informed refinement of the control conditions.
Hypothesis Fragment Diagnosis. Given an unsatisfactory hypothesis $H_{t}$, the RCAA decomposes it into smaller executable fragments. Each fragment corresponds to a meaningful component of the original first-order formula, such as a single relational atom, an intermediate projection chain, or a partial conjunction of constraints. The RCAA executes each fragment on the KG and evaluates the similarity between its answer set and the observed entity set $\mathcal{O}$. Fragments with high similarity are treated as reliable components that should be preserved, whereas fragments with low similarity are regarded as potentially irrelevant or erroneous components that should be revised or removed. In this way, fragment-level diagnosis distinguishes hypothesis components that are likely to explain the observations from those that may introduce irrelevant or misleading constraints.
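A minimal sketch of fragment-level diagnosis, restricted for simplicity to star-shaped conjunctive hypotheses $r_{1}(X,e_{1})\wedge\cdots\wedge r_{m}(X,e_{m})$ whose fragments are the single atoms. The toy triples, the `(relation, anchor)` encoding, the helper names, and the reliability threshold `tau = 0.5` are assumptions; the section does not give the exact decomposition or cutoff, and richer fragments such as projection chains would need a general query executor.

```python
# Toy triples (head, relation, tail); illustrative only.
KG = {
    ("Infliximab", "treats", "RheumatoidArthritis"),
    ("Adalimumab", "treats", "RheumatoidArthritis"),
    ("Golimumab", "treats", "RheumatoidArthritis"),
    ("Secukinumab", "treats", "RheumatoidArthritis"),
    ("Ixekizumab", "treats", "RheumatoidArthritis"),
    ("Secukinumab", "targets", "IL17"),
    ("Ixekizumab", "targets", "IL17"),
}


def execute_atom(kg, relation, anchor):
    # Answer set of the one-atom fragment relation(X, anchor).
    return {h for h, r, t in kg if r == relation and t == anchor}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diagnose(kg, atoms, observed, tau=0.5):
    """Execute each atom of a star-shaped hypothesis and label it by its Jaccard score."""
    report = []
    for relation, anchor in atoms:
        score = jaccard(execute_atom(kg, relation, anchor), observed)
        label = "reliable" if score >= tau else "unreliable"
        report.append(((relation, anchor), score, label))
    return report


observed = {"Infliximab", "Adalimumab", "Golimumab"}
# H(X) = treats(X, RheumatoidArthritis) ∧ targets(X, IL17), encoded as (relation, anchor) atoms.
hypothesis = [("treats", "RheumatoidArthritis"), ("targets", "IL17")]
full = set.intersection(*(execute_atom(KG, r, e) for r, e in hypothesis))
print(sorted(full))  # ['Ixekizumab', 'Secukinumab']
for fragment, score, label in diagnose(KG, hypothesis, observed):
    print(fragment, score, label)
# ('treats', 'RheumatoidArthritis') 0.6 reliable
# ('targets', 'IL17') 0.0 unreliable
```

The full hypothesis scores zero, yet the diagnosis shows that only the IL-17 fragment is at fault, which mirrors the example in Fig. 1(b).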
KG Neighborhood Probing. In parallel, the RCAA probes the local KG neighborhoods around the observed entities. The intuition is that useful semantic anchors often appear in the shared or nearby neighborhoods of the observations. The agent searches for candidate relations, entities, and short relational paths that commonly connect to the observed entities, and evaluates how well the corresponding candidate hypotheses recover $\mathcal{O}$. High-scoring neighborhood evidence suggests promising relation or entity conditions that may not have appeared in the original user utterance or the generated hypothesis.
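The sketch below probes only one-hop anchors: it counts how many observed entities share each `(relation, tail)` pair, turns every sufficiently shared pair into a candidate hypothesis $r(X,e)$, and ranks candidates by how well they recover $\mathcal{O}$. The toy triples, the `min_support` parameter, and the tie-breaking order are assumptions; the short relational paths mentioned above are omitted for brevity.

```python
from collections import Counter

# Toy triples (head, relation, tail); illustrative only.
KG = {
    ("Infliximab", "treats", "RheumatoidArthritis"),
    ("Adalimumab", "treats", "RheumatoidArthritis"),
    ("Golimumab", "treats", "RheumatoidArthritis"),
    ("Secukinumab", "treats", "RheumatoidArthritis"),
    ("Ixekizumab", "treats", "RheumatoidArthritis"),
    ("Infliximab", "treats", "IBD"),
    ("Adalimumab", "treats", "IBD"),
    ("Golimumab", "treats", "IBD"),
    ("Secukinumab", "targets", "IL17"),
    ("Ixekizumab", "targets", "IL17"),
}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def probe_neighborhood(kg, observed, min_support=2):
    """Rank one-hop anchors r(X, e) shared by at least `min_support` observed entities.

    Each anchor becomes the candidate hypothesis r(X, e), scored by Jaccard against the observations.
    """
    support = Counter((r, t) for h, r, t in kg if h in observed)
    ranked = []
    for (relation, anchor), count in support.items():
        if count < min_support:
            continue
        answers = {h for h, r, t in kg if r == relation and t == anchor}
        ranked.append(((relation, anchor), count, jaccard(answers, observed)))
    # Best recovery first; ties broken by support, then alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return ranked


observed = {"Infliximab", "Adalimumab", "Golimumab"}
for candidate, count, score in probe_neighborhood(KG, observed):
    print(candidate, count, score)
# ('treats', 'IBD') 3 1.0
# ('treats', 'RheumatoidArthritis') 3 0.6
```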
Hypothesis Refinement Strategy. Based on Hypothesis Fragment Diagnosis and KG Neighborhood Probing, the RCAA refines hypotheses in two ways. First, it repairs the current hypothesis by preserving reliable fragments and replacing weak components with plausible relations, entities, or paths discovered from the KG neighborhood. Second, when direct repair is insufficient, it synthesizes refined control conditions that guide the generator toward a more promising hypothesis space while preserving the user's intent. For example, a strong semantic anchor found during neighborhood probing can be incorporated into the repaired hypothesis or added as a condition for the next generation step. The refined conditions are then fed back together with the updated interaction history, forming an iterative reasoning loop. In this way, HypoAgent progressively improves hypotheses through user control, conditional generation, and root-cause-driven refinement.
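Combining the two analyses, the repair step can be sketched as: keep atoms whose fragment score clears a threshold, fill each dropped slot with the best probed anchor not already present, and also emit the strongest new anchor as relation and entity conditions for the next generation step. The `(relation, anchor)` encoding, the precomputed input scores, the threshold, and the helper names are hypothetical; the sketch only mirrors the two refinement routes described above.

```python
# Toy triples used only to check the repaired hypothesis; illustrative only.
KG = {
    ("Infliximab", "treats", "RheumatoidArthritis"),
    ("Adalimumab", "treats", "RheumatoidArthritis"),
    ("Golimumab", "treats", "RheumatoidArthritis"),
    ("Secukinumab", "treats", "RheumatoidArthritis"),
    ("Ixekizumab", "treats", "RheumatoidArthritis"),
    ("Infliximab", "treats", "IBD"),
    ("Adalimumab", "treats", "IBD"),
    ("Golimumab", "treats", "IBD"),
    ("Secukinumab", "targets", "IL17"),
    ("Ixekizumab", "targets", "IL17"),
}


def refine(atoms, fragment_scores, probed, tau=0.5):
    """Repair a star-shaped hypothesis and derive conditions for the next generation step.

    atoms: current hypothesis as (relation, anchor) atoms, i.e. r(X, anchor).
    fragment_scores: {(relation, anchor): score} from fragment diagnosis.
    probed: [((relation, anchor), score), ...] from neighborhood probing, best first.
    """
    kept = [a for a in atoms if fragment_scores.get(a, 0.0) >= tau]  # preserve reliable fragments
    n_dropped = len(atoms) - len(kept)
    # Replace each dropped fragment with the best probed anchor not already in the hypothesis.
    replacements = [a for a, _ in probed if a not in kept][:n_dropped]
    repaired = kept + replacements
    # Second route (assumed form): the strongest new anchor as relation/entity conditions.
    conditions = {}
    if replacements:
        relation, anchor = replacements[0]
        conditions = {"relation": relation, "entity": anchor}
    return repaired, conditions


def execute(kg, atoms):
    # Answer set of r1(X, e1) ∧ ... ∧ rm(X, em).
    answer_sets = [{h for h, r, t in kg if r == rel and t == ent} for rel, ent in atoms]
    return set.intersection(*answer_sets) if answer_sets else set()


atoms = [("treats", "RheumatoidArthritis"), ("targets", "IL17")]
scores = {("treats", "RheumatoidArthritis"): 0.6, ("targets", "IL17"): 0.0}
probed = [(("treats", "IBD"), 1.0), (("treats", "RheumatoidArthritis"), 0.6)]
repaired, conditions = refine(atoms, scores, probed)
print(repaired)    # [('treats', 'RheumatoidArthritis'), ('treats', 'IBD')]
print(conditions)  # {'relation': 'treats', 'entity': 'IBD'}
print(sorted(execute(KG, repaired)))  # ['Adalimumab', 'Golimumab', 'Infliximab']
```

A caller could execute the repaired hypothesis first and pass `conditions` back to the generator only if the repair still scores poorly.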
## 5 Experiment
Table 1: Single-turn controllable hypothesis generation results. (Bold: best; Underline: runner-up.)

Table 2: Multi-turn hypothesis generation results. (Bold: best; Underline: runner-up.)

We conduct experiments to evaluate HypoAgent under single-turn, multi-turn, and unconditional abductive hypothesis generation settings. Our evaluation is designed to answer the following research questions:
- RQ1: Single-turn controllable hypothesis generation. Does HypoAgent improve hypothesis quality through root cause analysis when explicit generation conditions are provided?
- RQ2: Multi-turn goal-oriented refinement. Can HypoAgent achieve user goals by interpreting feedback and refining hypotheses across multiple dialogue turns?
- RQ3: Unconditional self-improvement. Can HypoAgent infer useful conditions and improve hypothesis quality when users cannot provide explicit requirements?
### 5.1 Experimental Setup
Datasets. We evaluate HypoAgent on three KG benchmarks: BioKG (Walsh et al., [2020](https://arxiv.org/html/2605.31370#bib.bib61)), PharmKG8k (Zheng et al., [2021](https://arxiv.org/html/2605.31370#bib.bib62)), and DBpedia50 (Auer et al., [2007](https://arxiv.org/html/2605.31370#bib.bib40)). Following prior work (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14); Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)), each KG is split into training, validation, and test graphs with an 8:1:1 ratio. We then sample hypothesis–observation pairs from 13 logical query patterns on these splits for training and evaluation, with at most 32 observed entities per case. The unconditional setting directly uses these sampled pairs. For the single-turn and multi-turn dialogue settings, we further simulate user queries from the sampled cases: single-turn queries verbalize sampled control conditions, while multi-turn queries are built from cases that share the same observations but have different hypotheses, reflecting progressive intent shifts. Qwen2.5-7B (Hui et al., [2024](https://arxiv.org/html/2605.31370#bib.bib45)) is used to convert structured conditions or inferred intents into natural-language user queries. Details of data construction are provided in Appendix [A](https://arxiv.org/html/2605.31370#A1).
Evaluation Metrics. We evaluate generated hypotheses from two perspectives: semantic similarity and condition adherence. For semantic similarity, we compare the answer set of each generated hypothesis with the ground-truth observation set using Jaccard similarity, the Dice coefficient, and the overlap coefficient. For condition adherence, we report accuracy, which measures whether the generated hypothesis satisfies the specified control condition.
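Condition-adherence accuracy can be sketched as the fraction of cases whose hypothesis passes every checkable condition. The checking rules below are our own hypothetical reading of the condition types for star-shaped `(relation, anchor)` hypotheses, not the paper's evaluation code; in particular, counting relation occurrences for `relation_number` is an assumption, and `pattern` is skipped because verifying it needs the hypothesis's full logical structure.

```python
def satisfies(atoms, conditions):
    """Check a star-shaped hypothesis, given as (relation, anchor) atoms, against its conditions.

    'pattern' is skipped because checking it needs the hypothesis's full logical structure.
    """
    relations = [r for r, _ in atoms]
    anchors = {e for _, e in atoms}
    checks = {
        "relation": lambda v: v in relations,
        "entity": lambda v: v in anchors,
        "relation_number": lambda v: len(relations) == v,  # counts relation occurrences (assumption)
        "entity_number": lambda v: len(anchors) == v,
    }
    return all(checks[k](v) for k, v in conditions.items() if k in checks)


def condition_accuracy(cases):
    """Fraction of (hypothesis, conditions) cases whose hypothesis satisfies every checked condition."""
    return sum(satisfies(atoms, conds) for atoms, conds in cases) / len(cases)


cases = [
    ([("treats", "RheumatoidArthritis"), ("treats", "IBD")],
     {"relation": "treats", "entity": "IBD", "relation_number": 2}),
    ([("targets", "IL17")], {"entity": "IBD"}),
]
print(condition_accuracy(cases))  # 0.5
```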
Baselines. For single-turn controllable hypothesis generation, we compare HypoAgent with CtrlHGen (Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)), the controllable hypothesis generation backbone. For unconditional hypothesis generation, we compare with two representative abductive hypothesis generation methods, AbductiveKGR (Bai et al., [2024](https://arxiv.org/html/2605.31370#bib.bib14)) and DARK (Gao et al., [2026b](https://arxiv.org/html/2605.31370#bib.bib56)), which use autoregressive and diffusion-based architectures, respectively.
Implementation Details. For the small hypothesis generation model, we follow the previous setting (Gao et al., [2025](https://arxiv.org/html/2605.31370#bib.bib55)) and adopt a 6-layer Transformer architecture. For the LLM backend of the agents, we instantiate HypoAgent with DeepSeek-v4-Flash, Qwen3-235B (Yang et al., [2025](https://arxiv.org/html/2605.31370#bib.bib46)), and GPT-5.4-mini, denoted as HypoAgent$_{\text{DS}}$, HypoAgent$_{\text{Qwen}}$, and HypoAgent$_{\text{GPT}}$, respectively. All experiments are conducted on an NVIDIA A6000 48GB GPU. More implementation details, including training and agent construction, are reported in Appendix [B](https://arxiv.org/html/2605.31370#A2).
### 5.2 RQ1: Single-turn Controllable Hypothesis Generation
Table [1](https://arxiv.org/html/2605.31370#S5.T1) reports the results of single-turn controllable hypothesis generation. HypoAgent consistently improves semantic similarity over CtrlHGen across all three datasets, with all variants achieving overlap scores above 0.9. For example, on PharmKG8k, the best HypoAgent variant improves Jaccard similarity from 63.3 to 82.4. These gains indicate that root-cause-driven refinement helps generate hypotheses that better explain the observations. HypoAgent slightly lowers condition-following accuracy on BioKG and PharmKG8k, which likely reflects a trade-off: semantic refinement can improve explanatory quality while occasionally relaxing strict condition adherence. Among LLM backends, HypoAgent$_{\text{DS}}$ performs best overall, with only small gaps among variants.
Figure 3: Jaccard similarity comparison under the unconditional setting (left) and RCA ablation study (right).
### 5.3 RQ2: Multi-turn Goal-oriented Hypothesis Refinement
Since user intents are more implicit and context-dependent in the multi-turn setting, we use DeepSeek-v4-Flash to judge whether generated hypotheses satisfy the intended conditions given the full dialogue history; the judgment prompt is detailed in Appendix [B](https://arxiv.org/html/2605.31370#A2). Each case contains three turns, in which the user progressively refines the target hypothesis. Table [2](https://arxiv.org/html/2605.31370#S5.T2) reports the results.
Compared with single-turn evaluation, multi-turn refinement shows lower condition-adherence accuracy, reflecting the difficulty of resolving contextual intent shifts across turns. Nevertheless, RCA consistently improves semantic similarity across datasets and LLM backends, outperforming the w/o RCA variant on Jaccard similarity, the Dice coefficient, and the overlap coefficient. This shows that RCA remains effective in multi-turn refinement by identifying unreliable fragments and guiding the generator toward hypotheses that better explain the observations. Among the backends, HypoAgent$_{\text{GPT}}$ achieves slightly better semantic similarity overall, while the gaps among variants remain small.
### 5.4 RQ3: Unconditional Self-improving Hypothesis Generation
Here, we evaluate unconditional self\-improving hypothesis generation on DBpedia50\. Different from the controllable settings, no user\-specified conditions are provided in this experiment\. The model is required to analyze its own generated hypothesis and the observed entities, construct useful refinement signals, and improve the hypothesis without external guidance\. We compare HypoAgent with two baseline methods, AbductiveKGR and DARK\.
Fig. [3](https://arxiv.org/html/2605.31370#S5.F3) (left) presents the Jaccard similarity results on DBpedia50. Overall, HypoAgent substantially outperforms both baselines, showing that the proposed framework remains effective even without explicit user conditions. This indicates that HypoAgent can derive useful internal conditions from its own analysis and use them to guide subsequent hypothesis refinement. The performance gain therefore does not come solely from external control signals but also from the model's self-improving ability.
### 5.5 Ablation Study
We further conduct an ablation study on PharmKG8k using HypoAgent$_\text{GPT}$ to evaluate the Root Cause Analysis Agent and its two components. We compare four variants: w/o RCA, only Knowledge Neighborhood Probing, only Hypothesis Fragment Diagnosis, and the full RCA module.
The results are shown in Fig. [3](https://arxiv.org/html/2605.31370#S5.F3) (right). Removing RCA leads to a clear performance drop, confirming its importance for hypothesis refinement. Using either Knowledge Neighborhood Probing or Hypothesis Fragment Diagnosis alone improves over w/o RCA, indicating that both provide useful diagnostic signals. The full RCA module performs best, showing that the two components are complementary. Moreover, removing Hypothesis Fragment Diagnosis (i.e., keeping only Knowledge Neighborhood Probing) causes a larger degradation, suggesting that direct fragment-level diagnosis is especially important for complex hypotheses.
### 5.6 Case Study
We conduct three case studies to qualitatively examine HypoAgent under different settings, with results provided in Appendix [C](https://arxiv.org/html/2605.31370#A3). The first case, on PharmKG, shows that RCA improves single-turn controlled generation by revising unreliable fragments with neighborhood evidence. The second case, on BioKG, demonstrates that HypoAgent can track evolving intents in multi-turn interaction and adapt the hypothesis structure accordingly. The third case, on DBpedia50, shows that in the unconditional setting the model can induce useful conditions from its initial hypothesis to escape weak local solutions. Together, these cases illustrate how the components of HypoAgent complement one another.
## 6 Conclusion
We presented HypoAgent, a multi\-agent framework for interactive abductive hypothesis generation over knowledge graphs\. HypoAgent extends controllable hypothesis generation from single\-turn control to multi\-turn interaction by recognizing user intent, generating condition\-aware hypotheses, and refining unsatisfactory results through root cause analysis\. Experiments on three KG benchmarks show that HypoAgent consistently improves semantic similarity while maintaining competitive condition adherence\.
## Limitations
HypoAgent depends on the quality and coverage of the underlying knowledge graph\. If the KG is incomplete, noisy, or contains incorrect relations, both hypothesis generation and RCA refinement may be affected by unreliable structural evidence\. Moreover, RCA mainly relies on local neighborhood signals, which may be insufficient for sparse graphs or explanations requiring long\-range reasoning\.
## References
- DBpedia: a nucleus for a web of open data. In International Semantic Web Conference, pp. 722–735.
- J. Bai, Y. Wang, T. Zheng, Y. Guo, X. Liu, and Y. Song (2024). Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation. In ACL (1), pp. 1312–1329.
- J. Bai, Z. Wang, Y. Zhou, H. Yin, W. Fei, Q. Hu, Z. Deng, J. Cheng, T. Zheng, H. T. Tsang, et al. (2025). Top ten challenges towards agentic neural graph databases. arXiv preprint arXiv:2501.14224.
- C. Bhagavatula, R. L. Bras, C. Malaviya, K. Sakaguchi, A. Holtzman, H. Rashkin, D. Downey, S. W. Yih, and Y. Choi (2020). Abductive commonsense reasoning. [arXiv:1908.05739](https://arxiv.org/abs/1908.05739).
- G. Camposampiero, M. Hersche, A. Terzic, R. Wattenhofer, A. Sebastian, and A. Rahimi (2024). Towards learning abductive reasoning using VSA distributed representations. In NeSy (1), Lecture Notes in Computer Science, Vol. 14979, pp. 370–385.
- C. Chan, X. Liu, T. H. Chan, J. Cheng, Y. Song, G. Wong, and S. See (2023). Self-consistent narrative prompts on abductive natural language inference. [arXiv:2309.08303](https://arxiv.org/abs/2309.08303).
- M. Del and M. Fishel (2023). True detective: a deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. [arXiv:2212.10114](https://arxiv.org/abs/2212.10114).
- N. Drummond and R. Shearer (2006). The open world assumption. In eSI Workshop: The Closed World of Databases Meets the Open World of the Semantic Web, Vol. 15, pp. 1.
- E. Duede and J. Evans (2021). The social abduction of science. arXiv preprint arXiv:2111.13251.
- P. Engelschalt, M. Röske, J. Penzlin, D. Krüger, and A. Upmeier zu Belzen (2023). Abductive reasoning in modeling biological phenomena as complex systems. In Frontiers in Education, Vol. 8, pp. 1170967.
- Y. Gao, J. Bai, H. Huang, Z. Xie, Y. Li, H. T. Tsang, S. Han, and Y. Song (2026a). KGPFN: unlocking the potential of knowledge graph foundation model via in-context learning. arXiv preprint arXiv:2605.14907.
- Y. Gao, J. Bai, Y. Huang, X. Fu, Q. Sun, and Y. Song (2026b). Unifying deductive and abductive reasoning in knowledge graphs with masked diffusion model. In Proceedings of the ACM Web Conference 2026, pp. 3600–3611.
- Y. Gao, J. Bai, T. Zheng, Q. Sun, Z. Zhang, X. Fu, J. Li, and Y. Song (2025). Controllable logical hypothesis generation for abductive reasoning in knowledge graphs. arXiv preprint arXiv:2505.20948.
- B. K. Hayes, E. Heit, and H. Swendsen (2010). Inductive reasoning. Wiley Interdisciplinary Reviews: Cognitive Science 1(2), pp. 278–292.
- W. Hu, W. Dai, Y. Jiang, and Z. Zhou (2025). Efficient rectification of neuro-symbolic reasoning inconsistencies by abductive reflection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 17333–17341.
- B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu, et al. (2024). Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186.
- P. N. Johnson-Laird (1999). Deductive reasoning. Annual Review of Psychology 50(1), pp. 109–135.
- Y. Li, Y. Gao, J. Bai, J. Xiong, H. Huang, Z. Xie, H. T. Tsang, and Y. Song (2026). Towards neural graph data management. arXiv preprint arXiv:2603.05529.
- E. Liu, G. Neubig, and J. Andreas (2024). An incomplete loop: instruction inference, instruction following, and in-context learning in language models. [arXiv:2404.03028](https://arxiv.org/abs/2404.03028).
- C. Martini (2023). Abductive reasoning in clinical diagnostics. In Handbook of Abductive Cognition, pp. 467–479.
- D. Okes (2019). Root cause analysis: the core of problem solving and corrective action. Quality Press.
- G. Paul (1993). Approaches to abductive reasoning: an overview. Artificial Intelligence Review 7(2), pp. 109–152.
- J. Pukancová and M. Homola (2015). Abductive reasoning with description logics: use case in medical diagnosis. In Description Logics, CEUR Workshop Proceedings, Vol. 1350.
- L. Qin, V. Shwartz, P. West, C. Bhagavatula, J. Hwang, R. L. Bras, A. Bosselut, and Y. Choi (2021). Back to the future: unsupervised backprop-based decoding for counterfactual and abductive commonsense reasoning. [arXiv:2010.05906](https://arxiv.org/abs/2010.05906).
- O. Tafjord, B. D. Mishra, and P. Clark (2021). ProofWriter: generating implications, proofs, and abductive statements over natural language. [arXiv:2012.13048](https://arxiv.org/abs/2012.13048).
- P. Thagard (2024). Can ChatGPT make explanatory inferences? Benchmarks for abductive reasoning. [arXiv:2404.18982](https://arxiv.org/abs/2404.18982).
- A. Upmeier zu Belzen, P. Engelschalt, and D. Krüger (2021). Modeling as scientific reasoning—the role of abductive reasoning for modeling competence. Education Sciences 11(9), pp. 495.
- J. W. Wackerly (2021). Abductive reasoning in organic chemistry. Journal of Chemical Education 98(9), pp. 2746–2750.
- B. Walsh, S. K. Mohamed, and V. Nováček (2020). BioKG: a knowledge graph for relational learning on biological data. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3173–3180.
- Z. Xie, J. Bai, S. Liu, H. Huang, Y. Li, Y. Gao, H. T. Tsang, and Y. Song (2026). NGDB-Zoo: towards efficient and scalable neural graph databases training. arXiv preprint arXiv:2602.21597.
- A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. (2025). Qwen3 technical report. arXiv preprint arXiv:2505.09388.
- W. Zhao, J. T. Chiu, J. D. Hwang, F. Brahman, J. Hessel, S. Choudhury, Y. Choi, X. L. Li, and A. Suhr (2024). UNcommonsense reasoning: abductive reasoning about uncommon situations. [arXiv:2311.08469](https://arxiv.org/abs/2311.08469).
- S. Zheng, J. Rao, Y. Song, J. Zhang, X. Xiao, E. F. Fang, Y. Yang, and Z. Niu (2021). PharmKG: a dedicated knowledge graph benchmark for biomedical data mining. Briefings in Bioinformatics 22(4), pp. bbaa344.
- T. Zheng, J. Cheng, C. Li, H. Shi, Z. Wang, J. Bai, Y. Song, G. Y. Wong, and S. See (2025). LogiDynamics: unraveling the dynamics of logical inference in large language model reasoning. [arXiv:2502.11176](https://arxiv.org/abs/2502.11176).
- T. Zhong, Y. Wei, L. Yang, Z. Wu, Z. Liu, X. Wei, W. Li, J. Yao, C. Ma, X. Li, D. Zhu, X. Jiang, J. Han, D. Shen, T. Liu, and T. Zhang (2023). ChatABL: abductive learning via natural language interaction with ChatGPT. [arXiv:2304.11107](https://arxiv.org/abs/2304.11107).
- Z. Zhou (2019). Abductive learning: towards bridging machine learning and logical reasoning. Sci. China Inf. Sci. 62(7), pp. 76101:1–76101:3.
## Appendix A Data Construction
### A.1 Hypothesis-Observation Pair Construction
Given a knowledge graph $G$ and a predefined logical pattern $P$, the algorithm samples a random node $v$ as the target answer and recursively instantiates a hypothesis whose answer set contains $v$ and whose structure matches $P$. During recursion, the current logical operator determines how the hypothesis is expanded. For projection, the algorithm samples an incoming edge $(u, r, v)$ of $v$ and recursively constructs the preceding sub-hypothesis from $u$. For intersection, all sub-hypotheses are instantiated with the same target node $v$, because every branch must be satisfied by $v$. For union, the algorithm lets one branch derive $v$ and instantiates the remaining branches from randomly sampled nodes, since satisfying any one branch is sufficient for $v$ to be included in the union result.
We use only the original directed edges of the knowledge graph and do not augment it with inverse edges. The 13 predefined logical patterns used for hypothesis sampling are illustrated in Fig. [4](https://arxiv.org/html/2605.31370#A1.F4).
Figure 4: The 13 predefined logical patterns used for hypothesis sampling.
### A.2 Build Single-turn Dataset
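The recursive procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' sampler: patterns are encoded as hypothetical nested tuples, the KG is a list of `(head, relation, tail)` triples, only projection, intersection, and union are covered (negation is not described in this paragraph), and `instantiate`, `execute`, `sample_pair`, and `max_tries` are illustrative names.

```python
import random

# Hypothetical nested-tuple encoding of a pattern:
#   ("e",) anchor entity, ("p", sub) projection,
#   ("i", sub1, sub2, ...) intersection, ("u", sub1, sub2, ...) union.
# Negation is omitted because the paragraph above does not describe it.

def instantiate(kg, pattern, v, nodes, rng):
    """Build a hypothesis matching `pattern` whose answer set contains node v (or None)."""
    op = pattern[0]
    if op == "e":
        return ("e", v)  # the current node becomes an anchor entity
    if op == "p":
        incoming = [(h, r) for h, r, t in kg if t == v]  # incoming edges (u, r, v)
        if not incoming:
            return None
        u, r = rng.choice(incoming)
        sub = instantiate(kg, pattern[1], u, nodes, rng)
        return None if sub is None else ("p", r, sub)
    if op == "i":  # every branch must be satisfied by the same target v
        subs = [instantiate(kg, p, v, nodes, rng) for p in pattern[1:]]
    elif op == "u":  # one branch derives v, the rest start from random nodes
        subs = [instantiate(kg, pattern[1], v, nodes, rng)]
        subs += [instantiate(kg, p, rng.choice(nodes), nodes, rng) for p in pattern[2:]]
    else:
        raise ValueError(f"unknown operator {op!r}")
    return None if any(s is None for s in subs) else (op, *subs)

def execute(kg, hyp):
    """Answer set of a hypothesis over (head, relation, tail) triples."""
    if hyp[0] == "e":
        return {hyp[1]}
    if hyp[0] == "p":
        src = execute(kg, hyp[2])
        return {t for h, r, t in kg if h in src and r == hyp[1]}
    sets = [execute(kg, s) for s in hyp[1:]]
    return set.intersection(*sets) if hyp[0] == "i" else set.union(*sets)

def sample_pair(kg, pattern, rng, max_tries=100):
    """Sample a target v, instantiate the pattern, and return (v, hypothesis, observations)."""
    nodes = sorted({h for h, _, _ in kg} | {t for _, _, t in kg})
    for _ in range(max_tries):
        v = rng.choice(nodes)
        hyp = instantiate(kg, pattern, v, nodes, rng)
        if hyp is not None:
            return v, hyp, execute(kg, hyp)
    return None
```

Because every intersection branch is rooted at the same target and the union keeps one branch rooted at it, the executed answer set (the observations) always contains the sampled target `v`.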
We use Qwen2.5-7B-Instruct to generate the single-turn dialogue dataset, with the generation parameters set to `max_new_tokens=80`, `temperature=0.8`, `top_p=0.9`, and `do_sample` enabled. During data processing, five types of conditions are extracted from the `query`:

- `entitynumber`: the number of entities contained in the query;
- `relationnumber`: the number of relations contained in the query;
- `entity`: a specific entity that appears in the query;
- `relation`: a specific relation that appears in the query;
- `pattern`: the logical pattern of the query, written with operator tokens such as `i p e p e`.

For each sample, two of these five condition types are randomly selected and organized into a natural-language follow-up question. The generated sentence follows a unified template:
> I want a hypothesis that <condition\_1\> and <condition\_2\>\.
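A deterministic stand-in for this step can make the template concrete. The sketch below skips the Qwen2.5-7B-Instruct rewriting entirely and fills the template directly; the phrasing of each condition type is modeled on the example follow-up questions elsewhere in this appendix and is an assumption, as are the names `PHRASES` and `single_turn_question`.

```python
import random

# Hypothetical phrasings, modeled on the example follow-up questions in this appendix.
PHRASES = {
    "entitynumber": lambda v: f"contains {v} entities",
    "relationnumber": lambda v: f"has {v} relations",
    "entity": lambda v: f"includes entity {v}",
    "relation": lambda v: f'includes the relation "{v}"',
    "pattern": lambda v: f'follows the pattern "{v}"',
}

def single_turn_question(conditions: dict, rng: random.Random) -> tuple:
    """Pick two extracted condition types at random and fill the unified template."""
    chosen = rng.sample(sorted(conditions), 2)
    parts = [PHRASES[c](conditions[c]) for c in chosen]
    return f"I want a hypothesis that {parts[0]} and {parts[1]}.", chosen


conds = {"entitynumber": 2, "relationnumber": 2, "entity": "DB11914",
         "relation": "DDI", "pattern": "i p e p e"}
print(single_turn_question(conds, random.Random(0)))
```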
### A.3 Build Multi-turn Dataset
For the multi-turn setting, we construct three-turn dialogues for each dataset. Before building the dialogues, queries with the same answer set are grouped together, and duplicate queries within each group are removed. Then, three queries from each answer group are organized into a multi-turn query list, where each turn corresponds to one query.
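A minimal sketch of the grouping step, assuming each query is hashable (e.g., its flat action string) and paired with its answer set. `build_query_lists` is a hypothetical name, and drawing exactly one three-query list per group is a simplification: the section does not specify how many lists are drawn from each group.

```python
import random
from collections import defaultdict

def build_query_lists(pairs, rng, turns=3):
    """Group queries by answer set, drop duplicates, and draw `turns` queries per group."""
    groups = defaultdict(list)
    for query, answers in pairs:
        group = groups[frozenset(answers)]  # queries with the same answer set share a group
        if query not in group:              # remove duplicate queries within a group
            group.append(query)
    # Simplification: one list per sufficiently large group, in random turn order.
    return [rng.sample(qs, turns) for qs in groups.values() if len(qs) >= turns]
```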
For each turn, the follow-up question is generated according to one of three intention modes: `specific-condition`, `scope-narrowing`, and `scope-expansion`. The first turn is fixed to `specific-condition`, since there is no preceding query against which to compare complexity. From the second turn onward, we compare the numbers of entities and relations in the current query with those in the next target query and sample an intention mode under validity constraints. Specifically, if the current query is already at least as complex as the target query, meaning that its numbers of entities and relations are both greater than or equal to those of the target query, or that the sum of its entities and relations is greater than or equal to that of the target query, then `scope-expansion` is disabled. Conversely, if the current query is no more complex than the target query, meaning that its numbers of entities and relations are both less than or equal to those of the target query, or that the sum of its entities and relations is less than or equal to that of the target query, then `scope-narrowing` is disabled. When all three modes are valid, the program samples `specific-condition`, `scope-expansion`, and `scope-narrowing` with weights of 0.5, 0.25, and 0.25, respectively.
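The validity constraints and sampling weights can be written down directly. This is a hedged sketch: `valid_modes` and `sample_mode` are hypothetical names, and renormalizing the remaining weights when a mode is disabled is our assumption, since the text only gives weights for the case where all three modes are valid.

```python
import random

def valid_modes(cur_e, cur_r, tgt_e, tgt_r):
    """Apply the validity constraints to (entity, relation) counts; return mode -> weight."""
    at_least_as_complex = (cur_e >= tgt_e and cur_r >= tgt_r) or (cur_e + cur_r >= tgt_e + tgt_r)
    no_more_complex = (cur_e <= tgt_e and cur_r <= tgt_r) or (cur_e + cur_r <= tgt_e + tgt_r)
    modes = {"specific-condition": 0.5, "scope-expansion": 0.25, "scope-narrowing": 0.25}
    if at_least_as_complex:
        modes.pop("scope-expansion")
    if no_more_complex:
        modes.pop("scope-narrowing")
    return modes

def sample_mode(turn, cur, tgt, rng):
    """cur/tgt are (entity_count, relation_count) of the current and next target query."""
    if turn == 1:
        return "specific-condition"  # no preceding query to compare against
    modes = valid_modes(*cur, *tgt)
    names = list(modes)
    # Assumption: remaining weights are renormalized (random.choices does this implicitly).
    return rng.choices(names, weights=[modes[n] for n in names], k=1)[0]
```

Read literally, each "or" clause reduces to comparing the summed counts, because having both counts at least those of the target already implies a larger or equal sum. At least one of the two conditions therefore always holds, so at most two modes are valid for any pair of queries, and only `specific-condition` remains when the sums are equal. Under the renormalization assumption, `specific-condition` is drawn with probability 2/3 whenever one scope mode is available.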
When the intention mode is `specific-condition`, the same five condition types and unified template described for the single-turn setting are reused. Two conditions are randomly selected and combined into one follow-up question. Example outputs include:
> I want a hypothesis that contains 2 entities and includes the relation "CC".
>
> I want a hypothesis that follows the pattern "p i n p e p e" and contains 2 entities.
When the intention mode is `scope-narrowing`, the program directly generates:
> This is too complex\. I want to make the logic simpler\.
When the intention mode is `scope-expansion`, the program directly generates:
> I want to know more\. Let’s explore more\.
Finally, each follow\-up question is written into the corresponding multi\-turn dialogue, forming the final multi\-turn dialogue dataset\. We first sample a large pool of multi\-turn dialogues following the above procedure\. Specifically, we sample 1,951,477, 259,144, and 31,663 multi\-turn dialogues for BioKG, PharmKG8k, and DBpedia50, respectively\. From each dataset, we randomly select 500 dialogues as the test samples\.
## Appendix B Experiment Details
### B.1 Lightweight Hypothesis Generation Model Training Details
For the lightweight hypothesis generation model, we use a 6-layer Transformer trained with the AdamW optimizer. The learning rate is set to $1\times10^{-4}$, and the weight decay is set to $1\times10^{-5}$. For DBpedia50, we train the model for 200 epochs in the unconditional setting and 100 epochs in the conditional setting. For BioKG, we train for 40 epochs in the unconditional setting and 15 epochs in the conditional setting. For PharmKG8k, we train for 100 epochs in the unconditional setting and 160 epochs in the conditional setting.
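For reference, the reported hyperparameters can be collected into a single configuration. The structure below (`OPTIMIZER`, `NUM_LAYERS`, `EPOCHS`, `epochs_for`) is our own; in PyTorch the optimizer settings would correspond to `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)`, while architectural details beyond the layer count are not given here.

```python
# Hyperparameters as reported above; the dictionary layout is illustrative only.
OPTIMIZER = {"name": "AdamW", "lr": 1e-4, "weight_decay": 1e-5}
NUM_LAYERS = 6  # Transformer depth of the lightweight generator

EPOCHS = {
    ("DBpedia50", "unconditional"): 200,
    ("DBpedia50", "conditional"): 100,
    ("BioKG", "unconditional"): 40,
    ("BioKG", "conditional"): 15,
    ("PharmKG8k", "unconditional"): 100,
    ("PharmKG8k", "conditional"): 160,
}

def epochs_for(dataset: str, setting: str) -> int:
    """Look up the training epochs for a dataset and a generation setting."""
    return EPOCHS[(dataset, setting)]
```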
### B.2 Agent Details
We report the implementation prompts for the main LLM-based components of HypoAgent. Fig. [5](https://arxiv.org/html/2605.31370#A2.F5) shows the condition parsing prompt that the Intent Recognition Agent uses to convert user requests into structured conditions. Fig. [6](https://arxiv.org/html/2605.31370#A2.F6) shows the Root Cause Analysis Agent prompt for diagnosing and refining imperfect hypotheses. Fig. [7](https://arxiv.org/html/2605.31370#A2.F7) shows the prompt for inducing conditions from unconditional hypotheses in the unconditional setting. Fig. [8](https://arxiv.org/html/2605.31370#A2.F8) shows the LLM-based judge prompt for evaluating condition satisfaction.
For RCA, we set the Jaccard similarity threshold to 0.95: RCA is activated whenever the Jaccard score of the current hypothesis falls below this threshold.
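The rules listed in the condition parsing prompt (Fig. 5) can also be checked programmatically before the parsed conditions are passed on. The validator below is a hypothetical post-processing step, not part of the described pipeline; `check_conditions` and its error messages are illustrative.

```python
import json

VALID_TYPES = {"relation", "entity", "relationnumber", "entitynumber", "pattern"}
PATTERN_TOKENS = {"i", "u", "n", "p", "e"}

def check_conditions(raw: str) -> list:
    """Return the rule violations in the parser's JSON output (an empty list means valid)."""
    try:
        conds = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(conds, list) or not conds:
        return ["expected a non-empty JSON array"]
    errors, seen = [], {}
    for c in conds:
        if not isinstance(c, dict):
            errors.append("each condition must be an object")
            continue
        t, v = c.get("type"), c.get("value")
        seen[t] = seen.get(t, 0) + 1
        if t not in VALID_TYPES:  # also rejects "unconditional"
            errors.append(f"invalid type {t!r}")
        elif t in ("relation", "entity") and (not isinstance(v, str) or not v.strip()):
            errors.append(f"{t} needs a non-empty name")
        elif t in ("relationnumber", "entitynumber") and not str(v).isdigit():
            errors.append(f"{t} must be an integer count")
        elif t == "pattern" and (not isinstance(v, str) or not v.split()
                                 or not set(v.split()) <= PATTERN_TOKENS):
            errors.append("pattern may only use i/u/n/p/e tokens")
    for t in ("relation", "entity"):
        if seen.get(t, 0) > 1:
            errors.append(f"at most one {t} condition allowed")
    return errors
```

Returning a list of violations rather than raising makes it easy to feed the messages back to the parser in a retry prompt, if one chose to do so.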
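The trigger rule and the three RCA candidates in Fig. 6 suggest a refinement loop along these lines. This is a sketch under assumptions: `generate`, `execute`, and `rca` are hypothetical callables standing in for the hypothesis generation model, KG execution, and the RCA agent, while `max_rounds` and the greedy keep-the-best selection are not specified in the text.

```python
from typing import Callable, List, Tuple

RCA_THRESHOLD = 0.95  # RCA runs only while Jaccard stays below this value

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0

def refine(observed: set,
           condition: str,
           generate: Callable[[str], str],              # hypothetical: condition -> hypothesis
           execute: Callable[[str], set],               # hypothetical: hypothesis -> answer set
           rca: Callable[[str, str, set], List[dict]],  # hypothetical: RCA agent -> candidates
           max_rounds: int = 3) -> Tuple[float, str]:
    best_hyp = generate(condition)
    best_score = jaccard(execute(best_hyp), observed)
    for _ in range(max_rounds):
        if best_score >= RCA_THRESHOLD:
            break  # good enough: RCA is not activated
        for cand in rca(best_hyp, condition, observed):
            # "keep"/"update" candidates carry a condition; "generate" carries a raw hypothesis.
            hyp = cand["hypothesis_raw"] or generate(cand["new_condition"])
            score = jaccard(execute(hyp), observed)
            if score > best_score:
                best_score, best_hyp = score, hyp
    return best_score, best_hyp
```

The candidate dictionaries use the keys `analysis`, `new_condition`, and `hypothesis_raw` from the RCA prompt's output specification.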
Prompt 1: Condition Parsing

Task: Parse the following question into a JSON array of condition dicts.

Valid condition types:
- `relation`: value = ONE relation NAME, e.g. "GG"
- `entity`: value = ONE entity NAME, e.g. "chrnb3"
- `relationnumber`: integer count of relations, e.g. "3"
- `entitynumber`: integer count of entities, e.g. "2"
- `pattern`: structural pattern using only `i/u/n/p/e` tokens, e.g. "i p e p e"

Rules:
- NEVER use `unconditional` — always infer at least one concrete condition.
- AT MOST ONE `relation` condition and AT MOST ONE `entity` condition.
- `relationnumber`/`entitynumber` take integer values, NOT names.
- `pattern` must use only `i/u/n/p/e` tokens.
- Never output empty string as value for `relation` or `entity`.

Examples:

"I want a hypothesis with relation GG"
→ `[{"type":"relation","value":"GG"}]`

"I want pattern i p e p e with 2 entities"
→ `[{"type":"pattern","value":"i p e p e"},{"type":"entitynumber","value":"2"}]`

Question: [user followup question]

Output: Return ONLY the JSON array, nothing else.

Figure 5: Condition parsing prompt used to convert natural language user requests into structured condition lists.

Prompt 2: Root Cause Analysis Agent

Task: You are an analysis agent for KG abductive reasoning. Given observed entities $O$, find a logical hypothesis $H$ such that executing $H$ on the KG returns exactly $O$. $H$ is one of 13 patterns (1p/2p/2i/3i/ip/pi/2u/up/2in/3in/inp/pni/pin). Analyze why the current hypothesis is imperfect, then propose 3 candidates.

Input:
- Observations (entity names): [answer_nl]
- User's original condition (must be respected): [original_followup]
- Generation history: [round, condition, jaccard per round]
- Current hypothesis: NL form, raw action string, pattern, Jaccard score
- ID↔Name lookup: relation and entity id-to-name maps

Step 1 — Sub-logic Decomposition (`graph_validation`)
Call `graph_validation(query_tokens, label_answers, split='train')`. Each sub-query result contains `answer_count`, `overlap_count`, `relation_to_label`. Identify the weakest branch (lowest `overlap_count`) to fix first. Tool budget: at most 3 calls.

Step 2 — Neighborhood Search (`incoming_edge_intersection`)

Call `incoming_edge_intersection(answer_entity_ids, split='train', top_k=10)`. Returns `flat_candidates` (1-hop entity-relation pairs with Jaccard) and `two_hop_candidates` (2-hop paths). For 2i/3i/ip/pi patterns, also call `intersection_candidates(flat_candidates_json, mode)`.

Step 3 — Produce 3 Candidates

Candidate 1 (keep): original condition unchanged, `hypothesis_raw=null`.

Candidate 2 (update): new condition extending original with analysis findings, `hypothesis_raw=null`.

Candidate 3 (generate): directly compose a flat action string `hypothesis_raw` from tool results.

Flat format: space-separated tokens, no parentheses. Relations: negative integers (-8). Entities: positive integers (1312). E.g. 2i: `"i -8 1312 -20 1303"`.
Output: `final_answer(candidates)` — a list of 3 dicts with keys `analysis`, `new_condition`, `hypothesis_raw`.

Figure 6: Root Cause Analysis (RCA) agent prompt used in the iterative hypothesis refinement loop.

Prompt 3: Unconditional Condition Generation Prompt

Task: An unconditional hypothesis was generated. Analyze it and generate structural and semantic conditions to improve hypothesis quality in subsequent conditional generation.

Input:
- Observations (entity names): [answer_nl]
- Unconditional hypothesis: NL form, raw action string, Jaccard score
- ID↔Name lookup: relation id-to-name map

Step 1 — Sub-logic Decomposition (`graph_validation`)

Call `graph_validation` once. Use `overlap_count` per branch to identify the strongest building block (semantic anchor) and the weakest branch (structural redesign target). Count sub-queries to infer `entitynumber` and `relationnumber`.

Step 2 — Neighborhood Search (`incoming_edge_intersection`)

Call once with `top_k=10`. Use top `flat_candidates` to identify the best semantic (relation, entity) pair. Use `two_hop_candidates` to infer a good structural pattern (e.g. 2p if top two-hop has high Jaccard).

Step 3 — Return Condition Analysis

Return a JSON object with three keys via `final_answer()`:
- structural: at least one of `entitynumber` (int 1–3), `relationnumber` (int 1–3), `pattern` (`i/u/n/p/e` tokens)
- semantic: at least one of `relation` (name string), `entity` (name string)
- hybrid: at least one structural key and at least one semantic key

Anchor conditions to the unconditional hypothesis already shown — reuse relations, entities, and structure already present rather than proposing unrelated constraints.

Figure 7: Unconditional condition generation prompt: the agent analyzes an unconditional hypothesis and produces structural, semantic, and hybrid conditions for subsequent conditional generation.

Prompt 4: LLM-based Condition Satisfaction Judge

Task: You are evaluating whether a knowledge graph hypothesis satisfies a user condition.

Hypothesis: [hypothesis natural language description]
Raw action string: [hypothesis_raw]

Parsed hypothesis properties:
- Logic pattern: [pattern, e.g. i p e p e]
- Relations ([relationnumber]): [list of relation names]
- Entities ([entitynumber]): [list of entity names]

Conversation history (previous conditions, for context only): [list of prior turn conditions]

Current condition to judge: [current_condition]

Does the hypothesis satisfy the current condition? Use the history only as context to interpret what the current condition means (e.g. “make it more complex” means relative to the previous condition).

Output: Reply with ONLY a JSON object: `{"result": true}` or `{"result": false}`, nothing else.

Figure 8: LLM-based condition satisfaction judge prompt. Given a generated hypothesis and the user's current condition (with conversation history as context), the judge outputs a binary verdict on whether the hypothesis satisfies the condition.
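The flat action-string format in Prompt 2 and the per-branch statistics returned by `graph_validation` can be illustrated on a toy KG. This is a hypothetical reader, not the authors' tool: it assumes that an optional `i` or `u` operator token is followed by (relation, anchor entity) pairs, as in the 2i example, and it does not handle nested patterns such as ip, pi, up, or negation. `parse_flat`, `project`, and `sketch_graph_validation` are illustrative names, and the toy KG reuses the negative relation ids of the flat format.

```python
def parse_flat(action: str):
    """Split a flat action string into an operator and (relation, anchor) branches."""
    tokens = action.split()
    op = tokens[0] if tokens and tokens[0] in ("i", "u") else None
    ids = [int(t) for t in (tokens[1:] if op else tokens)]
    if not ids or len(ids) % 2:
        raise ValueError("expected (relation, entity) pairs")
    branches = list(zip(ids[::2], ids[1::2]))
    for rel, ent in branches:
        if rel >= 0 or ent <= 0:
            raise ValueError("relations must be negative and entities positive")
    if op is None and len(branches) > 1:
        raise ValueError("multiple branches need an 'i' or 'u' operator")
    return op, branches

def project(kg, rel, anchor):
    """Tails reachable from `anchor` via `rel` in (head, relation, tail) triples."""
    return {t for h, r, t in kg if h == anchor and r == rel}

def sketch_graph_validation(kg, action, observed):
    """Execute a flat hypothesis and report per-branch answer/overlap counts."""
    op, branches = parse_flat(action)
    answer_sets = [project(kg, rel, ent) for rel, ent in branches]
    per_branch = [
        {"branch": b, "answer_count": len(s), "overlap_count": len(s & observed)}
        for b, s in zip(branches, answer_sets)
    ]
    if op == "u":
        answers = set().union(*answer_sets)
    else:  # "i", or a single 1p branch
        answers = set.intersection(*answer_sets)
    # The branch with the lowest overlap is the one Step 1 suggests fixing first.
    weakest = min(per_branch, key=lambda b: b["overlap_count"])
    return {"answers": answers, "branches": per_branch, "weakest": weakest}


toy_kg = [(1312, -8, 1), (1312, -8, 2), (1303, -20, 2), (1303, -20, 3)]
print(sketch_graph_validation(toy_kg, "i -8 1312 -20 1303", {2, 3}))
```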
## Appendix C Case Study
### C.1 Single Turn
Figure [9](https://arxiv.org/html/2605.31370#A3.F9) illustrates how the RCA Agent improves hypothesis generation in a single-turn PharmKG case. Although the initial hypothesis satisfies the parsed structural constraints, i.e., it contains two entities and two relations, its selected anchor entities only partially explain the observed drugs, leading to low semantic similarity. This shows that condition satisfaction alone is insufficient: the model must also identify entities that are truly relevant to the observation set.
The RCA Agent refines the hypothesis by diagnosing unreliable anchors and exploring the knowledge neighborhood\. Simply regenerating under the original condition fails to improve the result, suggesting that repeating the same coarse constraint cannot effectively correct the underlying entity\-selection error\. In contrast, Candidate 2 introduces DB11914, which is identified by neighborhood search as a more informative entity, and replaces the weaker anchor in the initial hypothesis\. This targeted condition update leads to a hypothesis that perfectly matches the observations\. Interestingly, directly combining the best neighborhood candidate with the stronger original anchor does not yield a good result, indicating that effective refinement requires not only retrieving promising entities but also composing them with suitable partners\. Overall, this case demonstrates that RCA improves HypoAgent by converting error diagnosis into actionable condition updates, enabling the generator to revise problematic hypothesis fragments rather than merely regenerating hypotheses from the original instruction\.
### C.2 Multi-turn
Fig. [10](https://arxiv.org/html/2605.31370#A3.F10) presents a multi-turn case study on BioKG. The observations are Reactome pathway or biological reaction entities, as indicated by the `R-HSA` identifiers. This case demonstrates how HypoAgent progressively adapts the generated hypothesis to evolving user intents across multiple dialogue turns.
In Turn 1, the user explicitly requires a hypothesis with three entities and the relation `MEMBER_OF_COMPLEX`. The Intent Recognition Agent correctly parses these constraints, and the system generates a relatively complex hypothesis involving negation and protein-protein interaction composition. Although this hypothesis already achieves high semantic similarity, its logical form is not fully aligned with the user's later preference for simplicity. In Turn 2, the user states that the hypothesis is too complex and asks for simpler logic. The system interprets this implicit instruction as reducing the entity number while preserving the core relation `MEMBER_OF_COMPLEX`, leading to a simpler hypothesis with improved semantic similarity. This shows that the model can use dialogue history to infer non-explicit refinement goals rather than relying only on literal condition extraction.
In Turn 3, the user asks to “know more and explore more”, which implies broader coverage instead of further simplification. The system accordingly changes the pattern to a two-way union and generates a disjunctive hypothesis over two `MEMBER_OF_COMPLEX` anchors. This final hypothesis perfectly matches the observation set, showing that HypoAgent can flexibly adjust both the structural pattern and the anchor entities during multi-turn interaction. Overall, this case illustrates that the proposed framework can track evolving user intents, transform them into structured conditions, and refine hypotheses from complex composition to simpler logic and finally to broader exploratory coverage.
### C.3 Unconditional
Fig. [11](https://arxiv.org/html/2605.31370#A3.F11) shows an unconditional generation case on DBpedia50. The observations are football player entities, most of which are related to professional football transfer records. Without any user-provided condition, the initial hypothesis links the observations to the entity `List_of_Iranian_football_transfers_summer_2012` through the relation `currentMember`, but it covers only a small portion of the observation set.
The RCA Agent then induces several candidate conditions from the initial hypothesis, including structural, semantic, and hybrid constraints. The structural condition preserves only the logical form, such as the number of entities, the number of relations, and the path pattern. However, this does not change the key semantic anchor, so the generated hypothesis remains the same as the original one. Similarly, the hybrid condition still constrains generation around the original structure and relation, and therefore fails to escape the initial local solution. In contrast, the semantic condition focuses on the relation `currentMember` and the transfer-list entity, allowing the model to search within a semantically related neighborhood. As a result, the model identifies a closely related transfer-list entity, `List_of_Iranian_football_transfers_winter_2011–12`, which better explains the observed football players.
This case demonstrates the self\-improving ability of HypoAgent under the unconditional setting\. Even without explicit user guidance, RCA can analyze the initial hypothesis, induce useful semantic refinement signals, and guide the generator toward a more suitable hypothesis\. It also shows that semantic condition induction is particularly important in unconditional generation, since structural constraints alone may preserve the original logical form but cannot correct an inaccurate semantic anchor\.
Case Study: Single\-turn Hypothesis Generation with RCAObservations:DB11988, DB15253, DB12489, DB14724, DB13952, DB13953, DB13954, DB12530, DB13956, DB12089, DB15595, DB14762, DB14919User Condition:I want a hypothesis that contains 2 entities and has 2 relations\.Parsed Conditions:entitynumber=2,relationnumber=2
Initial Generation
Hypothesis:H=V?:DDI\(DB00163,V?\)∧DDI\(DB11972,V?\)H=V\_\{?\}:\\textit\{DDI\}\(\\textit\{DB00163\},V\_\{?\}\)\\;\\wedge\\;\\textit\{DDI\}\(\\textit\{DB11972\},V\_\{?\}\)
NL: Drugs that have a drug-drug interaction with both DB00163 and DB11972.
Jaccard: 0.231
RCA Agent: Candidate 1 (keep original condition)
Condition: I want a hypothesis that contains 2 entities and has 2 relations.
Hypothesis: $H = V_?: \textit{DDI}(\textit{DB11972}, V_?) \wedge \textit{DDI}(\textit{DB06360}, V_?)$
NL: Drugs that have a drug-drug interaction with both DB11972 and DB06360.
Jaccard: 0.083
RCA Agent: Candidate 2 (updated condition)
Condition: I want a hypothesis that contains 2 entities and has 2 relations and includes entity DB11914.
Analysis: Neighborhood search identifies DB11914 as the top flat candidate (Jaccard = 0.097 via DDI). Replacing the weaker anchor DB00163 with DB11914 should improve coverage.
Hypothesis: $H = V_?: \textit{DDI}(\textit{DB11914}, V_?) \wedge \textit{DDI}(\textit{DB01590}, V_?)$
NL: Drugs that have a drug-drug interaction with both DB11914 and DB01590.
Jaccard: 1.000
RCA Agent: Candidate 3 (LLM directly generated)
Analysis: Compose a 2i directly using the best flat candidate DB11914 and the stronger original anchor DB11972, both via DDI.
Hypothesis: $H = V_?: \textit{DDI}(\textit{DB11914}, V_?) \wedge \textit{DDI}(\textit{DB11972}, V_?)$
NL: Drugs that have a drug-drug interaction with both DB11914 and DB11972.
Jaccard: 0.083
Figure 9: Single-turn case study with Root Cause Analysis.

Case Study: Multi-turn Hypothesis Generation
Observations: R-HSA-983412, R-HSA-983413, R-HSA-983414, R-HSA-983416, R-HSA-983418, R-HSA-983419, R-HSA-983420, R-HSA-198896, R-HSA-198897, R-HSA-198904, R-HSA-198906, R-HSA-8951496, R-HSA-1236916, R-HSA-1236922, R-HSA-1236928, R-HSA-167755, R-HSA-1236930, R-HSA-1236931, R-HSA-1236934, R-HSA-983114, R-HSA-983116, R-HSA-983119, R-HSA-983120, R-HSA-983121, R-HSA-199592, R-HSA-983127, R-HSA-983132, R-HSA-983136, R-HSA-8863933
Turn 1
User Condition: I want a hypothesis that contains 3 entities and includes the relation “MEMBER_OF_COMPLEX”.
Parsed Conditions: relation=MEMBER_OF_COMPLEX, entitynumber=3
Best Hypothesis: $H = V_?: \neg\,\textit{MEMBER\_OF\_COMPLEX}(\textit{O95971}, V_?) \wedge \textit{MEMBER\_OF\_COMPLEX}(\textit{PPI}(V_?), \textit{P30511})$
NL: Entities that are *not* members of the complex containing O95971, but are reachable via PPI from P30511 through MEMBER_OF_COMPLEX.
Jaccard: 0.800
Turn 2
User Condition: This is too complex. I want to make the logic simpler.
Parsed Conditions: relation=MEMBER_OF_COMPLEX, entitynumber=1
Best Hypothesis: $H = V_?: \textit{MEMBER\_OF\_COMPLEX}(\textit{PPI}(V_?), \textit{P01889})$
NL: Entities reachable via PPI from P01889 through MEMBER_OF_COMPLEX.
Jaccard: 0.829
Turn 3
User Condition: I want to know more and explore more.
Parsed Conditions: pattern=2u
Best Hypothesis: $H = V_?: \textit{MEMBER\_OF\_COMPLEX}(\textit{P01893}, V_?) \vee \textit{MEMBER\_OF\_COMPLEX}(\textit{P10321}, V_?)$
NL: Entities that are members of the complex containing P01893 *or* P10321.
Jaccard: 1.000
Figure 10: Multi-turn case study.

Case Study: Unconditional Generation with Condition Induction
Observations: Mohammad_Mansouri, Giorgi_Krasovski, Yaroslav_Krushelnitskiy, Nenad_Brnović, Mohsen_Hamidi
Unconditional Hypothesis
Hypothesis: $H = V_?: \textit{currentMember}(\textit{List\_of\_Iranian\_football\_transfers\_summer\_2012}, V_?)$
NL: Entities that are current members of the Iranian football transfers list (summer 2012).
Jaccard: 0.208
Candidate 1: Unconditional
Conditions: (none)
Hypothesis: $H = V_?: \textit{currentMember}(\textit{List\_of\_Iranian\_football\_transfers\_summer\_2012}, V_?)$
NL: Entities that are current members of the Iranian football transfers list (summer 2012).
Jaccard: 0.208
Candidate 2: Structural
Conditions: entitynumber=1, relationnumber=1, pattern=p e
Hypothesis: $H = V_?: \textit{currentMember}(\textit{List\_of\_Iranian\_football\_transfers\_summer\_2012}, V_?)$
NL: Entities that are current members of the Iranian football transfers list (summer 2012).
Jaccard: 0.208
Candidate 3: Semantic
Conditions: relation=currentMember, entity=List_of_Iranian_football_transfers_summer_2012
Hypothesis: $H = V_?: \textit{currentMember}(\textit{List\_of\_Iranian\_football\_transfers\_winter\_2011–12}, V_?)$
NL: Entities that are current members of the Iranian football transfers list (winter 2011–12).
Jaccard: 0.833
Candidate 4: Hybrid
Conditions: entitynumber=1, relation=currentMember, pattern=p e
Hypothesis: $H = V_?: \textit{currentMember}(\textit{List\_of\_Iranian\_football\_transfers\_summer\_2012}, V_?)$
NL: Entities that are current members of the Iranian football transfers list (summer 2012).
Jaccard: 0.208
Figure 11: Unconditional generation case study.
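The hypothesis shapes used in Figures 9 to 11 (a single projection, the 2i conjunction, the 2u disjunction, and a negated atom) can be read as set operations over a KG. The sketch below evaluates them on a toy triple store. The `ToyKG` class, the entities `d1`, `x1`, and so on, and the triples are hypothetical; treating negation as the complement over all known entities is a common convention in logical query answering over KGs that we assume here, not a detail taken from HypoAgent.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


class ToyKG:
    """Hypothetical in-memory triple index, not HypoAgent's KG interface."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._tails: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.entities: Set[str] = set()
        for head, rel, tail in triples:
            self._tails[(head, rel)].add(tail)
            self.entities.update((head, tail))

    def project(self, anchor: str, relation: str) -> Set[str]:
        """relation(anchor, V?): all tails one `relation` edge away from anchor."""
        # .get() avoids inserting empty entries into the defaultdict
        return set(self._tails.get((anchor, relation), set()))

    def negate(self, answers: Set[str]) -> Set[str]:
        """Negation as complement over all known entities (assumed convention)."""
        return self.entities - answers


kg = ToyKG([
    ("d1", "DDI", "x1"), ("d1", "DDI", "x2"), ("d1", "DDI", "x3"),
    ("d2", "DDI", "x2"), ("d2", "DDI", "x3"), ("d2", "DDI", "x4"),
])

# 2i: DDI(d1, V?) ∧ DDI(d2, V?)
two_i = kg.project("d1", "DDI") & kg.project("d2", "DDI")
# 2u: DDI(d1, V?) ∨ DDI(d2, V?)
two_u = kg.project("d1", "DDI") | kg.project("d2", "DDI")
# Negated atom: ¬DDI(d1, V?) ∧ DDI(d2, V?)
neg = kg.negate(kg.project("d1", "DDI")) & kg.project("d2", "DDI")

print(sorted(two_i))  # ['x2', 'x3']
print(sorted(two_u))  # ['x1', 'x2', 'x3', 'x4']
print(sorted(neg))    # ['x4']
```

Under this reading, the answer set of each candidate hypothesis is what gets compared against the observations when a Jaccard score is reported.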