GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
Summary
This paper introduces GraphDC, a divide-and-conquer multi-agent framework that decomposes graph algorithmic tasks into subgraphs for specialized agents, improving scalability and reasoning performance on complex graph structures.
View Cached Full Text
Cached at: 05/11/26, 07:04 AM
# GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
Source: [https://arxiv.org/html/2605.06671](https://arxiv.org/html/2605.06671)
Jiaming Cui∗
###### Abstract
Large Language Models \(LLMs\) have demonstrated strong potential for many mathematical problems\. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi\-step reasoning, especially on larger graphs\. Motivated by this gap, we proposeGraphDC, a Divide\-and\-Conquer multi\-agent framework for scalable graph algorithm reasoning\. Specifically, inspired by Divide\-and\-Conquer design,GraphDCdecomposes an input graph into smaller subgraphs, assigns each subgraph to a specialized agent for local reasoning, and uses a master agent to integrate the local outputs with inter\-subgraph information to produce the final solution\. This hierarchical design reduces the reasoning burden on individual agents, alleviates computational bottlenecks, and improves robustness on large graph instances\. Extensive experiments show thatGraphDCconsistently outperforms existing methods on graph algorithm reasoning across diverse tasks and scales, especially on larger instances where direct end\-to\-end reasoning is less reliable\.
## 1Introduction
Graph is a fundamental mathematical concept for representing relationships between entities\[[26](https://arxiv.org/html/2605.06671#bib.bib25),[14](https://arxiv.org/html/2605.06671#bib.bib26),[1](https://arxiv.org/html/2605.06671#bib.bib36),[9](https://arxiv.org/html/2605.06671#bib.bib37),[8](https://arxiv.org/html/2605.06671#bib.bib35)\]\. Many real\-world problems such as infectious disease modeling\[[10](https://arxiv.org/html/2605.06671#bib.bib38),[35](https://arxiv.org/html/2605.06671#bib.bib39),[7](https://arxiv.org/html/2605.06671#bib.bib40)\], supply chain optimization\[[42](https://arxiv.org/html/2605.06671#bib.bib43)\], and urban infrastructure surveillance\[[31](https://arxiv.org/html/2605.06671#bib.bib42),[30](https://arxiv.org/html/2605.06671#bib.bib41)\]can be naturally modeled as graphs, where the objective is to reason over structural dependencies among entities rather than treat instances independently\. Therefore, graph algorithm reasoning has become central to numerous domains, including social network analysis\[[3](https://arxiv.org/html/2605.06671#bib.bib8),[18](https://arxiv.org/html/2605.06671#bib.bib9)\], bioinformatics\[[44](https://arxiv.org/html/2605.06671#bib.bib5)\], and knowledge graph inference\[[46](https://arxiv.org/html/2605.06671#bib.bib10),[24](https://arxiv.org/html/2605.06671#bib.bib11),[39](https://arxiv.org/html/2605.06671#bib.bib12)\]\. Over the years, a wide range of specialized architectures have been proposed\[[21](https://arxiv.org/html/2605.06671#bib.bib27),[29](https://arxiv.org/html/2605.06671#bib.bib28)\]to tackle these tasks\. While these models achieve strong performance, they often suffer from limited generalization and usability\[[43](https://arxiv.org/html/2605.06671#bib.bib29),[41](https://arxiv.org/html/2605.06671#bib.bib30)\]: achieving state\-of\-the\-art results typically requires task\-specific designs such as tailored preprocessing pipelines and decoders, which make them less flexible and harder to adapt\. This limits their applicability when task formulations, graph characteristics, or supervision settings vary across datasets and applications\.
Recently, large language models \(LLMs\) have shown impressive capabilities in natural language interaction, interpretability, and generalization across diverse mathematical tasks\[[15](https://arxiv.org/html/2605.06671#bib.bib13),[45](https://arxiv.org/html/2605.06671#bib.bib14),[13](https://arxiv.org/html/2605.06671#bib.bib15),[22](https://arxiv.org/html/2605.06671#bib.bib16),[4](https://arxiv.org/html/2605.06671#bib.bib17),[23](https://arxiv.org/html/2605.06671#bib.bib34),[11](https://arxiv.org/html/2605.06671#bib.bib33)\]\. These properties make LLMs an appealing alternative to specialized graph models, as they offer a unified interface for solving different reasoning tasks with minimal task\-specific redesign\. This has sparked growing interest in their potential for graph algorithm reasoning\. However, the reasoning capacity of a single LLM is inherently constrained: as graph size and density increase, LLM accuracy on graph algorithm reasoning tasks declines sharply\. This limitation is particularly pronounced for algorithmic reasoning, where the model must simultaneously track multi\-hop dependencies, preserve structural consistency, and execute multi\-step logical operations\. As a result, directly applying a single LLM to increasingly large graphs is difficult to scale\.
Figure 1:The framework ofGraphDC\. Given a mathematical problem on graphs,GraphDCsolves it in two steps\. In theGraph\-Query Decompositionstep, a graph splitter first decomposes the graph into several subgraphs\. Each specialized agent then processes one subgraph using a sub\-query generated by the question describer and performs intra\-subgraph algorithm reasoning\. Next, in theHierarchical Reasoningstep, a master agent aggregates the answers from all agents, together with inter\-subgraph information, to respond to the original question\.To mitigate this degradation, several strategies have been explored\[[20](https://arxiv.org/html/2605.06671#bib.bib18),[6](https://arxiv.org/html/2605.06671#bib.bib19),[40](https://arxiv.org/html/2605.06671#bib.bib20),[16](https://arxiv.org/html/2605.06671#bib.bib21)\]\. For example, GITA\[[38](https://arxiv.org/html/2605.06671#bib.bib3)\]augments prompts with graph visualizations, while GraphAgent\-Reasoner\[[17](https://arxiv.org/html/2605.06671#bib.bib2)\]introduces a multi\-agent system where a master agent coordinates a set of smaller agents\. These studies suggest that adding structure to the reasoning process can improve performance beyond vanilla prompting\. However, existing approaches still face substantial scalability challenges\. Although the latter achieves strong accuracy on dense and complex graphs, it requires assigning one agent per node, resulting in severe scalability issues and computational bottlenecks\. Such coordination costs grow quickly with graph size, making the framework expensive for large instances\. Meanwhile, real\-world graphs continue to grow in both scale and complexity, often exceeding the effective reasoning capacity of a single LLM or a naively coordinated multi\-agent system\. Even as LLMs improve, larger and denser graphs will continue to pose new scalability challenges\. This raises two key questions: \(1\) how can we enable effective reasoning over graphs that exceed current scalability limits, and \(2\) how can we obtain reliable answers without incurring prohibitive computational costs?
In this work, we revisit a classic principle of mathematics: divide the work, then integrate the answers back together\. Inspired by distributed graph processing and thedivide\-and\-conquerparadigm\[[28](https://arxiv.org/html/2605.06671#bib.bib23),[19](https://arxiv.org/html/2605.06671#bib.bib24)\], we proposeGraphDC, a multi\-agent system that partitions nodes into subgraphs, assigns a dedicated agent to each subgraph, and employs a master agent to integrate outputs from each subgraph along with inter\-subgraph information to solve the overall task for scalable graph algorithm reasoning\. Specifically, we first cluster nodes into subgraphs and allocate an agent to each subgraph for intra\-subgraph algorithm reasoning\. The master agent then aggregates all outputs, incorporating inter\-subgraph dependencies, to produce the final solution\. Compared with node\-level agent assignment, this design reduces the number of agents and coordination overhead while preserving the structural information needed for global reasoning\. By constraining each agent to a manageable subgraph, the framework improves both scalability and reasoning reliability on complex graph instances\.
We validate this framework through extensive experiments, demonstrating substantial gains in both accuracy and scalability\. Our results show thatGraphDCconsistently outperforms strong baselines, with the largest improvements appearing on larger and denser graphs where single\-agent methods degrade substantially\. These findings indicate that subgraph\-level decomposition is an effective way to scale LLM\-based graph algorithm reasoning and provide a practical direction for handling graph instances beyond the range of current monolithic approaches\.
Our main contributions are summarized as follows:
- •To the best of our knowledge, we are the first to propose a divide\-and\-conquer multi\-agent system \(MAS\) for scalable graph reasoning\. Instead of assigning one agent to each node, our framework decomposes the original graph into subgraphs, enabling multiple agents to collaboratively solve large\-scale graph reasoning problems\.
- •To support this divide\-and\-conquer MAS, we propose an effective graph decomposition algorithm tailored to downstream reasoning tasks\. The proposed decomposition strategy produces subgraphs that are more suitable for later agent\-level reasoning and facilitates the integration of local solutions into a global answer\.
- •Extensive experiments demonstrate the effectiveness and scalability of our framework\. The results show that our method achieves strong performance on challenging graph reasoning tasks, particularly on larger and denser graphs where existing approaches face substantial difficulty\.
The rest of the paper is organized as follows\. Section[2](https://arxiv.org/html/2605.06671#S2)reviews the related work\. Section[3](https://arxiv.org/html/2605.06671#S3)presents theGraphDCframework and describes how it is applied to graph reasoning tasks\. Section[4](https://arxiv.org/html/2605.06671#S4)evaluates the performance ofGraphDC\. Finally, Section[5](https://arxiv.org/html/2605.06671#S5)concludes the paper and discusses directions for future work\.
## 2Related work
We review three lines of research most relevant to our work: classical graph algorithms, machine learning\-based graph reasoning methods, and recent LLM\-based approaches for graph reasoning\.
### 2\.1Classical graph algorithms\.
Classical graph algorithms provide the foundation for reasoning over graph\-structured data\[[34](https://arxiv.org/html/2605.06671#bib.bib46),[12](https://arxiv.org/html/2605.06671#bib.bib49)\]\. A large body of work has studied exact and efficient algorithms for fundamental tasks such as connectivity, shortest paths, cycle detection, and relational inference on graphs\[[32](https://arxiv.org/html/2605.06671#bib.bib48),[33](https://arxiv.org/html/2605.06671#bib.bib47),[25](https://arxiv.org/html/2605.06671#bib.bib50)\]\. These methods are often computationally well grounded and, for many canonical problems, admit strong theoretical guarantees\. However, they are typically designed for structured graph inputs with explicitly defined nodes, edges, and objectives, and are less suited to settings where graph reasoning must be performed jointly with rich contextual information\. In many modern applications, graph instances are accompanied by heterogeneous node or edge attributes, textual descriptions, external knowledge, or natural\-language queries\[[26](https://arxiv.org/html/2605.06671#bib.bib25),[14](https://arxiv.org/html/2605.06671#bib.bib26),[44](https://arxiv.org/html/2605.06671#bib.bib5)\]\. Classical algorithms generally do not provide a native mechanism for incorporating such rich data into the reasoning process\. As a result, while they remain highly effective for well\-specified combinatorial problems, they are less flexible for complex graph reasoning scenarios that require integrating graph structure with unstructured or semantically rich information\.
### 2\.2Machine learning\-based graph reasoning\.
To overcome the rigidity of purely algorithmic methods, a large literature has explored combining graphs with machine learning, especially through graph neural networks, graph transformers, and other graph representation learning frameworks\[[21](https://arxiv.org/html/2605.06671#bib.bib27),[29](https://arxiv.org/html/2605.06671#bib.bib28),[43](https://arxiv.org/html/2605.06671#bib.bib29),[41](https://arxiv.org/html/2605.06671#bib.bib30)\]\. These methods can encode both graph topology and node or edge attributes, and have achieved strong empirical performance across a wide range of applications, including social networks, bioinformatics, and knowledge graphs\[[3](https://arxiv.org/html/2605.06671#bib.bib8),[18](https://arxiv.org/html/2605.06671#bib.bib9),[44](https://arxiv.org/html/2605.06671#bib.bib5),[46](https://arxiv.org/html/2605.06671#bib.bib10),[24](https://arxiv.org/html/2605.06671#bib.bib11),[39](https://arxiv.org/html/2605.06671#bib.bib12)\]\. Compared with traditional algorithms, they are better able to incorporate rich data and learn task\-relevant patterns from supervision\. Nevertheless, most of these approaches remain task\-specific in both design and training\. Their success often depends on carefully chosen architectures, sampling strategies, positional or structural encodings, loss functions, and downstream decoders tailored to a particular task\. Moreover, applying them to a new reasoning problem typically requires retraining, redesign, or both\. This limits their usability in settings where one seeks a more general\-purpose reasoning framework that can transfer across graph tasks without extensive task\-specific engineering\.
### 2\.3LLM\-based graph reasoning\.
More recently, researchers have begun exploring large language models for graph reasoning\. This line of work is motivated by the strong in\-context learning, interpretability, and task generalization capabilities of LLMs across diverse reasoning problems\[[15](https://arxiv.org/html/2605.06671#bib.bib13),[45](https://arxiv.org/html/2605.06671#bib.bib14),[13](https://arxiv.org/html/2605.06671#bib.bib15),[22](https://arxiv.org/html/2605.06671#bib.bib16),[4](https://arxiv.org/html/2605.06671#bib.bib17),[23](https://arxiv.org/html/2605.06671#bib.bib34),[11](https://arxiv.org/html/2605.06671#bib.bib33)\]\. Existing efforts include prompting\-based methods that serialize graphs into text, multimodal methods that augment graph reasoning with visual representations such as GITA\[[38](https://arxiv.org/html/2605.06671#bib.bib3)\], and multi\-agent frameworks that distribute reasoning across agents\[[17](https://arxiv.org/html/2605.06671#bib.bib2)\]\. Compared with classical graph algorithms and learned graph encoders, these methods offer a more unified interface for reasoning over graph problems without task\-specific training\. However, scalability remains a major bottleneck\. For single\-agent LLM methods, reasoning quality degrades rapidly as graph size and density increase, since the model must track more structural dependencies and longer reasoning chains within a limited context and reasoning budget\. Multi\-agent methods partially alleviate this issue, but existing designs can still incur substantial coordination overhead; for example, assigning one agent per node is computationally expensive and becomes impractical on large graphs\[[17](https://arxiv.org/html/2605.06671#bib.bib2)\]\. In contrast, our work focuses on scalable graph algorithm reasoning through subgraph\-level divide\-and\-conquer, which reduces coordination cost while preserving the global information needed for final reasoning\.
## 3Proposed Method
To overcome the limitations discussed in Section[1](https://arxiv.org/html/2605.06671#S1)and Section[2](https://arxiv.org/html/2605.06671#S2)and better leverage the capabilities of LLMs for graph reasoning, we propose a novel multi\-agent collaboration framework,GraphDC, as illustrated in Figure[1](https://arxiv.org/html/2605.06671#S1.F1)\. The overall design follows a simple but effective intuition: instead of forcing a single LLM to reason over the entire graph at once, we decompose the original problem into smaller subproblems that can be handled by multiple agents, and then coordinate their outputs to recover the final answer\. In this way,GraphDCreduces the reasoning burden on each individual agent while preserving the global information needed for solving the original task\.
Specifically, the framework consists of two stages:Graph\-Query DecompositionandHierarchical Reasoning\. In the first stage, the input graph and query are decomposed into a collection of manageable subgraphs together with corresponding sub\-queries\. In the second stage, sub\-agents perform local reasoning within their assigned subgraphs, and a master agent integrates their outputs along with inter\-subgraph dependencies to produce the final answer\. This two\-stage design addresses the challenges of complexity, efficiency, and accuracy in a unified and scalable manner\.
### 3\.1Graph\-Query Decomposition
We first describe how the original graph reasoning problem is decomposed\. In a standard graph reasoning setting, the model takes as input \(i\) a graphG=\(V,E\)G=\(V,E\), whereVVandEEdenote the sets of vertices and edges, respectively, and \(ii\) a task specification𝒬\\mathcal\{Q\}, which defines the target reasoning objective\. Depending on the application,𝒬\\mathcal\{Q\}may correspond to a decision problem, a structural query, or a graph algorithmic reasoning task that requires combining local and global graph information\.
Given a graph\-query pair\(G,𝒬\)\(G,\\mathcal\{Q\}\), our goal is to partitionGGinto compact and appropriately sized subgraphs so that each subproblem remains within the reasoning capacity of an individual LLM agent\. At the same time, the decomposition should preserve sufficient structural information to support the reconstruction of a global solution\. Formally, we apply a graph splitter𝒮\\mathcal\{S\}to partitionGGintonnsubgraphs:
\(3\.1\)𝒮\(G\)=\{g\(i\)\}i=1n\.\\mathcal\{S\}\(G\)=\\\{g^\{\(i\)\}\\\}\_\{i=1\}^\{n\}\.where each subgraphg\(i\)=\(v\(i\),e\(i\)\)g^\{\(i\)\}=\(v^\{\(i\)\},e^\{\(i\)\}\)contains a subset of internal nodesv\(i\)v^\{\(i\)\}and internal edgese\(i\)e^\{\(i\)\}\. In addition to the internal structure of each subgraph, we explicitly identify its*exit nodes*, denoted bygexit\(i\)g^\{\(i\)\}\_\{\\text\{exit\}\}, namely, the nodes that connect to nodes in other subgraphs\. These exit nodes play an important role in preserving cross\-subgraph dependencies, since they act as the interface between local reasoning and global integration\.
Table 1:List of notationsSpecifically, to avoid introducing bias from complicated partitioning methods and to keep the decomposition procedure general, we adopt a simple partitioning strategy as the splitter𝒮\\mathcal\{S\}\[[27](https://arxiv.org/html/2605.06671#bib.bib51)\]\. As shown later in the experiments, even such a lightweight decomposition strategy can provide substantial benefits when combined with hierarchical multi\-agent reasoning\. Besides, the splitter is actually method\-agnostic to the specific strategy, and therefore can also be extended to more sophisticated decomposition strategies, which may further improve performance\.
After the graph is partitioned, a question describer generates a task\-aware sub\-query for each subgraph\. Specifically, for eachg\(i\)g^\{\(i\)\}, the describer conditions on the exit nodes of the subgraph, the global task specification, and a prompt template for the reasoning task, producing a sub\-queryq\(i\)q^\{\(i\)\}:
\(3\.2\)q\(i\)=𝒟\(gexit\(i\),𝒫;𝒬\)q^\{\(i\)\}=\\mathcal\{D\}\(g^\{\(i\)\}\_\{\\text\{exit\}\},\\mathcal\{P\};\\mathcal\{Q\}\)where𝒟\\mathcal\{D\}denotes the describer function and𝒫\\mathcal\{P\}is the prompt template\. The purpose of the describer is to translate the original task𝒬\\mathcal\{Q\}into a localized reasoning objective for each subgraph, while still retaining awareness of how the local result may contribute to the final global answer\. In other words, the sub\-query is not simply a truncated version of the original question; it is a task\-conditioned local reasoning instruction that accounts for the role of the subgraph in the broader graph structure\.
This decomposition process produces a set of subgraph–subquery pairs\(g\(i\),q\(i\)\)\(g^\{\(i\)\},q^\{\(i\)\}\)together with the set of inter\-subgraph edgesEinterE\_\{\\text\{inter\}\}\. Their relationship to the original graph can be written as
\(3\.3\)G=∑ig\(i\)\+Einter,G=\\sum\_\{i\}g^\{\(i\)\}\+E\_\{\\text\{inter\}\},where∑ig\(i\)\\sum\_\{i\}g^\{\(i\)\}denotes the collection of all subgraphs andEinterE\_\{\\text\{inter\}\}captures the structural dependencies across them\. The resulting decomposition, i\.e\., the set of local reasoning units\(g\(i\),q\(i\)\)\(g^\{\(i\)\},q^\{\(i\)\}\)and the inter\-subgraph connectivity informationEinterE\_\{\\text\{inter\}\}, is then passed to the next stage for hierarchical reasoning\.
### 3\.2Hierarchical Reasoning
Given the decomposition results,GraphDCnext performs reasoning in a hierarchical manner\. This stage consists of two steps: \(1\)*intra\-subgraph local reasoning*by sub\-agents, and \(2\)*inter\-subgraph global synthesis*by the master agent\. Specifically, the intuition is that local agents focus on solving bounded reasoning problems within subgraphs, while the master agent focuses on combining local evidence into a globally consistent solution\.
#### 3\.2\.1Intra\-subgraph reasoning\.
Each sub\-agent is assigned one subgraph\-subquery pair\(g\(i\),q\(i\)\)\(g^\{\(i\)\},q^\{\(i\)\}\)\. The agent then applies the LLM reasonerℛ\\mathcal\{R\}to generate a natural\-language response:
\(3\.4\)a\(i\)=ℛ\(g\(i\),q\(i\)\)\.a^\{\(i\)\}=\\mathcal\{R\}\(g^\{\(i\)\},q^\{\(i\)\}\)\.
This local reasoning step allows each agent to concentrate on a restricted portion of the graph, thereby reducing the effective reasoning complexity\. Since the input size is smaller and the structural scope is more limited, the agent can devote more capacity to the actual reasoning process rather than spending its context budget on representing an overly large graph\.
However, the raw outputa\(i\)a^\{\(i\)\}of an LLM is often verbose and may include intermediate reasoning steps, explanatory text, or redundant details that are unnecessary for downstream aggregation\. Passing these full responses directly to the master agent would increase the integration burden and may introduce noise\. To address this issue, we employ an Extractorℰ\\mathcal\{E\}that distills each response into a concise sub\-answer, denoted byaextract\(i\)a^\{\(i\)\}\_\{\\text\{extract\}\}\. The extractor retains only the essential information needed for final reasoning, such as local conclusions, critical structural facts, or task\-relevant summaries\. This refinement step improves communication efficiency between agents and helps the master agent focus on the information that matters most\.
#### 3\.2\.2Inter\-subgraph synthesis\.
With the sub\-answers from sub\-agents, the master agent performs global synthesis\. Specifically, it aggregates the set of extracted sub\-answers\{aextract\(i\)\}\\\{a^\{\(i\)\}\_\{\\text\{extract\}\}\\\}together with the inter\-subgraph informationEinterE\_\{\\text\{inter\}\}, and then uses the same reasoning moduleℛ\\mathcal\{R\}to generate the final answer𝒜\\mathcal\{A\}to the original query𝒬\\mathcal\{Q\}:
\(3\.5\)𝒜=ℛ\(∑iaextract\(i\),Einter;𝒬\)\.\\mathcal\{A\}=\\mathcal\{R\}\(\\sum\_\{i\}a\_\{\\text\{extract\}\}^\{\(i\)\},E\_\{\\text\{inter\}\};\\mathcal\{Q\}\)\.
Note that the role of the master agent is not to redo all reasoning from scratch, but to integrate local evidence into a coherent global solution\. It resolves dependencies that span multiple subgraphs, reconciles potentially complementary local observations, and determines how inter\-subgraph edges affect the final answer\. This design allows the overall system to preserve a global reasoning view without requiring any single agent to directly process the full input graph\.
Overall, the proposed hierarchical reasoning framework transforms a difficult monolithic graph reasoning problem into a coordinated multi\-agent process with clear functional roles\. Sub\-agents are responsible for localized inference, while the master agent is responsible for structured global integration\. As we will show later in Section[4](https://arxiv.org/html/2605.06671#S4), by decomposing both the graph and the reasoning process,GraphDCimproves scalability while maintaining the ability to solve graph reasoning tasks that require global consistency\.
We summarize the overall procedure ofGraphDCin Algorithm[1](https://arxiv.org/html/2605.06671#alg1)\. Lines 1\-5 correspond to theGraph\-Query Decompositionstage, where the input graph is partitioned into subgraphs, the inter\-subgraph connections are identified, and a task\-aware sub\-query is generated for each subgraph\. Lines 6\-12 implement theHierarchical Reasoningstage\. In this stage, each sub\-agent performs local reasoning on its assigned subgraph, an extractor distills the local response into a concise sub\-answer, and a master agent integrates all extracted sub\-answers with inter\-subgraph information to produce the final answer to the original query\.
Algorithm 1TheGraphDCFramework0:Graph\-query pair
\(G,𝒬\)\(G,\\mathcal\{Q\}\), prompt template
𝒫\\mathcal\{P\}, graph splitter
𝒮\\mathcal\{S\}, question describer
𝒟\\mathcal\{D\}, LLM reasoner
ℛ\\mathcal\{R\}, and extractor
ℰ\\mathcal\{E\}
0:Final answer
𝒜\\mathcal\{A\}
1:Partition the input graph into subgraphs:
𝒮\(G\)=\{g\(i\)\}i=1n\\mathcal\{S\}\(G\)=\\\{g^\{\(i\)\}\\\}\_\{i=1\}^\{n\}
2:Identify the inter\-subgraph edges
EinterE\_\{\\text\{inter\}\}
3:for
i=1i=1to
nndo
4:Generate the task\-aware sub\-query:
q\(i\)=𝒟\(gexit\(i\),𝒫;𝒬\)q^\{\(i\)\}=\\mathcal\{D\}\(g^\{\(i\)\}\_\{\\text\{exit\}\},\\mathcal\{P\};\\mathcal\{Q\}\)
5:endfor
6:for
i=1i=1to
nndo
7:Perform intra\-subgraph reasoning:
a\(i\)=ℛ\(g\(i\),q\(i\)\)a^\{\(i\)\}=\\mathcal\{R\}\(g^\{\(i\)\},q^\{\(i\)\}\)
8:Extract the concise sub\-answer:
aextract\(i\)=ℰ\(a\(i\)\)a\_\{\\text\{extract\}\}^\{\(i\)\}=\\mathcal\{E\}\(a^\{\(i\)\}\)
9:endfor
10:Perform inter\-subgraph synthesis:
11:
𝒜=ℛ\(∑iaextract\(i\),Einter;𝒬\)\\mathcal\{A\}=\\mathcal\{R\}\\\!\\left\(\\sum\_\{i\}a\_\{\\text\{extract\}\}^\{\(i\)\},E\_\{\\text\{inter\}\};\\mathcal\{Q\}\\right\)
12:Return
𝒜\\mathcal\{A\}
## 4Experiments
Table 2:Dataset statisticsIn this section, we will answer the following research questions:
- •Question 1:DoesGraphDCoutperform existing LLM\-based baselines on graph algorithm reasoning tasks?
- •Question 2:How do existing methods behave as graph size increases?
- •Question 3:DoesGraphDCprovide larger gains on more challenging and larger graphs?
### 4\.1Setup
We compared our method with existing methods on two datasets: one dataset is from the NLGraph\[[36](https://arxiv.org/html/2605.06671#bib.bib4)\]paper, and another one was constructed by us that still follows the same graph generation procedure as NLGraph but increases the size of the graphs\.
The reason we create and evaluate on a new dataset constructed by us is that existing datasets for graph algorithm reasoning mostly focus on relatively small graphs, which are easier for current LLM\-based methods to process and reason over\[[36](https://arxiv.org/html/2605.06671#bib.bib4),[5](https://arxiv.org/html/2605.06671#bib.bib1)\]\. In contrast, larger graphs with more nodes and denser structural dependencies are substantially more challenging, yet remain underrepresented in current benchmarks\. This imbalance makes it difficult to systematically evaluate how graph reasoning methods scale as graph size increases\.
Specifically, the dataset we constructed contains an equal number of graphs in each size range: 0\-20, 20\-40, 40\-60, 60\-80, and 80\-100 nodes\. This design allows us to study model behavior under progressively more difficult graph reasoning settings instead of concentrating most evaluations on small graphs\. In addition, for the shortest path and connectivity tasks, we deliberately select target node pairs that are relatively far apart\. This makes the reasoning process more demanding, since the model must track longer\-range graph dependencies and perform more nontrivial multi\-step reasoning\. As a result, task difficulty naturally increases with graph size, making the benchmark more suitable for assessing scalability\.
Table[2](https://arxiv.org/html/2605.06671#S4.T2)summarizes the statistics of the two datasets\. Compared with NLGraph\[[36](https://arxiv.org/html/2605.06671#bib.bib4)\], our benchmark contains substantially more graphs, nodes, and edges, and provides a more challenging testbed for evaluating graph algorithm reasoning methods, especially in large\-graph settings where scalability becomes critical\.
### 4\.2Experiment Results
We first evaluate on the original NLGraph dataset as in Table[3](https://arxiv.org/html/2605.06671#S4.T3), following the experimental setting of NLGraph\[[36](https://arxiv.org/html/2605.06671#bib.bib4)\]\. We compare against the results reported in the NLGraph for Few\-Shot\[[2](https://arxiv.org/html/2605.06671#bib.bib32)\]and Chain\-of\-Thought \(CoT\)\[[37](https://arxiv.org/html/2605.06671#bib.bib31)\]baselines\. We also evaluateGraphDCon the dataset we constructed as described in subsection[4\.1](https://arxiv.org/html/2605.06671#S4.SS1)and compare it with the same representative LLM\-based baselines as in Table[4](https://arxiv.org/html/2605.06671#S4.T4), also using Few\-Shot\[[2](https://arxiv.org/html/2605.06671#bib.bib32)\]and Chain\-of\-Thought \(CoT\)\[[37](https://arxiv.org/html/2605.06671#bib.bib31)\]baselines following the experimental setting of NLGraph\[[36](https://arxiv.org/html/2605.06671#bib.bib4)\]\. All methods are implemented with GPT\-4\.1\-mini without any fine\-tuning\. For NLGraph dataset as in Table[3](https://arxiv.org/html/2605.06671#S4.T3), we report accuracy across three difficulty levels: easy, medium, and hard following their original paper\. For our dataset as in Table[4](https://arxiv.org/html/2605.06671#S4.T4), we report accuracy across five graph size ranges to better examine overall performance and how each method behaves as graph size increases\. Code and dataset are available athttps://anonymous\.4open\.science/r/GraphDCA\-FBDF\.
Table 3:Accuracy on the NLGraph dataset, best results are bold\.∗Not provided in their original paper
#### 4\.2\.1Q1:GraphDCOutperforms Existing Baselines\.
As shown in Table[3](https://arxiv.org/html/2605.06671#S4.T3)and Table[4](https://arxiv.org/html/2605.06671#S4.T4),GraphDCconsistently achieves higher accuracy than all baselines across nearly all tasks and graph size groups\. This confirms the effectiveness of our divide\-and\-conquer multi\-agent design, which distributes the reasoning load across specialized agents and reduces the burden on any single LLM\. Specifically, we noticed that the performance improvements ofGraphDCcompared with baselines are larger on our datasets in Table[4](https://arxiv.org/html/2605.06671#S4.T4), especially on larger graphs with sizes between 80 and 100; this demonstrates the advantage of our method on large graph reasoning compared with existing methods\. For small graphs, single\-agent methods remain competitive on connectivity and shortest\-path tasks, and in some cases slightly outperformGraphDCon connectivity, likely because decomposition is unnecessary in such simple settings and may occasionally produce less informative subgraphs\. However, as graph size increases, the performance of single\-agent methods deteriorates quickly, particularly on cycle detection and shortest\-path tasks, whereasGraphDCremains substantially more robust\. These results indicate that decomposing the graph into manageable subproblems is effective for preserving reasoning quality in more complex settings\.
#### 4\.2\.2Q2: Existing Methods Struggle on Large Graphs\.
As shown in Table[3](https://arxiv.org/html/2605.06671#S4.T3)and Table[4](https://arxiv.org/html/2605.06671#S4.T4), existing methods have limited reasoning ability on larger graphs\. Their accuracy drops sharply as graph size increases, and on harder tasks they often approach failure\. Specifically, existing methods demonstrate the worst performance on our dataset with graph sizes larger than 80, as shown in Table[4](https://arxiv.org/html/2605.06671#S4.T4), further demonstrating their struggle on large graphs\. This trend is especially clear for shortest\-path and cycle detection, both of which require multi\-step reasoning over long\-range structural dependencies\. For example, in cycle detection with graphs larger than 40 nodes, the baselines often default to answering “yes” for most queries, yielding accuracy close to 50% on the balanced test set\. This suggests that standard prompting and CoT prompting are insufficient for reliable reasoning once graph complexity exceeds the effective capacity of a single model\. In contrast,GraphDCuses graph decomposition and coordinated subgraph reasoning to better manage this complexity, maintaining consistently higher accuracy even in challenging large\-graph settings\.
#### 4\.2\.3Q3:GraphDCPerforms Better on Large Graphs\.
Table[3](https://arxiv.org/html/2605.06671#S4.T3)and Table[4](https://arxiv.org/html/2605.06671#S4.T4)also report the relative improvement ofGraphDCover single\-agent baselines\. We observe that the gains become larger as the task becomes more difficult, indicating that our framework is particularly effective in challenging reasoning scenarios\. In particular, for hard graphs in the NLGraph dataset, as in Table[3](https://arxiv.org/html/2605.06671#S4.T3), and graphs with more than 60 nodes in our dataset, as in Table[4](https://arxiv.org/html/2605.06671#S4.T4),GraphDCachieves nearly twofold accuracy gains on the shortest\-path task, highlighting its advantage on problems that require deeper and more global reasoning\. Even on connectivity, where all methods perform strongly on small graphs, the benefit ofGraphDCbecomes increasingly visible as graph size grows\. Overall, these results demonstrate that the proposed framework scales more effectively than existing baselines and is better suited for graph reasoning on large and complex instances\.
Table 4:Accuracy on our dataset, best results are bold\.
### 4\.3Case study
We present a case study to illustrate howGraphDCimproves connectivity reasoning on larger graphs\. Consider a graph with 100 nodes that can be naturally divided into two clusters, denoted by cluster 1 and 2\. Most edges lie within each cluster, while only a few sparse edges connect the two clusters\. Suppose the task is to determine whether a node in cluster 1 \(specifically, the node 27\) is connected to another in cluster 2 \(node 97\)\.
When the entire graph is given to a single LLM agent, the model must reason over a long path spanning two clusters and identify the few critical cross\-cluster edges that connect them\. This is difficult because the relevant evidence is distributed across distant parts of the graph: the model must first verify that if the node 27 can reach some exit node in cluster 1, then determine whether that exit node is linked to an exit node in cluster 2, and finally check whether the node 97 is reachable from that entry point inside cluster 2\. In practice, a single agent may miss one of these steps or focus excessively on local structure, leading to an incorrect prediction of disconnection\.
In contrast,GraphDCfirst decomposes the graph into two clusters and assigns a sub\-agent to each cluster\. The sub\-agent for cluster 1 only needs to determine whether the node 27 can reach an exit node of cluster 1, while the sub\-agent for cluster 2 analogously checks whether an exit node of cluster 2 can reach the node 97\. Each subproblem is substantially simpler than reasoning over the entire graph, since it is restricted to a smaller subgraph with fewer irrelevant nodes and edges\.
Figure 2:Case study of connectivity reasoning withGraphDC\. A single\-agent approach must reason over the entire graph to determine whether node 27 and node 97 are connected, requiring it to track a long\-range path across multiple clusters\. In contrast,GraphDCdecomposes the graph into two clusters and assigns one sub\-agent to each cluster for local reachability reasoning\. The master agent then integrates the local results with the sparse inter\-cluster edges to infer the final global connectivity\.The master agent then combines the outputs of the two sub\-agents corresponding to the two clusters with the sparse inter\-cluster edges\. If an exit node reachable from the node 27 in cluster 1 is connected via a cross\-cluster edge to an exit node that can reach node 97 in cluster 2, the master agent concludes that the node 27 and the node 97 are connected in the original graph through the exit nodes in both clusters\. In this way,GraphDCrecovers the global connectivity result by composing two local reachability decisions with a small amount of cross\-cluster information\.
This example illustrates the core advantage of our framework\. Rather than forcing a single LLM to track a long\-range dependency across the entire graph,GraphDCdecomposes the task into two local reasoning steps followed by one lightweight global aggregation step\. This divide\-and\-conquer process makes the reasoning problem more tractable and improves the reliability of the final connectivity prediction\.
## 5Conclusion
In this work, we proposedGraphDC, a divide\-and\-conquer multi\-agent framework for scalable graph algorithm reasoning\. Instead of relying on a single model to process the entire graph, our approach decomposes the input graph into manageable subgraphs, assigns each subgraph to an individual agent for local reasoning, and then uses a master agent to integrate the local outputs and inter\-subgraph information to answer the original question\. In this way,GraphDCreduces the reasoning burden on each agent while preserving the global structural information required for solving the task\. Experimental results demonstrate thatGraphDCconsistently outperforms existing methods on large\-graph algorithm reasoning tasks, showing clear advantages especially as graph size and task complexity increase\. These findings highlight both the effectiveness and the scalability of our framework, and suggest that structured multi\-agent collaboration is a promising direction for graph reasoning with LLMs\. In future work, it would be interesting to explore more advanced graph decomposition strategies and extend the framework to a broader range of graph reasoning tasks\.
## References
- \[1\]V\. Anand, J\. Cui, J\. Heavey, A\. Vullikanti, and B\. A\. Prakash\(2024\)H2ABM: heterogeneous agent\-based model on hypergraphs to capture group interactions\.InProceedings of the 2024 SIAM International Conference on Data Mining \(SDM\),pp\. 280–288\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[2\]T\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. D\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell,et al\.\(2020\)Language models are few\-shot learners\.Advances in neural information processing systems33,pp\. 1877–1901\.Cited by:[§4\.2](https://arxiv.org/html/2605.06671#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.10.10.13.3.2),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.10.10.15.5.2),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.4.4.4.3),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.16.1.2),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.18.3.2),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.20.5.2)\.
- \[3\]Q\. Cao, H\. Shen, J\. Gao, B\. Wei, and X\. Cheng\(2020\)Popularity prediction on social platforms with coupled graph neural networks\.InProceedings of the 13th international conference on web search and data mining,pp\. 70–78\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[4\]Z\. Chai, T\. Zhang, L\. Wu, K\. Han, X\. Hu, X\. Huang, and Y\. Yang\(2023\)Graphllm: boosting graph reasoning ability of large language model\.arXiv preprint arXiv:2310\.05845\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[5\]N\. Chen, Y\. Li, J\. Tang, and J\. Li\(2024\)Graphwiz: an instruction\-following language model for graph computational problems\.InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,pp\. 353–364\.Cited by:[§A](https://arxiv.org/html/2605.06671#S1a.p1.1),[§4\.1](https://arxiv.org/html/2605.06671#S4.SS1.p2.1)\.
- \[6\]Z\. Chen, H\. Mao, H\. Li, W\. Jin, H\. Wen, X\. Wei, S\. Wang, D\. Yin, W\. Fan, H\. Liu,et al\.\(2024\)Exploring the potential of large language models \(llms\) in learning on graphs\.ACM SIGKDD Explorations Newsletter25\(2\),pp\. 42–61\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1)\.
- \[7\]A\. Chopra, A\. Rodríguez, J\. Subramanian, A\. Quera\-Bofarull, B\. Krishnamurthy, B\. A\. Prakash, and R\. Raskar\(2023\)Differentiable agent\-based epidemiology\.InProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems,pp\. 1848–1857\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[8\]J\. Cui, J\. Heavey, E\. Klein, G\. R\. Madden, C\. D\. Sifri, A\. Vullikanti, and B\. A\. Prakash\(2025\)Identifying and forecasting importation and asymptomatic spreaders of multi\-drug resistant organisms in hospital settings\.NPJ Digital Medicine8\(1\),pp\. 147\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[9\]J\. Cui, J\. Heavey, L\. Lin, E\. Y\. Klein, G\. R\. Madden, C\. D\. Sifri, B\. Lewis, A\. K\. Vullikanti, and B\. A\. Prakash\(2024\)Modeling relaxed policies for discontinuation of methicillin\-resistant staphylococcus aureus contact precautions\.Infection Control & Hospital Epidemiology45\(7\),pp\. 833–838\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[10\]J\. Cui\(2026\)Bridging public health with clinical decisions from a data centric perspective\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.40,pp\. 39813–39814\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[11\]R\. Datta, J\. Cui, Z\. Guan, R\. Silwal, J\. C\. Eby, G\. Madden, and A\. Vullikanti\(2025\)Improving hospital risk prediction with knowledge\-augmented multimodal ehr modeling\.arXiv preprint arXiv:2508\.01970\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[12\]S\. Even\(2011\)Graph algorithms\.Cambridge University Press\.Cited by:[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[13\]B\. Fatemi, J\. Halcrow, and B\. Perozzi\(2023\)Talk like a graph: encoding graphs for large language models\.arXiv preprint arXiv:2310\.04560\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[14\]J\. L\. Gross, J\. Yellen, and M\. Anderson\(2018\)Graph theory and its applications\.Chapman and Hall/CRC\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[15\]J\. Guo, L\. Du, H\. Liu, M\. Zhou, X\. He, and S\. Han\(2023\)Gpt4graph: can large language models understand graph structured data? an empirical evaluation and benchmarking\.arXiv preprint arXiv:2305\.15066\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[16\]T\. Guo, X\. Chen, Y\. Wang, R\. Chang, S\. Pei, N\. V\. Chawla, O\. Wiest, and X\. Zhang\(2024\)Large language model based multi\-agents: a survey of progress and challenges\.arXiv preprint arXiv:2402\.01680\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1)\.
- \[17\]Y\. Hu, R\. Lei, X\. Huang, Z\. Wei, and Y\. Liu\(2024\)Scalable and accurate graph reasoning with llm\-based multi\-agents\.arXiv preprint arXiv:2410\.05130\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[18\]C\. Huang, H\. Xu, Y\. Xu, P\. Dai, L\. Xia, M\. Lu, L\. Bo, H\. Xing, X\. Lai, and Y\. Ye\(2021\)Knowledge\-aware coupled graph neural network for social recommendation\.InProceedings of the AAAI conference on artificial intelligence,Vol\.35,pp\. 4115–4122\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[19\]Q\. Huang, H\. Ren, and J\. Leskovec\(2022\)Few\-shot relational reasoning via connection subgraph pretraining\.Advances in neural information processing systems35,pp\. 6397–6409\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p4.1)\.
- \[20\]B\. Jin, G\. Liu, C\. Han, M\. Jiang, H\. Ji, and J\. Han\(2024\)Large language models on graphs: a comprehensive survey\.IEEE Transactions on Knowledge and Data Engineering\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1)\.
- \[21\]D\. Kreuzer, D\. Beaini, W\. Hamilton, V\. Létourneau, and P\. Tossou\(2021\)Rethinking graph transformers with spectral attention\.Advances in Neural Information Processing Systems34,pp\. 21618–21629\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[22\]C\. Liu and B\. Wu\(2023\)Evaluating large language models on graphs: performance insights and comparative analysis\.arXiv preprint arXiv:2308\.11224\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[23\]H\. Liu, S\. Xu, Z\. Zhao, L\. Kong, H\. Prabhakar Kamarthi, A\. Sasanur, M\. Sharma, J\. Cui, Q\. Wen, C\. Zhang,et al\.\(2024\)Time\-mmd: multi\-domain multimodal dataset for time series analysis\.Advances in Neural Information Processing Systems37,pp\. 77888–77933\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[24\]X\. Liu, S\. Zhao, K\. Su, Y\. Cen, J\. Qiu, M\. Zhang, W\. Wu, Y\. Dong, and J\. Tang\(2022\)Mask and reason: pre\-training knowledge graph transformers for complex logical queries\.InProceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining,pp\. 1120–1130\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[25\]A\. Madkour, W\. G\. Aref, F\. U\. Rehman, M\. A\. Rahman, and S\. Basalamah\(2017\)A survey of shortest\-path algorithms\.arXiv preprint arXiv:1705\.02044\.Cited by:[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[26\]V\. Manjula\(2012\)Graph applications to data structures\.InAdvanced Materials Research,Vol\.433–440,pp\. 3297–3301\.External Links:[Document](https://dx.doi.org/10.4028/www.scientific.net/amr.433-440.3297),[Link](https://doi.org/10.4028/www.scientific.net/amr.433-440.3297)Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[27\]M\. E\. Newman\(2006\)Modularity and community structure in networks\.Proceedings of the national academy of sciences103\(23\),pp\. 8577–8582\.Cited by:[§3\.1](https://arxiv.org/html/2605.06671#S3.SS1.p3.1)\.
- \[28\]G\. Shi, X\. Deng, L\. Luo, L\. Xia, L\. Bao, B\. Ye, F\. Du, S\. Pan, and Y\. Li\(2025\)Llm\-powered explanations: unraveling recommendations through subgraph reasoning\.Knowledge\-Based Systems,pp\. 114307\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p4.1)\.
- \[29\]R\. Socher, D\. Chen, C\. D\. Manning, and A\. Ng\(2013\)Reasoning with neural tensor networks for knowledge base completion\.Advances in neural information processing systems26\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[30\]A\. Tabassum, S\. Chinthavali, S\. Lee, N\. Stenvig, B\. Kay, T\. Kuruganti, and B\. A\. Prakash\(2021\)Efficient contingency analysis in power systems via network trigger nodes\.In2021 IEEE International Conference on Big Data \(Big Data\),pp\. 1964–1973\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[31\]A\. Tabassum, S\. Chinthavali, V\. Tansakul, and B\. A\. Prakash\(2021\)Actionable insights in multivariate time\-series for urban analytics\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[32\]R\. E\. Tarjan\(1983\)Data structures and network algorithms\.SIAM\.Cited by:[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[33\]R\. Tarjan\(1972\)Depth\-first search and linear graph algorithms\.SIAM journal on computing1\(2\),pp\. 146–160\.Cited by:[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[34\]J\. van Leeuwen\(1990\)Graph algorithms\.InAlgorithms and complexity,pp\. 525–631\.Cited by:[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1)\.
- \[35\]K\. Venkatesh, Y\. He, J\. Li, and J\. Cui\(2026\)PhysicsAgentABM: physics\-guided generative agent\-based modeling\.arXiv preprint arXiv:2602\.06030\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[36\]H\. Wang, S\. Feng, T\. He, Z\. Tan, X\. Han, and Y\. Tsvetkov\(2023\)Can language models solve graph problems in natural language?\.Advances in Neural Information Processing Systems36,pp\. 30840–30861\.Cited by:[§A](https://arxiv.org/html/2605.06671#S1a.p1.1),[§4\.1](https://arxiv.org/html/2605.06671#S4.SS1.p1.1),[§4\.1](https://arxiv.org/html/2605.06671#S4.SS1.p2.1),[§4\.1](https://arxiv.org/html/2605.06671#S4.SS1.p4.1),[§4\.2](https://arxiv.org/html/2605.06671#S4.SS2.p1.1),[Table 2](https://arxiv.org/html/2605.06671#S4.T2.1.1.1.1.2)\.
- \[37\]J\. Wei, X\. Wang, D\. Schuurmans, M\. Bosma, F\. Xia, E\. Chi, Q\. V\. Le, D\. Zhou,et al\.\(2022\)Chain\-of\-thought prompting elicits reasoning in large language models\.Advances in neural information processing systems35,pp\. 24824–24837\.Cited by:[§4\.2](https://arxiv.org/html/2605.06671#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.10.10.14.4.1),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.10.10.16.6.1),[Table 3](https://arxiv.org/html/2605.06671#S4.T3.5.5.5.2),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.17.2.2),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.19.4.2),[Table 4](https://arxiv.org/html/2605.06671#S4.T4.13.13.21.6.2)\.
- \[38\]Y\. Wei, S\. Fu, W\. Jiang, Z\. Zhang, Z\. Zeng, Q\. Wu, J\. Kwok, and Y\. Zhang\(2024\)Gita: graph to visual and textual integration for vision\-language graph reasoning\.Advances in Neural Information Processing Systems37,pp\. 44–72\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[39\]Y\. Wei, Q\. Huang, J\. T\. Kwok, and Y\. Zhang\(2024\)Kicgpt: large language model with knowledge in context for knowledge graph completion\.arXiv preprint arXiv:2402\.02389\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[40\]Q\. Wu, G\. Bansal, J\. Zhang, Y\. Wu, B\. Li, E\. Zhu, L\. Jiang, X\. Zhang, S\. Zhang, J\. Liu,et al\.\(2024\)Autogen: enabling next\-gen llm applications via multi\-agent conversations\.InFirst Conference on Language Modeling,Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p3.1)\.
- \[41\]K\. Xu, W\. Hu, J\. Leskovec, and S\. Jegelka\(2018\)How powerful are graph neural networks?\.arXiv preprint arXiv:1810\.00826\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[42\]Y\. Yaakoubi, H\. Radi, and R\. Dimitrakopoulos\(2022\)Learning on graphs for mineral asset valuation under supply and demand uncertainty\.arXiv preprint arXiv:2212\.03865\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1)\.
- \[43\]J\. Zhang, H\. Zhang, C\. Xia, and L\. Sun\(2020\)Graph\-bert: only attention is needed for learning graph representations\.arXiv preprint arXiv:2001\.05140\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[44\]X\. Zhang, L\. Liang, L\. Liu, and M\. Tang\(2021\)Graph neural networks and their current applications in bioinformatics\.Frontiers in genetics12,pp\. 690049\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.06671#S2.SS1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
- \[45\]Z\. Zhang, X\. Wang, Z\. Zhang, H\. Li, Y\. Qin, and W\. Zhu\(2024\)LLM4DyG: can large language models solve spatial\-temporal problems on dynamic graphs?\.InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,pp\. 4350–4361\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p2.1),[§2\.3](https://arxiv.org/html/2605.06671#S2.SS3.p1.1)\.
- \[46\]Z\. Zhang, J\. Wang, J\. Chen, S\. Ji, and F\. Wu\(2021\)Cone: cone embeddings for multi\-hop reasoning over knowledge graphs\.Advances in Neural Information Processing Systems34,pp\. 19172–19183\.Cited by:[§1](https://arxiv.org/html/2605.06671#S1.p1.1),[§2\.2](https://arxiv.org/html/2605.06671#S2.SS2.p1.1)\.
## AAppendix
Besides the conventional graph reasoning tasks of connectivity, cycle detection, and shortest path introduced above, we additionally design a Triangle benchmark following Graphwiz\[[5](https://arxiv.org/html/2605.06671#bib.bib1)\]\. Like NLGraph\[[36](https://arxiv.org/html/2605.06671#bib.bib4)\], Graphwiz primarily evaluates relatively small graphs, which are comparatively easier for current LLM\-based methods to process and reason over\. To obtain a more balanced evaluation across graph sizes, we follow the same graph generation procedure as Graphwiz\[[5](https://arxiv.org/html/2605.06671#bib.bib1)\], but adjust the sampling distribution so that the dataset contains an equal number of graphs in each size range: 0\-20, 20\-40, 40\-60, 60\-80, and 80\-100 nodes\. For this supplementary evaluation, we report the performance ofGraphDCusing GPT\-4\.1\-mini without fine\-tuning under the Graphwiz protocol\. This construction makes the Triangle benchmark better suited for analyzing how graph reasoning performance scales with graph size\.
As shown in TableLABEL:tab:results\_triangle,GraphDCis competitive with Graphwiz on the smallest graphs and consistently performs better on medium and large graphs, with the gains becoming more pronounced as graph size increases\. This indicates that the proposed framework is effective on the Triangle task, especially on harder instances\.Similar Articles
GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning
GraphARC is a new benchmark for abstract reasoning on graph-structured data, extending the ARC paradigm to graphs. Evaluations of state-of-the-art language models reveal a comprehension-execution gap and performance degradation on larger instances, highlighting scaling challenges.
GraphReAct: Reasoning and Acting for Multi-step Graph Inference
This paper introduces GraphReAct, a framework that extends reasoning-acting paradigms to graph-structured data for multi-step inference. It combines topological and semantic retrieval with context refinement to improve performance on graph learning benchmarks.
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
This paper introduces Deep Reasoning, an inference-time approach that uses structured meta-reasoning to construct task-specific scaffolds for general-purpose agents. The proposed agent, Dolores, outperforms existing methods by distributing cognition across lower-load reasoning threads, reducing hallucinations and improving performance across multiple benchmarks.
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning
This technical report introduces DuMate-DeepResearch, a multi-agent framework for deep research tasks that decouples the agent core from a tool ecosystem, and incorporates graph-based dynamic planning, recursive two-level execution, and rubric-based test-time optimization. The system achieves state-of-the-art results on two deep research benchmarks, demonstrating the value of auditable agent infrastructure.
CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents
This paper introduces CoCoDA, a framework that uses a co-evolving compositional Directed Acyclic Graph (DAG) to manage tool libraries for augmented agents. It enables small language models to efficiently retrieve and compose tools, allowing an 8B model to match or exceed the performance of a 32B model on reasoning benchmarks.