@qingke_ai: https://x.com/qingke_ai/status/2072899949508063426

X AI KOLs Timeline 07/03/26, 04:27 AM Papers

icml-2026 multi-agent reasoning belief-state neuro-symbolic aiops medical-diagnosis

Summary

Nankai University and Lenovo collaborated to propose Graph of States (GoS), a neuro-symbolic framework for general abductive reasoning, which uses explicit belief states and state machines to control multi-agent collaboration, achieving significant improvements in medical diagnosis and system fault diagnosis tasks. This work was accepted at ICML 2026.

https://t.co/JOyCTFiQwV

Original Article

Cached at: 07/03/26, 10:43 PM

ICML 2026 | Let Agents Truly Collaborate in Harmony: GoS Builds Shared Belief States for Multi-Agent Reasoning

Luo Yu, PhD student at Nankai University

Recently, large language models have continuously pushed the performance ceiling on tasks like mathematics and code generation. However, when it comes to real-world tasks such as medical diagnosis and fault troubleshooting, the real challenge lies in enabling multiple agents to engage in sustained collaborative reasoning within uncertain, dynamic environments.

Taking medical diagnosis as an example, a lead physician cannot simply order every possible test for a patient at the outset. Instead, based on the current diagnostic direction, they dynamically arrange for different departments—such as radiology and clinical laboratory—to perform tests step by step, continuously gathering evidence for treatment, and constantly revising their judgment throughout the process.

In contrast, existing multi-agent reasoning methods appear to divide labor but often amount to simple aggregation. They either pass one agent's output to the next agent for further processing, or they assume that all evidence has been prepared in advance, so they lack genuine autonomous investigation and dynamic decision-making.

The paper points out that this is precisely why existing reasoning frameworks like CoT, ToT, GoT, and FoT often exhibit four failure modes when transferred to scenarios such as medical diagnosis or system fault troubleshooting: evidence fabrication, context drift, backtracking failure, and premature termination.

These failures are not coincidental but stem from two structural deficiencies:

Many existing methods mix hypotheses, evidence, and reasoning progress into unstructured natural language contexts, lacking explicit state representation;
The absence of state machine control means that whether an agent chooses to backtrack, drill down, or terminate is largely left to free-form improvisation.

Consequently, during long-horizon reasoning processes, agents often struggle to stably maintain their reasoning state, easily diverge from the correct direction, or prematurely settle on superficial conclusions.

A research team from Nankai University, in collaboration with Lenovo, proposed Graph of States (GoS), a neuro-symbolic framework for general abductive reasoning tasks. Its core objective is to explicitly construct a maintainable, revertible, and restructurable reasoning state space for abductive tasks, transforming the originally implicit and scattered reasoning process into a constrained directed search. This work has been officially accepted by ICML 2026.

Currently, xCloud (Lenovo Intelligent Cloud) is accelerating the integration of GoS technology into its intelligent operations product suite, helping enterprises build an intelligent operations system that is zero-fault, self-healing, and business-resilient.

01 GoS: Adding “Explicit Belief States” to Reasoning

The core idea of GoS is to combine multi-agent collaboration with explicit belief state modeling. The system is divided into two layers: an upper cognitive layer, responsible for multi-agent collaboration within a specific domain, and a lower symbolic layer, responsible for maintaining the formal reasoning state and for navigating and constraining the process.

In the cognitive layer, GoS does not split work into scattered atomic functions. Instead, central agents and expert agents play real-world professional roles. In a medical scenario, the roles could include attending physician, radiologist, and pathologist; in a general system scenario, they could include application operations, Linux operations, network operations, and database operations. The goal is to bring the reasoning process closer to real-world collaborative workflows, so that humans can understand and audit it more easily.

The most critical part of GoS is the symbolic layer. It no longer buries the investigation process in unstructured dialogue history. Instead, it explicitly represents the belief state as a causal graph plus a state machine. The causal graph records symptoms, evidence, and hypotheses, together with the support, refutation, and refinement relationships among them. The state machine controls the current reasoning state, determining whether to continue gathering evidence, drill down to finer granularity, or backtrack to an earlier layer when conflicting evidence arises.
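To make the shape of such a belief state concrete, here is a minimal Python sketch. It is not the paper's implementation: the class names (`Kind`, `Relation`, `Phase`, `BeliefState`), the method names, and the AIOps example nodes are all hypothetical, and the state machine is reduced to a single `phase` field.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    SYMPTOM = "symptom"
    EVIDENCE = "evidence"
    HYPOTHESIS = "hypothesis"


class Relation(Enum):
    SUPPORTS = "supports"
    REFUTES = "refutes"
    REFINES = "refines"


class Phase(Enum):
    # Hypothetical phase names for the three decisions the article describes.
    GATHER = "gather"
    DRILL_DOWN = "drill_down"
    BACKTRACK = "backtrack"


@dataclass
class BeliefState:
    nodes: dict[str, Kind] = field(default_factory=dict)
    edges: list[tuple[str, Relation, str]] = field(default_factory=list)
    phase: Phase = Phase.GATHER

    def add(self, name: str, kind: Kind) -> None:
        self.nodes[name] = kind

    def link(self, src: str, relation: Relation, dst: str) -> None:
        # Edges may only connect nodes that are already in the graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append((src, relation, dst))

    def sources(self, dst: str, relation: Relation) -> list[str]:
        # All nodes that point at `dst` with the given relation.
        return [s for s, r, d in self.edges if d == dst and r is relation]


# Illustrative (made-up) fault-diagnosis example.
state = BeliefState()
state.add("HTTP 5xx alarm", Kind.SYMPTOM)
state.add("database fault", Kind.HYPOTHESIS)
state.add("connection pool exhausted", Kind.HYPOTHESIS)
state.add("connection errors in app log", Kind.EVIDENCE)
state.link("connection errors in app log", Relation.SUPPORTS, "database fault")
state.link("connection pool exhausted", Relation.REFINES, "database fault")
```

Keeping hypotheses, evidence, and their relations as typed graph entries, rather than as free text in a prompt, is what lets later steps count supporting evidence or find the parent of a sub-hypothesis mechanically.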

Furthermore, GoS introduces a key mechanism: reasoning focus. At each step, the system does not spread its attention evenly across all possible directions. Instead, it focuses on the hypothesis with the highest current confidence, directing investigation and reasoning resources towards the most promising leads. This turns exploration that easily diverges into a “navigation-guided investigation.”

02 Double-Layer Closed Loop: From Reasoning Focus to Evidence Update

GoS’s reasoning process is not a simple “plan first, then execute” cycle, but a continuous closed loop. First, the symbolic layer identifies the reasoning focus from the current belief state and translates it into investigation instructions for the cognitive layer. Then, based on these instructions, the cognitive layer calls tools, gathers evidence, and completes analysis, returning the results to the symbolic layer. The results are used to update the causal graph, recalculate hypothesis confidence, and trigger the next state transition.

This closed loop ensures that multi-agent collaboration is no longer unconstrained free-form improvisation, but always progresses around the currently most valuable hypothesis. Newly acquired evidence no longer remains merely in text but becomes the basis for subsequent reasoning.
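The loop described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_focus`, `run_loop`, the `investigate` callback (standing in for the cognitive layer), and the fixed-step confidence update are all hypothetical; the article does not say how GoS recalculates confidence.

```python
from typing import Callable


def pick_focus(confidence: dict[str, float]) -> str:
    # Reasoning focus: the hypothesis with the highest current confidence.
    return max(confidence, key=confidence.get)


def run_loop(
    confidence: dict[str, float],
    investigate: Callable[[str], bool],
    max_rounds: int = 3,
    step: float = 0.2,
) -> list[str]:
    """Run symbolic/cognitive exchange rounds and return the focus per round."""
    trace = []
    for _ in range(max_rounds):
        focus = pick_focus(confidence)   # symbolic layer picks the focus
        supported = investigate(focus)   # cognitive layer gathers evidence
        # Placeholder update rule (assumption): nudge confidence up or down.
        delta = step if supported else -step
        confidence[focus] = min(1.0, max(0.0, confidence[focus] + delta))
        trace.append(focus)
    return trace


# Toy run: evidence only ever supports the "database" hypothesis.
beliefs = {"network": 0.5, "database": 0.4}
trace = run_loop(beliefs, investigate=lambda h: h == "database")
```

In this toy run the first round investigates `network`, finds no support, and lowers its confidence, so the focus moves to `database` for the remaining rounds. The point is that each round's evidence changes the state that selects the next round's focus.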

03 Key Mechanisms: Backtrack When Needed, Drill Down When Needed

For abductive reasoning tasks, the real difficulty is often not “generating an answer,” but making rule-based state transitions during the reasoning process based on evidence changes. To this end, GoS designs two core state transition mechanisms: Backtracking and Drill-Down.

Rather than leaving these decisions entirely to the discretion of the LLM, GoS introduces explicit transition rules for states.

Specifically, when a higher-level ancestor hypothesis on the current reasoning path is, after re-evaluation, no longer the best candidate at its level, the system backtracks to that node and prunes the subsequent branches built on the false premise. Drill-down, on the other hand, does not happen just because “it feels about right to continue deeper.” Instead, the system refines to more specific sub-hypotheses only when the current hypothesis has both a sufficient confidence advantage and a sufficient quantity of supporting evidence.
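The two transition rules can be written as small predicates. The sketch below is an assumption-laden illustration, not the paper's code: the `Hypothesis` fields, the function names, the threshold values `margin=0.2` and `min_support=2`, and the choice to backtrack to the shallowest overturned ancestor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    confidence: float
    support_count: int  # number of supporting evidence items


def should_drill_down(current, siblings, margin=0.2, min_support=2):
    # Drill down only with BOTH a confidence lead and enough support.
    best_rival = max((s.confidence for s in siblings), default=0.0)
    has_lead = current.confidence - best_rival >= margin
    return has_lead and current.support_count >= min_support


def backtrack_target(path, rivals_by_level):
    """Index of the shallowest node on `path` that is no longer the best
    candidate at its level, or None if every ancestor still leads."""
    for depth, node in enumerate(path):
        if any(r.confidence > node.confidence for r in rivals_by_level[depth]):
            return depth
    return None


# Toy path: root hypothesis, then a refinement of it (made-up values).
path = [
    Hypothesis("database fault", 0.45, 3),
    Hypothesis("connection pool exhausted", 0.70, 2),
]
rivals = [
    [Hypothesis("network fault", 0.60, 2)],  # rivals of the root
    [Hypothesis("slow query", 0.30, 1)],     # rivals of the refinement
]

target = backtrack_target(path, rivals)  # root was overtaken
pruned = path[target + 1:]               # branches built on that premise
```

Here the root hypothesis has been overtaken by a rival after re-evaluation, so the search returns to depth 0 and the refinement beneath it is pruned, even though that refinement looked strong locally.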

It is this explicitly constrained control that allows GoS, when faced with non-monotonic, dynamically emerging information, to no longer just generate coherent text but to incrementally approach the true underlying cause in a more stable and controllable manner.

04 Experiments: Validating GoS in Two High-Stakes Real-World Scenarios

To verify the effectiveness and generality of GoS, the paper evaluates it on two highly realistic abductive scenarios: medical diagnosis and topology-based system fault diagnosis.

In the medical diagnosis task, the authors made a key modification based on the DiagnosisArena benchmark: instead of providing complete auxiliary examination results at the start, only the patient’s chief complaint and basic physical examination findings are given. Agents must proactively request tests, gradually gather external information, and then complete the diagnosis, thus restoring the abductive essence of “active evidence collection, dynamic reasoning.”
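The modified setup can be pictured as an environment that hides examination results until an agent asks for them. The following Python sketch is purely illustrative: `GatedCase` and its fields are hypothetical names, the case contents are placeholders, and nothing here reflects the actual DiagnosisArena data format.

```python
class GatedCase:
    """A case whose auxiliary test results are revealed only on request."""

    def __init__(self, initial: str, hidden_results: dict[str, str]):
        self.initial = initial               # visible from the start
        self._hidden = dict(hidden_results)  # revealed one test at a time
        self.requested: list[str] = []       # audit trail of requests

    def request(self, test_name: str) -> str:
        self.requested.append(test_name)
        return self._hidden.get(test_name, "not available")


# Placeholder content only; no real patient data.
case = GatedCase(
    initial="<chief complaint and basic physical examination>",
    hidden_results={"chest CT": "<result A>", "blood panel": "<result B>"},
)
first = case.request("chest CT")
missing = case.request("MRI")
```

Because every result has to be requested explicitly, the order and choice of requests become part of what is evaluated, which is what makes the task abductive rather than a one-shot classification.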

On this task, GoS achieved a 39.86% match rate and 78.99% relevance under the Human-as-a-Judge evaluation, significantly outperforming all baseline methods and achieving better results under stricter criteria.

In the multi-system fault diagnosis task, 150 incidents were constructed based on real production environments. Agents were required to start from alarms, proactively query logs and metrics, run shell commands, gradually reconstruct the fault context, and identify the root cause.

Experimental results showed that GoS achieved a 70.67% match rate and 88.00% relevance, with the match rate being 36.67 percentage points higher than the strongest baseline.

This indicates that while many methods can determine “roughly where the problem lies” (hence their relevance is not low), converging to the truly active fine-grained root cause still requires sustained investigation, controlled states, and systematic drilling down—and this is precisely where GoS excels.
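As a quick sanity check on the fault-diagnosis figures, the reported percentages can be turned into the strongest baseline's match rate and into approximate incident counts. The counts assume each rate is a simple fraction of the 150 incidents, which the article does not state explicitly.

```python
incidents = 150
gos_match = 70.67      # percent, as reported
gos_relevance = 88.00  # percent, as reported
gap_points = 36.67     # percentage points over the strongest baseline

# Strongest baseline's match rate implied by the reported gap.
baseline_match = round(gos_match - gap_points, 2)

# Approximate incident counts (assumption: rate = count / 150).
gos_matched = round(incidents * gos_match / 100)
gos_relevant = round(incidents * gos_relevance / 100)
baseline_matched = round(incidents * baseline_match / 100)
```

Under that assumption, GoS pins down the exact root cause in roughly twice as many incidents as the strongest baseline (a 34.00% match rate), which is the gap the paragraph above attributes to sustained investigation and systematic drill-down.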

The authors further conducted comprehensive ablation studies and parameter sensitivity analyses. The results demonstrate that GoS’s performance gains do not stem from some accidental trick but indeed rely on the synergistic effects of key modules like reasoning focus, causal graph, and state machine.

Moreover, as the number of neuro-symbolic exchange rounds, search depth, and state transition thresholds vary, GoS exhibits clear and interpretable performance trends, indicating that the framework is not only effective but also possesses good stability and controllability.

05 Significance: From Domain-Specific Methods to a General Reasoning Framework

From a broader perspective, the significance of GoS extends beyond improving performance on medical and AIOps tasks. It makes a more fundamental point: for high-stakes tasks in the real world, agents need more than just more knowledge, more tools, or more capabilities. They also need to explicitly maintain belief states under incomplete information, handle conflicting evidence, backtrack when necessary, drill down appropriately, and ultimately guide the search process stably towards the true root cause.

From this perspective, GoS addresses a very critical class of problems in current agent research: long-horizon reasoning and multi-turn interaction. The agent does not just answer once but must maintain state consistency throughout continuous investigation and multi-turn interactions, gradually converging.

The paper also notes that GoS does not replace existing domain-specific methods but rather complements them. Whether it’s high-quality knowledge bases and RAG in medicine, or multi-modal architectures and SOP retrieval in AIOps, these can all be combined with GoS to enhance search efficiency and decision reliability in vertical domains.

Therefore, GoS does not provide a specific agent but offers a general reasoning scaffolding designed for abductive reasoning, and indeed for long-horizon reasoning by agents.

Author Bio

The first author of this paper is Luo Yu, a first-year PhD student in the Intelligent Operations Research Group at Nankai University. His main research directions include long-horizon reasoning for agents, self-evolution of agents, and root cause analysis. The corresponding author is Sun Yongqian, Associate Professor and PhD supervisor at the School of Software, Nankai University.

He has long been engaged in the field of AIOps, focusing on fault research in cloud reconstruction, data centers, high-performance computing, and intelligent computing, while also working on cutting-edge directions such as multi-agent collaboration and LLM reasoning optimization, continuously advancing intelligent decision-making for complex systems.
