Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue

arXiv cs.CL 04/20/26, 04:00 AM Papers

Summary

Context-Agent proposes a novel framework that models multi-turn dialogue history as dynamic tree structures rather than flat sequences, better capturing the hierarchical and branching nature of natural conversation. The paper introduces the NTM benchmark for evaluating non-linear dialogue scenarios and demonstrates improved task completion rates and token efficiency across various LLMs.

arXiv:2604.05552v2 Announce Type: replace Abstract: Large Language Models demonstrate outstanding performance in many language tasks but still face fundamental challenges in managing the non-linear flow of human conversation. The prevalent approach of treating dialogue history as a flat, linear sequence is misaligned with the intrinsically hierarchical and branching structure of natural discourse, leading to inefficient context utilization and a loss of coherence during extended interactions involving topic shifts or instruction refinements. To address this limitation, we introduce Context-Agent, a novel framework that models multi-turn dialogue history as a dynamic tree structure. This approach mirrors the inherent non-linearity of conversation, enabling the model to maintain and navigate multiple dialogue branches corresponding to different topics. Furthermore, to facilitate robust evaluation, we introduce the Non-linear Task Multi-turn Dialogue (NTM) benchmark, specifically designed to assess model performance in long-horizon, non-linear scenarios. Our experiments demonstrate that Context-Agent enhances task completion rates and improves token efficiency across various LLMs, underscoring the value of structured context management for complex, dynamic dialogues. The dataset and code are available on GitHub.

Cached at: 04/20/26, 08:32 AM

# Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue
Source: https://arxiv.org/html/2604.05552
Junan Hu, Shudan Guo, Wenqi Liu, Jianhua Yin, Yinwei Wei; Shandong University, China; junanhu@mail.sdu.edu.cn, weiyinwei@hotmail.com

###### Abstract

Large Language Models demonstrate outstanding performance in many language tasks but still face fundamental challenges in managing the non-linear flow of human conversation. The prevalent approach of treating dialogue history as a flat, linear sequence is misaligned with the intrinsically hierarchical and branching structure of natural discourse, leading to inefficient context utilization and a loss of coherence during extended interactions involving topic shifts or instruction refinements. To address this limitation, we introduce Context-Agent, a novel framework that models multi-turn dialogue history as a dynamic tree structure. This approach mirrors the inherent non-linearity of conversation, enabling the model to maintain and navigate multiple dialogue branches corresponding to different topics. Furthermore, to facilitate robust evaluation, we introduce the Non-linear Task Multi-turn Dialogue (NTM) benchmark, specifically designed to assess model performance in long-horizon, non-linear scenarios. Our experiments demonstrate that Context-Agent enhances task completion rates and improves token efficiency across various LLMs, underscoring the value of structured context management for complex, dynamic dialogues. The dataset and code are available on GitHub (https://github.com/Steve2457/Context-Agent).

Corresponding author: Yinwei Wei.

## 1 Introduction

The advancement of LLM-based dialogue systems is pivotal for next-generation applications such as AI Agents and collaborative robotics, where context-aware communication is fundamental to task completion and user engagement (Durante et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib2); Yao et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib1); Sun et al., 2026 (https://arxiv.org/html/2604.05552#bib.bib41)). Advances in context-window extension techniques have further strengthened the multi-turn dialogue capabilities of LLMs (Li et al., 2025 (https://arxiv.org/html/2604.05552#bib.bib6)).

Figure 1: A schematic diagram of linear (upper) vs. non-linear (lower) dialogue flow.

However, LLMs still grapple with a fundamental challenge inherent to natural human conversation: the management of non-linear dialogue flow. This phenomenon occurs when conversational topics do not advance in a sequential order but instead feature shifts, topical jumps, or interwoven threads of discussion (Laban et al., 2025 (https://arxiv.org/html/2604.05552#bib.bib19)). Such non-linear dynamics are commonplace in real-world interactions, where users may revisit previous topics, introduce new subjects, or refine earlier statements based on evolving understanding or context (Mann and Thompson, 1988 (https://arxiv.org/html/2604.05552#bib.bib8)). The prevalent approach of treating dialogue history as a flat, linear sequence is fundamentally misaligned with the intrinsic structure of human conversation (Wang et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib3); Li et al., 2025 (https://arxiv.org/html/2604.05552#bib.bib6)). This linear paradigm fails to capture the hierarchical and branching nature of dialogues, leading to inefficiencies in context utilization and challenges in maintaining coherence over extended interactions (Lian et al., 2026 (https://arxiv.org/html/2604.05552#bib.bib46); Ding et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib4)).

Effectively resolving the non-linear flow problem requires overcoming several challenges. The first is the accurate identification and management of topic shifts and instruction refinements within a conversation. The second is the efficient selection of context from a potentially vast and complex dialogue history. As conversations extend over multiple turns, accumulated information increases computational costs and risks overwhelming the model with irrelevant details (Joren et al., 2025 (https://arxiv.org/html/2604.05552#bib.bib23); Jiang et al., 2026 (https://arxiv.org/html/2604.05552#bib.bib47)), a manifestation of the "needle in a haystack" problem (Liu et al., 2024b (https://arxiv.org/html/2604.05552#bib.bib20); Vaswani et al., 2017 (https://arxiv.org/html/2604.05552#bib.bib22)). The third challenge is the development of robust evaluation metrics and benchmarks that accurately assess a model's ability to handle non-linear dialogues, as existing datasets often lack the complexity and variability of real-world interactions.

To address these challenges, inspired by the hierarchical organization inherent in human cognitive processes for managing complex dialogues (Grosz and Sidner, 1986 (https://arxiv.org/html/2604.05552#bib.bib9)), we propose Context-Agent, a novel framework that models multi-turn dialogue history as a dynamic tree. This approach allows for the representation of conversations in a way that reflects their inherent non-linear nature, enabling the model to maintain multiple branches of dialogue corresponding to different topics. Furthermore, recognizing the inadequacy of existing datasets for this problem, we introduce the Non-linear Task Multi-turn Dialogue (NTM) benchmark, specifically designed to evaluate the performance of models in long-horizon, non-linear dialogue scenarios. This benchmark features dialogues with multiple topic shifts and instruction refinements, providing a more realistic and challenging testbed for assessing context management strategies.

In summary, the main contributions of this paper are as follows:

- We propose Context-Agent, a novel framework that models dialogue history as a dynamic tree. This approach captures non-linear discourse structure and enables precise navigation of the dialogue context.
- We introduce the Non-linear Task Multi-turn Dialogue (NTM) benchmark. It features long-horizon dialogues with complex topic shifts and instruction refinements, offering a rigorous testbed for non-linear context management.
- Experiments across various LLMs demonstrate that Context-Agent significantly outperforms linear baselines, improving task completion rates while reducing token usage.

## 2 Related Works

**Linear Context Extension and Compression.** Recent works have explored structured and task-aware parameter-efficient fine-tuning (Xiao et al., 2026 (https://arxiv.org/html/2604.05552#bib.bib44)). Context-extension methods such as YaRN (Peng et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib26)) and LongLoRA (Chen et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib28)) enlarge the context window but incur high computational costs and suffer from the "lost-in-the-middle" problem (Liu et al., 2024b (https://arxiv.org/html/2604.05552#bib.bib20)). Conversely, compression methods (Su and Zhou, 2022 (https://arxiv.org/html/2604.05552#bib.bib29); Park et al., 2021 (https://arxiv.org/html/2604.05552#bib.bib31)) reduce token usage but degrade performance: by flattening the dialogue structure, they discard details essential for complex reasoning.

**Structured Memory and Retrieval.** Retrieval-Augmented Generation (RAG) has been adapted from external knowledge retrieval to internal dialogue history, with various methods addressing data quality and mitigating retrieval-induced hallucinations (Zhang et al., 2026a (https://arxiv.org/html/2604.05552#bib.bib42); Ma et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib43)). While flat retrieval methods like DH-RAG (Zhang et al., 2025 (https://arxiv.org/html/2604.05552#bib.bib33)) filter out irrelevant turns, they often retrieve fragmented segments that lack local coherence. Recent advances have moved towards structured memory: notably, MemTree (Rezazadeh et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib38)) and RAPTOR (Sarthi et al., 2024 (https://arxiv.org/html/2604.05552#bib.bib39)) organize information into hierarchical tree structures.

| Category | Method | Structure | Construction Basis | Retrieval Unit | Local Coherence | Update Efficiency |
|----------|--------|-----------|--------------------|----------------|-----------------|-------------------|
| Linear & Compression Methods | Full Context | Linear Sequence | Token Concatenation | Entire History | High | Very Low (O(N²)) |
| Linear & Compression Methods | MemGPT | OS-like Hierarchy | Event-Triggered/Function | Paginated Memory | High (Self-Edit) | Medium |
| Retrieval-Augmented Generation (RAG) | Standard RAG | Flat Index | Semantic Similarity | Independent Chunks | Low (Disjointed) | High |
| Retrieval-Augmented Generation (RAG) | DH-RAG | Chain | Semantic Clustering | Query Chains | High (Dynamic) | Medium (Incremental) |
| Tree-Structured Memory | RAPTOR | Static Tree | Bottom-up Clustering | Abstractive Summaries | High | Low (Offline Rebuild) |
| Tree-Structured Memory | MemTree | Dynamic Tree | Online Clustering | Collapsed Nodes | Medium (Disjointed) | High (O(log N)) |
| | Context-Agent (Ours) | Dynamic Tree | Discourse Intent | Coherent Path | Very High (Path-Aware) | High (Event-Triggered) |

Table 1: Comparison of context management paradigms. We compare our method with linear methods, standard RAG, advanced RAG, and tree-based memory.

Table 1 delineates the distinctions between our framework and existing paradigms. A fundamental limitation of current structured approaches, such as MemTree, lies in their reliance on *semantic similarity* for aggregation, grouping content based on textual overlap rather than *discourse flow*. This often conflates distinct conversational threads that share lexical features but diverge in intent. Conversely, *Context-Agent* explicitly models *discourse structure* (Grosz and Sidner, 1986 (https://arxiv.org/html/2604.05552#bib.bib9)). By constructing trees based on *navigational intent* (e.g., instruction refinement, topic switching) and retrieving coherent *paths* instead of isolated nodes, our approach preserves the logical continuity requisite for complex, long-horizon tasks.
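A toy sketch of the failure mode described above, under loudly stated assumptions: token-set Jaccard overlap stands in for semantic similarity, and the four turns, the 0.5 threshold, and the parent links are all invented for illustration. It is not the authors' code.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap, a crude proxy for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Hypothetical turns: two separate travel plans that share most of their wording.
turns = {
    "t1": "plan a trip to paris",
    "t2": "book a hotel for the paris trip",
    "t3": "plan a trip to rome",
    "t4": "book a hotel for the rome trip",
}
# Hypothetical discourse-intent parents: each hotel request refines its own plan.
parent = {"t1": None, "t2": "t1", "t3": None, "t4": "t3"}

def similarity_groups(threshold: float = 0.5) -> set[frozenset[str]]:
    """Pairs of turns a similarity-based method would merge (score >= threshold)."""
    ids = sorted(turns)
    return {
        frozenset((x, y))
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if jaccard(turns[x], turns[y]) >= threshold
    }

def root_of(node: str) -> str:
    """Follow intent-based parent links up to the topic root."""
    while parent[node] is not None:
        node = parent[node]
    return node
```

With these inputs, the overlap score pairs the Paris and Rome plans (about 0.67) and the two hotel requests (0.75) across threads, while each hotel request scores only about 0.33 against its own plan; following the parent links instead attaches each request to its own root.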

## 3 Method

Our framework models a multi-turn dialogue as a forest of topic trees. Each tree represents a distinct topic and is composed of nodes (dialogue units) and branches. The dialogue's evolution is managed through state transitions.

### 3.1 Formal Problem Definition

Conventional dialogue systems model history as a linear sequence $H_t = \{(q_1, r_1), \ldots, (q_t, r_t)\}$ and generate a response $r_{t+1}$ to a query $q_{t+1}$ via a function $g(H_t, q_{t+1})$. This flat representation leads to contextual redundancy and loss of structural information.
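A back-of-the-envelope sketch of the redundancy: if $g$ re-reads the full flat history at every turn, each turn's input includes all earlier rounds, so cumulative input grows quadratically with the number of turns (the O(N²) entry for full context in Table 1). The per-round token counts are hypothetical, and the count ignores system prompts and caching.

```python
def flat_replay_cost(round_tokens: list[int]) -> list[int]:
    """Approximate input size per turn when g(H_t, q_{t+1}) re-reads the whole flat history.

    round_tokens[i] is the (hypothetical) token count of round i; for simplicity each
    round is counted in full at the turn it occurs, ignoring prompts and caching.
    """
    costs, history = [], 0
    for tokens in round_tokens:
        history += tokens        # the linear history grows by one (q, r) pair
        costs.append(history)    # and the entire prefix is sent to the model again
    return costs

per_turn = flat_replay_cost([100] * 20)
total = sum(per_turn)            # k * T * (T + 1) / 2 for a constant k tokens per round
```

With 100 tokens per round, the 20th turn alone carries 2,000 input tokens and the whole conversation 21,000, which is the kind of redundancy a structured, selective context aims to cut.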

To address this limitation, we introduce and formalize the problem of Non-linear Contextual Dialogue Management. The central premise of this problem is to shift from treating the entire history $H_t$ as an undifferentiated input to representing it as a dynamically evolving, hierarchically structured dialogue forest, denoted as $F_t$.

We model the interaction flow as a dynamic tree to align with the Attentional State theory (Grosz and Sidner, 1986 (https://arxiv.org/html/2604.05552#bib.bib9)). This theory posits that human cognitive focus operates hierarchically, managing a focus stack rather than a connected graph. Explicit graph structures risk violating local coherence by merging distinct branches, thereby introducing noise from competing contexts. In contrast, our tree framework enforces logical isolation between diverging paths (e.g., separate travel plans). This design mirrors human cognitive separation, ensuring the model maintains a clear, distraction-free train of thought.
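The isolation property can be illustrated with parent pointers alone. In the hypothetical tree below, one trip-planning topic forks into two sibling branches for separate travel plans; assuming context is read as the root-to-current-node path (an illustrative choice, not a function specified in the paper), the sibling branch never enters the context.

```python
from typing import Optional

# Hypothetical topic tree: node id -> (parent id, content).
tree: dict[int, tuple[Optional[int], str]] = {
    1: (None, "Help me plan a summer trip."),
    2: (1, "Option A: a week in Kyoto."),
    3: (2, "Find ryokans near Gion."),
    4: (1, "Option B: a week in Lisbon."),
    5: (4, "Find hotels in Alfama."),
}

def ancestor_path(node_id: int) -> list[int]:
    """Root-to-node path; with a single parent per node, this path is unique."""
    path: list[int] = []
    current: Optional[int] = node_id
    while current is not None:
        path.append(current)
        current = tree[current][0]
    return path[::-1]

def path_context(node_id: int) -> str:
    """Join contents along the active path only, so sibling branches are excluded."""
    return "\n".join(tree[n][1] for n in ancestor_path(node_id))
```

Working from node 5, the context is nodes 1, 4, and 5; nothing from the Kyoto branch is included, which is the separation a graph that merged the two branches could not guarantee.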

At each turn $t+1$, we are given:

- A structured dialogue history represented as a forest, $H_t = F_t$.
- The current state $S_t = (H_t, T_{\mathrm{act}}, B_{\mathrm{act}}, n_{\mathrm{cur}})$, which includes the history, the active topic tree, the active branch, and the current node.
- The new user query $q_{t+1}$.

The objective is to learn a policy $\pi$ that comprises two key functions, a context selection function $f_{\mathrm{select}}$ and a response generation function $f_{\mathrm{gen}}$:

$$
\begin{aligned}
C_{t+1} &= f_{\mathrm{select}}(q_{t+1}, S_t) \\
r_{t+1} &= f_{\mathrm{gen}}(q_{t+1}, C_{t+1})
\end{aligned}
$$

Here, $C_{t+1}$ is a highly relevant context subset, dynamically selected and constructed from the structured history $H_t$. The ultimate goal is to maximize the task completion rate while minimizing the token footprint of the selected context $C_{t+1}$, thereby achieving efficient context utilization without compromising conversational coherence or task-oriented performance.
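The decomposition of $\pi$ into $f_{\mathrm{select}}$ and $f_{\mathrm{gen}}$ can be expressed as a small typed interface. In this sketch the `State` fields mirror $S_t$, while the whitespace token counter, the current-node selector, and the echo generator are hypothetical placeholders rather than the paper's components.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class State:
    """S_t = (H_t, T_act, B_act, n_cur); H_t is simplified to node contents keyed by id."""
    history: dict[int, str]
    active_tree: int
    active_branch: int
    current_node: int

# f_select(q_{t+1}, S_t) -> C_{t+1}   and   f_gen(q_{t+1}, C_{t+1}) -> r_{t+1}
Select = Callable[[str, State], list[str]]
Generate = Callable[[str, list[str]], str]

def policy_step(query: str, state: State, f_select: Select,
                f_gen: Generate) -> tuple[str, list[str]]:
    """One turn of pi: select a context subset, then generate a response from it."""
    context = f_select(query, state)
    return f_gen(query, context), context

def token_footprint(context: list[str]) -> int:
    """Whitespace token count, a crude stand-in for a real tokenizer."""
    return sum(len(chunk.split()) for chunk in context)

# Hypothetical components, for illustration only.
def select_current(query: str, state: State) -> list[str]:
    """Toy selector: keep only the current node's content."""
    return [state.history[state.current_node]]

def echo_gen(query: str, context: list[str]) -> str:
    """Toy generator standing in for an LLM call."""
    return f"{query} | context={len(context)} chunk(s)"

state = State({1: "plan a trip", 2: "talk about weather today"},
              active_tree=1, active_branch=1, current_node=1)
reply, ctx = policy_step("book flights", state, select_current, echo_gen)
```

Here the selected context costs 3 whitespace tokens against 7 for the full toy history; a realistic $f_{\mathrm{select}}$ would choose by topic and branch rather than by the current node alone.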

Figure 2: An overview of the Context-Agent framework. It illustrates the dynamic evolution of a multi-turn dialogue represented as a forest of topic trees, with branches indicating sub-dialogue paths. The number in each node represents the turn number in the conversation. Solid edges represent the active path, while dashed edges indicate inactive paths.

### 3.2 Core Components

**Node**

The smallest unit of a conversation is a node, which represents one round of dialogue between the user and the model. Each node $n_i$ is defined as a tuple:

$$n_i = (c_i, v_i, p_i, \beta_i, s_i)$$

where $c_i$ is the content of the dialogue round, $v_i \in \mathbb{R}^d$ is its $d$-dimensional text embedding, $p_i$ is the parent node's identifier (null for a root), $\beta_i$ is the branch identifier, and $s_i$ is a summary of the node's content. After each round, a summarization function $S_{\mathrm{node}}$ converts the content $c_i$ into the summary $s_i = S_{\mathrm{node}}(c_i)$, which is used for subsequent topic attribution and branch management.
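One way to hold a node in code is a dataclass whose fields follow the node tuple (content $c$, embedding $v$, parent $p$, branch $\beta$, summary $s$). The summarizer and embedding below are hypothetical stand-ins (first-sentence truncation and a hashed bag of words); a practical system would presumably use an LLM summarizer for $S_{\mathrm{node}}$ and a trained text-embedding model for $v$.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

def summarize_node(content: str, max_words: int = 12) -> str:
    """Hypothetical stand-in for S_node: first sentence, capped at max_words words."""
    first = content.split(".")[0]
    return " ".join(first.split()[:max_words])

def embed(text: str, d: int = 8) -> list[float]:
    """Hypothetical stand-in for a d-dimensional text embedding (hashed bag of words)."""
    vec = [0.0] * d
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % d
        vec[bucket] += 1.0
    return vec

@dataclass
class Node:
    content: str                                  # c: one round of user query and response
    parent: Optional[int]                         # p: parent id, None for a topic root
    branch: int                                   # beta: branch identifier
    embedding: list[float] = field(init=False)    # v in R^d
    summary: str = field(init=False)              # s = S_node(c)

    def __post_init__(self) -> None:
        # Derived fields are computed once, right after the round is recorded.
        self.embedding = embed(self.content)
        self.summary = summarize_node(self.content)
```

Computing the summary in `__post_init__` mirrors the description that $S_{\mathrm{node}}$ runs after each round, so later topic attribution can work from the short summary rather than the full content.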

**Topic Tree**

An independent topic is represented by a topic tree $T = (N, E)$, a directed acyclic graph in which every non-root node has exactly one parent. Here, $N = \{n_1, n_2, \ldots, n_k\}$ is the set of all nodes under this topic, and $E = \{(n_i, n_j) \mid p(n_j) = n_i\}$ is the set of directed edges between nodes, representing the inheritance relationship of the conversation. The first dialogue round of a new topic becomes the root node of the topic tree; its parent is null.
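Because each node stores a single parent, the edge set $E$ can be derived directly from parent pointers. The sketch below, with hypothetical integer node ids, builds $E = \{(n_i, n_j) \mid p(n_j) = n_i\}$ and checks that the result is a rooted tree: exactly one root, and no cycles when following parents.

```python
from typing import Optional

def edges_from_parents(parents: dict[int, Optional[int]]) -> set[tuple[int, int]]:
    """E = {(n_i, n_j) | p(n_j) = n_i}: one directed edge from each parent to its child."""
    return {(p, child) for child, p in parents.items() if p is not None}

def is_rooted_tree(parents: dict[int, Optional[int]]) -> bool:
    """True if there is exactly one root and following parents never revisits a node."""
    roots = [n for n, p in parents.items() if p is None]
    if len(roots) != 1:
        return False
    for start in parents:
        seen: set[int] = set()
        node: Optional[int] = start
        while node is not None:
            if node in seen or node not in parents:
                return False      # a cycle, or a parent outside this tree
            seen.add(node)
            node = parents[node]
    return True
```

For `{1: None, 2: 1, 3: 2, 4: 1}` the edges are `(1, 2)`, `(2, 3)`, and `(1, 4)`, and the check passes; a parent map with two roots or a cycle is rejected.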

**Branch**

Within the same topic tree $T$, a branch $B$ is a relatively independent dialogue path that starts from a branching point but still remains under the same topic. It is defined as an ordered sequence of nodes $B = \langle n_1, n_2, \ldots, n_k \rangle$, where any two adjacent nodes $(n_i, n_{i+1})$ in the sequence satisfy $p(n_{i+1}) = n_i$. All nodes within the same branch share the same branch identifier $\beta$.
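Both branch conditions, parent links between adjacent nodes and a single shared identifier $\beta$, can be checked mechanically. Node records here are hypothetical `(parent, branch)` pairs keyed by id, and we assume the branching point keeps its original identifier, so it is not counted as part of the new branch; the text does not settle this detail.

```python
from typing import Optional

def is_valid_branch(seq: list[int], nodes: dict[int, tuple[Optional[int], int]]) -> bool:
    """Check B = <n_1, ..., n_k>: p(n_{i+1}) = n_i for adjacent nodes and one shared beta."""
    if not seq:
        return False
    linked = all(nodes[b][0] == a for a, b in zip(seq, seq[1:]))
    branch_ids = {nodes[n][1] for n in seq}
    return linked and len(branch_ids) == 1
```

With the hypothetical records `{1: (None, 0), 2: (1, 0), 3: (2, 0), 4: (1, 1), 5: (4, 1)}`, the sequences `[1, 2, 3]` and `[4, 5]` are valid branches, while `[1, 4, 5]` mixes two identifiers and `[2, 4]` breaks the parent link.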

**Conversation History**

The complete history $H$ of a multi-turn conversation is represented as a forest $F$ consisting of multiple topic trees, i.e., $H = F = \{T_1, T_2, \ldots, T_m\}$.
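Since each topic tree has exactly one root, the forest can be recovered from parent pointers by grouping every node under the root it reaches. A minimal sketch with hypothetical ids, assuming the parent links are already acyclic:

```python
from collections import defaultdict
from typing import Optional

def forest_from_parents(parents: dict[int, Optional[int]]) -> dict[int, list[int]]:
    """Map each topic root to the sorted ids of the nodes in its tree."""
    trees: dict[int, list[int]] = defaultdict(list)
    for node in parents:
        root = node
        while parents[root] is not None:   # assumes parent links contain no cycles
            root = parents[root]
        trees[root].append(node)
    return {root: sorted(members) for root, members in trees.items()}
```

For `{1: None, 2: 1, 3: 2, 4: None, 5: 4, 6: 1}` this yields two topic trees, rooted at 1 (nodes 1, 2, 3, 6) and at 4 (nodes 4, 5).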

### 3.3 State Transition

The conversational state at turn $t$ is defined as $S_t = (H_t, T_{\mathrm{act}}, B_{\mathrm{act}}, n_{\mathrm{cur}})$, comprising the history, the active topic tree, the active branch, and the current node. The conversation evolves through state transitions driven by new user queries: upon receiving a query, the system analyzes it to determine its topic, manages branches, and updates the state accordingly. This process involves the following steps:

- **Step 0: Initialization.** Initialize the first topic tree $T_1$ as the active tree $T_{\mathrm{act}}$. Define an aggregation function $S$ to summarize
