Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
Summary
This paper introduces Deep Reasoning, an inference-time approach that uses structured meta-reasoning to construct task-specific scaffolds for general-purpose agents. The proposed agent, Dolores, outperforms existing methods by distributing cognition across lower-load reasoning threads, reducing hallucinations and improving performance across multiple benchmarks.
# Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
Source: [https://arxiv.org/html/2605.11388](https://arxiv.org/html/2605.11388)
Dean Light¹, Michael Theologitis¹, Kshitish Ghate¹, Shuyue Stella Li¹, Benjamin Newman¹, Chirag Shah¹, Aylin Caliskan¹, Pang Wei Koh¹, Dan Suciu¹, Yulia Tsvetkov¹
¹University of Washington, Seattle, WA, USA
\{deanlcs, mthe, kghate\}@cs.washington.edu
[dolores](https://github.com/DeanLight/dolores)
###### Abstract
Humans solve complex problems by flexibly shifting among reasoning modes, often without explicit deliberation: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified subproblems. Current LLM agents lack this flexibility, as their scaffolds hard-code such reasoning decisions in advance through fixed inference patterns. These scaffolds are effective when their prescribed structure matches the task, but brittle when solving the task requires adapting the structure of reasoning itself. We introduce Deep Reasoning, an inference-time approach for constructing task-specific scaffolds through structured meta-reasoning. Deep Reasoning uses a formal language that represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, enabling decomposition principles to be encoded as in-context examples that guide test-time scaffold construction. We instantiate this approach in a general-purpose agent (Dolores) that distributes complex tasks across smaller, more controlled reasoning threads while preserving dependencies among subproblems. We evaluate Dolores against state-of-the-art scaffolding methods across four hard benchmarks: grounded multi-hop reasoning, synthetic long-chain question answering, long-context aggregation, and deep research-style information seeking. Dolores outperforms all evaluated scaffolds across the four benchmarks, three model sizes, and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average, including methods tailored to individual benchmark families. Trace and token analyses suggest that while baseline scaffolds fail by overloading individual LLM calls, Dolores succeeds by distributing cognition across structured, lower-load reasoning threads, thereby reducing premature termination and hallucination. This advantage can even bridge the scaling gap, with an 8B version surpassing all evaluated 32B baselines from the same family in more than half the settings. These results point toward future agentic systems that treat scaffolding as adaptive reasoning, constructing the structure each task requires just-in-time.
## 1 Introduction
Humans intuitively solve complex problems through meta-reasoning, a cognitive process in which we model the task and environment, select appropriate problem-solving strategies, break problems into manageable steps, plan how to solve them, and execute these plans while tracking intermediate reasoning and goals (Newell et al., [1959](https://arxiv.org/html/2605.11388#bib.bib87); Flavell, [1979](https://arxiv.org/html/2605.11388#bib.bib39)). Crucially, meta-reasoning allows us to integrate multiple modes of cognition, including formal reasoning, based on explicit rules and logical operations, and associative reasoning, which leverages intuition and patterns from prior experience (Bellini-Leite, [2022](https://arxiv.org/html/2605.11388#bib.bib42)). The interplay between these two is dynamic and task-dependent, enabling us to be effective problem-solvers across a wide range of domains, including but not limited to mathematics, information synthesis, programming, and creative writing (Stanovich, [2011](https://arxiv.org/html/2605.11388#bib.bib41)).
Figure 1: Deep Reasoning leverages human meta-reasoning traces to build "just-in-time" scaffolds. (a): A task is given to a human. (b): The human intuitively meta-reasons about how to solve it. (c): These verbalized traces are then directly mapped into the language of Deep Reasoning. (d): A Deep Reasoning agent like Dolores (§[4](https://arxiv.org/html/2605.11388#S4)) uses them as in-context examples to guide its own meta-reasoning abilities. Example adapted from the DeepSearchQA benchmark (Gupta et al., [2026](https://arxiv.org/html/2605.11388#bib.bib57)).

Attempting to mimic human reasoning, LLMs achieve impressive results across a variety of both creative (e.g., writing) and formal (e.g., programming and mathematics) domains. However, there is growing evidence that this is "shallow reasoning", a surface-level imitation of verbalized human reasoning that does not reliably track underlying thought processes (Kargupta et al., [2025](https://arxiv.org/html/2605.11388#bib.bib92)). In particular, LLMs struggle to reason formally in a robust way, which is in line with work showing LLMs to be unfaithful to their own reasoning traces (Lanham et al., [2023](https://arxiv.org/html/2605.11388#bib.bib30); Yee et al., [2024](https://arxiv.org/html/2605.11388#bib.bib31); Lyu et al., [2023](https://arxiv.org/html/2605.11388#bib.bib32); Arcuschin et al., [2025](https://arxiv.org/html/2605.11388#bib.bib33)) and to degrade in performance on long or cognitively demanding reasoning tasks (Jaech et al., [2024](https://arxiv.org/html/2605.11388#bib.bib44); Chen et al., [2025](https://arxiv.org/html/2605.11388#bib.bib34); Hassid et al., [2025](https://arxiv.org/html/2605.11388#bib.bib35)).
To address these limitations, progress in LLM-based systems has increasingly relied on carefully designed agentic scaffolds (Lanham et al., [2023](https://arxiv.org/html/2605.11388#bib.bib30); Yee et al., [2024](https://arxiv.org/html/2605.11388#bib.bib31); Chen et al., [2025](https://arxiv.org/html/2605.11388#bib.bib34)). These scaffolds are programs that coordinate communication among LLMs, and between LLMs and external tools (Rosser and Foerster, [2026](https://arxiv.org/html/2605.11388#bib.bib21)). However, they typically *predefine* how a task should be decomposed and how the reasoning should be carried out. For example, ReAct (Yao et al., [2022](https://arxiv.org/html/2605.11388#bib.bib46)) breaks the task down into a sequence of associative reasoning steps with tool calls. CodeAct (Wang et al., [2024](https://arxiv.org/html/2605.11388#bib.bib47)) replaces these tool calls with formal reasoning via code. Deep Research (Roucher et al., [2025b](https://arxiv.org/html/2605.11388#bib.bib54)) prescribes a two-layer decomposition, where a manager agent delegates subproblems to specialized search agents. Finally, RLMs (Zhang et al., [2025](https://arxiv.org/html/2605.11388#bib.bib49)) decompose long-input tasks by recursively chunking them inside a programmatic environment via recursive self-calls. The common pattern across these scaffolds is that they hard-code how to decompose reasoning. Essentially, they fix how to meta-reason about a task without actually seeing it, and, as a result, fail when other reasoning behaviors are required (Fu et al., [2025a](https://arxiv.org/html/2605.11388#bib.bib14)).
In this work, we leverage insights from human meta-reasoning to build "just-in-time" meta-reasoning scaffolds. While humans are able to meta-reason intuitively, there is no formal way to capture these processes and translate them for agentic systems. To this end, we ground our approach in the cognitive science literature (§[2](https://arxiv.org/html/2605.11388#S2)) and introduce the language of Deep Reasoning (§[3](https://arxiv.org/html/2605.11388#S3)), a formal language that combines associative and formal reasoning through structured meta-reasoning. Deep Reasoning provides a principled way to take human meta-reasoning traces, formalize them, and convert them into atomic decompositions usable by agentic systems in-context. It allows us to instill how humans decompose and reason about tasks into agents just-in-time (Figure [1](https://arxiv.org/html/2605.11388#S1.F1)).
We instantiate this approach in an agent we call Dolores (Deep Layered Reasoning Scaffold, §[4](https://arxiv.org/html/2605.11388#S4)), which uses in-context meta-reasoning examples to guide its reasoning and dynamically adapt its scaffold at test time. It outperforms all evaluated state-of-the-art scaffold methods, with an average improvement of 24.8% over the best-performing baseline across four reasoning tasks, three model sizes, and two model families. Notably, we find that Dolores equipped with an 8B model outperforms all baselines that use a model from the next scaling tier (32B) of the same family in more than half of the evaluated settings. By analyzing reasoning traces, we also show that all other scaffolds tend to hallucinate and terminate prematurely as they delegate too much to a single LLM context thread. In contrast, Dolores avoids these issues by structuring reasoning in more fine-grained ways, decomposing it into atomic associative, formal, or meta-reasoning steps that can be handled reliably by a single LLM memory thread.
Our contributions are twofold: (1) the language of Deep Reasoning, a formal language for capturing human meta-reasoning traces in a way that can be used by agentic systems, and (2) Dolores, an agent that operationalizes our approach and demonstrates improved performance against state-of-the-art methods. Our code is available at [https://github.com/DeanLight/dolores](https://github.com/DeanLight/dolores).
## 2 Background
We build the Deep Reasoning language, a formal language for capturing human meta\-reasoning, on concepts from prior work in cognitive science, which are discussed below\.
### 2.1 Axes of Reasoning
Prior work characterizes reasoning along two orthogonal dimensions: *how* reasoning is carried out and what the reasoning is *about*. To make these distinctions concrete, we refer throughout to the running example introduced in Figure [1](https://arxiv.org/html/2605.11388#S1.F1) about the volleyball court.
Figure 2: Informal and formal reasoning describe *how* reasoning is carried out, while meta and object levels describe what the reasoning is *about*.

#### Associative vs. Formal (how).
When solving a task, some steps rely on intuition and associations, while others follow explicit rules. Associative reasoning generally operates through intuitive proximity shaped by memory and context (Mednick, [1962](https://arxiv.org/html/2605.11388#bib.bib93); Sloman, [1996](https://arxiv.org/html/2605.11388#bib.bib94)). For example, interpreting "The City by the Bay" as San Francisco can be done associatively, drawing on experience and abstract links between concepts. In contrast, formal reasoning proceeds via the sequential application of logical rules over structured, well-defined representations (Wason, [1968](https://arxiv.org/html/2605.11388#bib.bib97); Rips, [1983](https://arxiv.org/html/2605.11388#bib.bib80); Johnson-Laird and Byrne, [1991](https://arxiv.org/html/2605.11388#bib.bib96)). For example, counting the number of volleyball games that went to a tie-break and selecting the maximum follows a clear, rule-based procedure, and is therefore formal reasoning.
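To make the formal side of this distinction concrete, the tie-break example above can be carried out as an explicit, rule-based procedure. The following sketch uses invented match records (the data is hypothetical, not from the benchmark): each step is a deterministic rule application, with no associative judgment involved.

```python
# Hypothetical match records for the running volleyball example (invented data).
matches = [
    {"court": "A", "sets": 5},  # a fifth set means the match went to a tie-break
    {"court": "A", "sets": 3},
    {"court": "A", "sets": 5},
    {"court": "B", "sets": 5},
]

# Formal reasoning: apply an explicit rule to structured records.
tie_breaks_per_court = {}
for m in matches:
    if m["sets"] == 5:  # rule: 5 sets => tie-break occurred
        tie_breaks_per_court[m["court"]] = tie_breaks_per_court.get(m["court"], 0) + 1

# Select the court with the maximum tie-break count.
busiest = max(tie_breaks_per_court, key=tie_breaks_per_court.get)
print(busiest)  # -> A
```

Every intermediate value here is checkable by re-running the rules, which is exactly the correctness guarantee associative reasoning lacks.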
#### Object vs. Meta-level (about what).
At the same time, some reasoning steps operate on the problem itself, while others organize how to solve it. Object-level reasoning operates on objects in the task. For example, interpreting "The City by the Bay" and counting the number of tie-breaks are object-level reasoning, since they act directly on the task. In contrast, meta-level reasoning operates over the reasoning process: selecting strategies, decomposing problems, and guiding inference by constructing and updating an internal model of the task, state, and goals (Newell et al., [1959](https://arxiv.org/html/2605.11388#bib.bib87); Flavell, [1979](https://arxiv.org/html/2605.11388#bib.bib39); Ackerman and Thompson, [2017](https://arxiv.org/html/2605.11388#bib.bib40)). Deciding to break a task into steps (e.g., "first identify the city, then list volleyball courts, then count…"), without executing the actual steps, is meta-level reasoning.
These two dimensions are orthogonal (Figure [2](https://arxiv.org/html/2605.11388#S2.F2)). For example, we can have associative meta-reasoning or formal object-level reasoning.
### 2.2 Human Reasoning
Humans tightly interleave meta- and object-level reasoning when solving tasks that go beyond their cognitive capacity (the total processing resources available in working memory). When working through a problem, we search for structure, decompose it into manageable subproblems, solve them, and integrate the results according to that structure (Griffiths et al., [2019](https://arxiv.org/html/2605.11388#bib.bib10); Sweller et al., [2019](https://arxiv.org/html/2605.11388#bib.bib9)). This search for structure is a form of meta-reasoning, often referred to as "modeling" or "mental modeling". Our strong reasoning and generalization abilities are often attributed to these meta-reasoning and modeling capabilities (Kargupta et al., [2025](https://arxiv.org/html/2605.11388#bib.bib92)). These flexible reasoning abilities are also deeply rooted in our use of natural language. Language plays a dual role in human reasoning: we sometimes use it formally (e.g., "How many apples are in the basket?"), while other times we use it associatively (e.g., "Is this apple ripe?"). Moreover, when solving complex problems we use natural language to interleave associative and formal reasoning, at both the object and meta levels (Broadbent, [1958](https://arxiv.org/html/2605.11388#bib.bib88); Treisman, [1964](https://arxiv.org/html/2605.11388#bib.bib89); Rosch, [1978](https://arxiv.org/html/2605.11388#bib.bib90)). It is this flexible, structure-dependent interleaving that enables us to generalize so effectively across tasks (Stanovich, [2011](https://arxiv.org/html/2605.11388#bib.bib41); Bellini-Leite, [2022](https://arxiv.org/html/2605.11388#bib.bib42)).
## 3 Deep Reasoning – A Formal Language for Meta-Reasoning
### 3.1 Motivation
LLMs, despite their large context windows, have a limited cognitive capacity (Zhang et al., [2024](https://arxiv.org/html/2605.11388#bib.bib13); Chen et al., [2024](https://arxiv.org/html/2605.11388#bib.bib11); Fu et al., [2025b](https://arxiv.org/html/2605.11388#bib.bib12)). Moreover, their cognitive capacity differs substantially from that of humans. For example, they can easily memorize and retrieve large collections of text and code while still struggling with basic reasoning tasks (Malek et al., [2025](https://arxiv.org/html/2605.11388#bib.bib8)).
In order to instill strong reasoning and meta-reasoning capacities in agentic systems, we need to view reasoning processes in terms of the cognitive capacity and abilities of LLMs. We need to precisely formalize our intuitive reasoning ability and separate it along three dimensions:
1. D1: Associative vs. Formal – delegating what we can to formal code.
2. D2: Object vs. Meta – decomposing object-level tasks using meta-reasoning until each associative object-level task fits within the capacity of an LLM.
3. D3: Atomic vs. Monolithic – decomposing the meta-reasoning, which is itself cognitively demanding, into composable atomic units that can be delegated to different context threads.
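The interplay of the three dimensions can be sketched in a few lines of Python. This is an illustrative toy, not the paper's actual implementation: `formal_step`, `associative_step`, and `meta_decompose` are hypothetical names, the `eval` call stands in for a code interpreter (D1), and the hard-coded lookup stands in for an LLM call. The decomposition itself (D2) is returned as a list of atomic sub-tasks (D3), each small enough for one context thread.

```python
# Illustrative sketch of D1-D3; all names are invented for this example.

def formal_step(expr: str) -> str:
    # D1: delegate exact, rule-based work to formal code.
    # eval() stands in for a sandboxed code interpreter.
    return str(eval(expr))

def associative_step(prompt: str) -> str:
    # D1: placeholder for an associative interpreter (an LLM call in practice).
    lookup = {"City by the Bay": "San Francisco"}
    return lookup.get(prompt, "unknown")

def meta_decompose(task: str):
    # D2/D3: an atomic meta-reasoning unit that returns small,
    # typed sub-tasks instead of solving anything itself.
    return [("associative", "City by the Bay"), ("formal", "3 + 4")]

results = []
for kind, subtask in meta_decompose("example task"):
    step = associative_step if kind == "associative" else formal_step
    results.append(step(subtask))  # each sub-task could run in its own thread
print(results)  # -> ['San Francisco', '7']
```

The key design point is that the meta-level function only produces structure; object-level work is pushed into separate, low-load calls.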
### 3.2 Formal Notation
In order to precisely formalize reasoning along the three dimensions (D1, D2, D3) introduced in §[3.1](https://arxiv.org/html/2605.11388#S3.SS1), we now introduce the formal language of Deep Reasoning.
| Symbol | Description | Example |
| --- | --- | --- |
| ℐ | True interpreter | ℐ("x = 'an'; 'S{x} Fr{x}cisco'") = ℐ("The City by the Bay") = "San Francisco" |
| 𝒜 | Associative interpreter | 𝒜("The City by the Bay") = "San Francisco" |
| ℰ | Formal interpreter | ℰ("x = 'an'; 'S{x} Fr{x}cisco'") = "San Francisco" |
| ℳ_ℰ(x) | Formal models of x w.r.t. ℰ | "x = 'an'; 'S{x} Fr{x}cisco'" ∈ ℳ_ℰ("San Francisco") |
| m | Formal modeling function | m("sum of 3 and 4") = "add(3,4)" |
| ℐ_D | Deep reasoning | ℐ_D("The zip code of the City by the Bay") = ℰ("y = 𝒜('City By the Bay'); ℐ_D('zip code of {y}')") |

Table 1: High-level summary of the language of Deep Reasoning.

#### Symbols and Interpretations.
We begin with a few basic definitions. A *symbol* is a word such as "hot" or "dog", or a mark such as "++" or "&", that stands for some abstract concept. Given a set of symbols Σ, a *sentence* x over Σ is a sequence of symbols x ∈ Σ⁺.
An *interpretation* function f maps sentences to concepts, i.e., f: Σ⁺ → C, where C is the set of all concepts. Concepts can be represented in many different ways, including sentences, images, and more. For simplicity, we limit our discussion to concepts represented by sentences.
We denote by ℐ the ideal *true interpretation*, which maps each sentence to its correct concept. In practice, we do not have access to ℐ. For example:

ℐ("The City by the Bay") = ℐ("SF, CA") = ℐ("San Francisco") = "San Francisco"
#### Formal & Associative Interpreters (D1).
In the language of Deep Reasoning, the distinction between associative and formal reasoning is explicit\. Let𝐋⊂𝚺\+\\mathbf\{L\}\\subset\\mathbf\{\\Sigma\}^\{\+\}be some formal language andℰ:𝐋→𝚺\+\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\}:\\mathbf\{L\}\\to\\mathbf\{\\Sigma\}^\{\+\}be a*formal interpreter*that can execute sentences in𝐋\\mathbf\{L\}\. For example,ℰ\(“x=1; x\+2”\)=“3”\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\(\}\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}\\text\{x=1; x\+2\}\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\)\}=\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}3\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}\. We extendℰ\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\}with external functions that can be invoked during execution:
ℰadd\(“add\(3,4\)”\)=“7”ℰllm\(“llm\(’City by the Bay’\)”\)=“San Francisco”\\displaystyle\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\}\_\{\\text\{add\}\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\(\}\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}\\text\{add\(3,4\)\}\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\)\}=\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}7\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}\\quad\\quad\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\}\_\{\\text\{llm\}\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\(\}\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}\\text\{llm\('City by the Bay'\)\}\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\)\}=\\makebox\[6\.00006pt\]\[l\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large\`\`\}\}\}\{\}\\text\{San Francisco\}\\makebox\[6\.00006pt\]\[r\]\{\\smash\{\\raisebox\{\-1\.72218pt\}\{\\large''\}\}\}\{\}On the other hand, we say that𝒜\{\\color\[rgb\]\{1,0\.04,0\.61\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{1,0\.04,0\.61\}\\pgfsys@color@cmyk@stroke\{0\}\{0\.96\}\{0\.39\}\{0\}\\pgfsys@color@cmyk@fill\{0\}\{0\.96\}\{0\.39\}\{0\}\\mathcal\{A\}\}is an*associative interpreter*if it uses associative reasoning to assign meaning to sentences\. Examples of associative interpreters include LLMs and other neural networks\. 
Unlikeℰ\{\\color\[rgb\]\{0,0,1\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{0,0,1\}\\mathcal\{E\}\},𝒜\{\\color\[rgb\]\{1,0\.04,0\.61\}\\definecolor\[named\]\{pgfstrokecolor\}\{rgb\}\{1,0\.04,0\.61\}\\pgfsys@color@cmyk@stroke\{0\}\{0\.96\}\{0\.39\}\{0\}\\pgfsys@color@cmyk@fill\{0\}\{0\.96\}\{0\.39\}\{0\}\\mathcal\{A\}\}does not require formal sentences but also does not guarantee correctness or internal consistency\. For example, “San Francisco” can be modeled both associatively and formally as follows:
$$\mathcal{A}(\text{“The City by the Bay”}) = \mathcal{E}(\text{“x = 'an'; 'S\{x\} Fr\{x\}cisco'”}) = \text{“San Francisco”}$$
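To ground the distinction, here is a minimal sketch in which $\mathcal{E}$ is plain Python string manipulation and $\mathcal{A}$ is stubbed by a lookup table (a real associative interpreter would be an LLM; the function names are ours, not the paper's):

```python
# Formal route (E): deterministic string computation, guaranteed correct.
def E_formal() -> str:
    x = "an"
    return f"S{x} Fr{x}cisco"

# Associative route (A): an LLM would resolve the nickname; stubbed here.
def A_associative(sentence: str) -> str:
    return {"The City by the Bay": "San Francisco"}[sentence]
```

Both routes reach the same string, but only the formal one comes with an execution guarantee; the associative stub stands in for a model that may err.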
#### Formal Models & Modeling Functions \(D2\)\.
When humans meta-reason about a task, we create a step-by-step decomposition of how to solve it (Figure [1](https://arxiv.org/html/2605.11388#S1.F1)). Therefore, the language of Deep Reasoning must be able to capture such meta-reasoning processes. To this end, we define the set of all *formal models* (we distinguish mental models from Machine Learning (ML) models, and call the former *models* throughout this paper) of $x$ under $\mathcal{E}$ as all formal sentences that, when executed, map to the same underlying concept as $x$:
$$\mathcal{M}_{\mathcal{E}}(x) = \bigl\{\, y \in \bm{L} \;\big|\; \mathcal{E}(y) = \mathcal{I}(x) \,\bigr\}$$

Given a formal model of $x$, denoted $y \in \mathcal{M}_{\mathcal{E}}(x)$, we call $(x, y)$ a *formal decomposition* of $x$ into $y$.
To make this more concrete, in our running example, the converted human meta\-reasoning trace \(Figure[1](https://arxiv.org/html/2605.11388#S1.F1)\.c\) is a formal decomposition of the task “Which volleyball court in…”\.
When a task is referred to colloquially as a *formal task*, we mean that it is trivial to translate the task sentence into a formal model. For example, given $x =$ “Which set of numbers in $\{\{6,1,2\},\{6,7\}\}$ has the most elements over 5?”, we can easily model it formally as:
$$y = \text{“}\mathcal{X} = \{\{6,1,2\},\{6,7\}\};\ \operatorname*{argmax}_{S \in \mathcal{X}} |\{x \in S \mid x > 5\}|\text{”}$$

We define a *modeling function* as $m : \bm{\Sigma}^{+} \to \bm{L}$ such that for every $x \in \bm{\Sigma}^{+}$, $m(x) \in \mathcal{M}_{\mathcal{E}}(x)$. In other words, a modeling function takes a sentence and formalizes it into an *equivalent* sentence that can be executed. Modeling functions can be extended with external functions:
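As a quick sanity check, this formal model can be run directly in Python (standing in for $\mathcal{E}$):

```python
# Executable version of the "most elements over 5" formal model.
X = [{6, 1, 2}, {6, 7}]  # the set of sets from the task
# argmax over S in X of |{x in S | x > 5}|
answer = max(X, key=lambda S: len([x for x in S if x > 5]))
```

Here `{6, 1, 2}` contributes one element over 5 while `{6, 7}` contributes two, so `answer` is `{6, 7}`.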
$$m_{\text{add}}(\text{“the sum of 3 and 4”}) = \text{“add(3,4)”}$$

Here, the formal model “add(3,4)” is interpretable by $\mathcal{E}_{\text{add}}$. We can also consider more abstract functions, such as the associative function $\mathcal{A}$. In such cases, the modeling function $m_{\mathcal{A}}$ can formalize sentences using associative functions. For example,
$$m_{\mathcal{A}}(\text{“Number of a's in the City by the Bay”}) = \text{“}\mathcal{A}(\text{“The City by the Bay”})\text{.count('a')”}$$

In this case, the modeling function $m_{\mathcal{A}}$ delegates the disambiguation of “The City by the Bay” to associative reasoning via $\mathcal{A}$, after which the `count` function counts the number of a's.
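A runnable sketch of this delegation, with $\mathcal{A}$ again stubbed by a lookup (a real agent would make an LLM call) and $\mathcal{E}$ played by Python's `eval`:

```python
def A(sentence: str) -> str:
    # Associative stub: a real A would be an LLM resolving the nickname.
    return {"The City by the Bay": "San Francisco"}[sentence]

# The formal model produced by m_A, executed by E (Python) with A in scope.
formal_model = "A('The City by the Bay').count('a')"
result = eval(formal_model)  # "San Francisco" contains two a's -> 2
```

The formal part (`.count('a')`) is fully reliable; only the nickname resolution is left to associative reasoning.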
#### Incremental Modeling \(D3\)\.
Until now, all modeling examples have assumed an $m$ capable of fully formalizing a task in a single step. However, modeling is itself a difficult cognitive task. We therefore want to capture the idea of *iteratively refining* a sentence until it becomes fully formal.
To do this, we allow the modeling function $m$ to call itself within a formalization. This means it can produce intermediate formal sentences that still contain unresolved parts, which are then handled in subsequent modeling steps. To make this concrete, consider the example:
$$m_{m}(\text{“Number of tie-breaks in the following volleyball scoreboard: 3-1, …, 2-3”}) =$$
$$\text{“s = \{'3-1', …, '2-3'\}; re = } m(\text{“regex for volleyball tie-breaks”})\text{; len(s.filter(re.match))”}$$

In this case, coming up with a regex that captures “volleyball tie-breaks” might involve looking up the latest official volleyball rules from FIVB (Fédération Internationale de Volleyball, the governing body responsible for all forms of volleyball globally), building several regex candidates, and verifying them in a programming environment. This cannot be done reliably by a single associative function, which is why $m_{m}$ delegated the task to a subsequent modeling step.
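A minimal runnable sketch of this incremental pattern, with the nested modeling call stubbed to a fixed answer (the regex and the assumption that a tie-break means a fifth set, i.e. a 3-2 or 2-3 match score, are our own illustrative choices, not the paper's):

```python
import re

def m(task: str) -> str:
    # Stub for the nested modeling step: in Dolores this would be an LLM
    # (possibly looking up rules and testing regex candidates).
    assert task == "regex for volleyball tie-breaks"
    return r"^(3-2|2-3)$"

s = ["3-1", "3-2", "2-3", "3-0"]
tiebreak = re.compile(m("regex for volleyball tie-breaks"))
num_tiebreaks = len([score for score in s if tiebreak.match(score)])
```

The outer formal model stays simple (filter and count); the hard, open-ended part is deferred to a later modeling step.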
#### Summary
With the language of Deep Reasoning we can now express delegation between formal and associative tasks (D1, §[3.1](https://arxiv.org/html/2605.11388#S3.SS1)) and even formalize dependencies between associative subtasks using $\mathcal{A}$ and $\mathcal{E}_{\mathcal{A}}$. We can express meta-reasoning using $m_{\mathcal{A}}$, decomposing object-level tasks until the load delegated to each associative call is below the cognitive load a given LLM context thread can handle (D2, §[3.1](https://arxiv.org/html/2605.11388#S3.SS1)). Lastly, we can decompose the meta-reasoning process itself, using $m_{m}$, into nested layers of formal and associative meta-reasoning, splitting the high cognitive load of meta-reasoning across different LLM context threads (D3, §[3.1](https://arxiv.org/html/2605.11388#S3.SS1)).
## 4 Dolores – A Deep Reasoning Agent
We can define specific deep reasoning architectures in terms of high-level meta-reasoning using Deep Reasoning, separating the architecture from specific implementation decisions. In this section we present a specific Deep Reasoning agent we call Dolores (Deep Layered Reasoning Scaffold).
### 4.1 Architecture
We recursively define Dolores ($\mathcal{I}_{\mathrm{D}}$) and an *atomic modeling* function $m^{a}$ as follows:
$$\mathcal{I}_{\mathrm{D}}(x) = \mathcal{E}_{\mathcal{I}_{\mathrm{D}},\,\mathcal{A}}\bigl(m^{a}_{\mathcal{I}_{\mathrm{D}},\,\mathcal{A}}(x)\bigr) \quad (1)$$
$$m^{a}_{\mathcal{I}_{\mathrm{D}},\,\mathcal{A}}(x) = \mathcal{A}\bigl(\text{“decompose } \{x\} \text{ into a small formal model over } \mathcal{I}_{\mathrm{D}}, \mathcal{A} \text{ using the following atomic decompositions …”}\bigr)$$

In a nutshell, the Dolores interpreter $\mathcal{I}_{\mathrm{D}}$ takes a sentence $x$ as input and *recursively* decomposes and executes it through an atomic modeling function. Notably, the modeling function $m^{a}$ is equipped with both the associative function $\mathcal{A}$ and the Deep Reasoning agent $\mathcal{I}_{\mathrm{D}}$ itself.
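The recursive definition in Equation (1) can be sketched in a few lines of Python, with the modeling LLM $m$ stubbed by a rule table and $\mathcal{E}$ played by `eval` (everything here is illustrative, not the actual Dolores implementation):

```python
def A(sentence):
    # Associative stub standing in for an LLM.
    return {"City by the Bay": "San Francisco"}.get(sentence, sentence)

def m(task):
    # Modeling stub: maps a task to an executable formal model over A and I_D.
    if task.startswith("resolve:"):
        return f"A({task[len('resolve:'):]!r})"
    # Otherwise defer the task to a recursive sub-call of the agent itself.
    return f"I_D('resolve:' + {task!r})"

def I_D(x):
    # Equation (1): I_D(x) = E(m(x)), with A and I_D exposed to E.
    return eval(m(x), {"A": A, "I_D": I_D})
```

The key property the sketch preserves is that the interpreter the formal model runs in can call both the associative function and the agent itself, so decomposition can recurse.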
At each step, $\mathcal{I}_{\mathrm{D}}$ takes a sentence, decomposes it into an atomic formal model using $m^{a}_{\mathcal{I}_{\mathrm{D}},\mathcal{A}}$, and executes it via $\mathcal{E}_{\mathcal{I}_{\mathrm{D}},\mathcal{A}}$, recursively invoking itself on sub-calls as needed. In-context atomic decompositions are fed into the system prompt of $\mathcal{A}$, allowing fine-grained control over the decomposition patterns the agent will see, without additional programming.
This leads to a nested cascade of formal, associative and meta\-reasoning steps \(see Figure[1](https://arxiv.org/html/2605.11388#S1.F1)\) that can be controlled by the user\.
For brevity, in the following sections we shorten $\mathcal{E}_{\mathcal{I}_{\mathrm{D}},\,\mathcal{A}}$ to $\mathcal{E}$ and $m^{a}_{\mathcal{I}_{\mathrm{D}},\,\mathcal{A}}$ to $m$.
### 4.2 Implementation
In this section, we describe the implementation details of the Dolores agent $\mathcal{I}_{\mathrm{D}}$ and explain how to build general-purpose agents using the language of Deep Reasoning.
#### Instantiation\.
So far, the modeling function $m$, associative function $\mathcal{A}$, and formal interpreter $\mathcal{E}$ have been conceptual. To operationalize Equation ([1](https://arxiv.org/html/2605.11388#S4.Ex9)) in the Dolores agent, we instantiate them as follows:
- $\mathcal{E}$ is Python
- $\mathcal{A}$ is an LLM
- $m$ is an LLM
Importantly, as specified in Equation ([1](https://arxiv.org/html/2605.11388#S4.Ex9)), the formal interpreter $\mathcal{E}$ (Python) has access to both $\mathcal{A}$ (an LLM) and Dolores ($\mathcal{I}_{\mathrm{D}}$) itself as external functions. This allows the agent to recursively call itself inside the environment (Figure [1](https://arxiv.org/html/2605.11388#S1.F1)). Depending on the task, the Python environment can also include additional tools and variables.
The modeling function $m$, implemented as an LLM, is responsible for decomposing tasks into a series of associative steps (calls to $\mathcal{A}$), formal steps (code executed by $\mathcal{E}$), and meta-reasoning steps (recursive calls to $\mathcal{I}_{\mathrm{D}}$). To guide this behavior, we design prompts for the modeling LLM ($m$) with *atomic* decompositions as in-context examples. These atomic decompositions are built by first formalizing human meta-reasoning traces in the language of Deep Reasoning and then converting them into in-context examples.
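One way such in-context examples might be assembled into $m$'s system prompt is sketched below; the example task, the formal model string, and the prompt wording are all hypothetical illustrations, not the paper's actual prompt format:

```python
# Hypothetical atomic decompositions: (task, formal model over A and I_D).
ATOMIC_DECOMPOSITIONS = [
    ("Which volleyball court in the City by the Bay had the most tie-breaks?",
     "city = A('City by the Bay'); "
     "courts = I_D(f'list volleyball courts in {city}'); "
     "max(courts, key=lambda c: I_D(f'number of tie-breaks at {c}'))"),
]

def build_system_prompt(examples):
    # Concatenate the atomic decompositions into a system prompt for m.
    lines = ["Decompose the task into a small formal model over I_D and A.",
             "Atomic decompositions:"]
    for task, model in examples:
        lines.append(f"Task: {task}\nModel: {model}")
    return "\n\n".join(lines)

prompt = build_system_prompt(ATOMIC_DECOMPOSITIONS)
```

Because the examples live in the prompt rather than in code, decomposition patterns can be added or swapped without touching the agent itself.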
#### Human Meta\-reasoning Traces\.
We start from verbalized human meta-reasoning traces for specific tasks. For example, in the volleyball court task “Which volleyball court in …”, humans have an intuitive ability to meta-reason about and decompose the task, as illustrated in Figure [1](https://arxiv.org/html/2605.11388#S1.F1). These traces map directly into the language of Deep Reasoning (Figure [1](https://arxiv.org/html/2605.11388#S1.F1).b), where different parts of the reasoning map to different interpretation functions. For instance, resolving “City by the Bay” to San Francisco is handled associatively via $\mathcal{A}$. The step “I will list all the volleyball courts in SF” is expressed as a recursive call to $\mathcal{I}_{\mathrm{D}}$, and “pick the court with the most tie-breaks” is a simple formal operation (argmax). In essence, this formalized human meta-reasoning *defines* how the task should be decomposed by the modeling LLM ($m$).
#### In\-Context Examples\.
Given these formalized traces, we convert them directly into in-context examples. Associative calls $\mathcal{A}$ and Deep Reasoning calls $\mathcal{I}_{\mathrm{D}}$ are implemented as LLM calls and recursive sub-agent calls, respectively. Figure [1](https://arxiv.org/html/2605.11388#S1.F1).c shows this concretely: the code corresponds to a single in-context example specifying how to decompose the “Which volleyball court in …” task based on the formalized human meta-reasoning trace (see §[B](https://arxiv.org/html/2605.11388#A2) for more details). This example is then provided to the modeling LLM ($m$). More generally, additional in-context examples derived from human meta-reasoning on other tasks (e.g., deep web search, long-context reasoning, or multi-hop QA) can be added in the same way, allowing the modeling LLM ($m$) to generalize decomposition strategies across tasks.
#### Agentic Loop\.
Finally, we implement the agent inside a Python Read\-Eval\-Print Loop \(REPL\), similar to a notebook environment\. Each step consists of a short chain\-of\-thought from m, followed by a formal code block that is executed\. The REPL allows the agent to store, inspect, and reuse variables across steps\. Recursive calls to Dolores \(ℐ\_D\) run in separate REPL environments\.
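A minimal sketch of such an agentic REPL loop, under our own assumptions: the `llm_step` callable and the `FINAL_ANSWER` convention are hypothetical stand-ins for the paper's implementation; the persistent namespace mimics notebook-style variable reuse across steps.

```python
# Sketch of a REPL-based agentic loop (assumptions: llm_step returns a
# (thought, code) pair; the agent signals completion by setting FINAL_ANSWER).

def run_agent(task, llm_step, max_steps=10):
    namespace = {}            # persists across steps, like notebook cells
    transcript = [task]
    for _ in range(max_steps):
        thought, code = llm_step(transcript)  # short CoT + formal code block
        transcript.append(thought)
        exec(code, namespace)                 # evaluate the code block
        if "FINAL_ANSWER" in namespace:       # completion convention (ours)
            return namespace["FINAL_ANSWER"]
    return None

# Toy demo with a scripted "LLM": variables set in one step are
# visible in the next because the namespace persists.
steps = iter([
    ("First compute the subresult.", "x = 2 + 3"),
    ("Now finalize.", "FINAL_ANSWER = x * 10"),
])
print(run_agent("toy task", lambda transcript: next(steps)))  # 50
```

Running each recursive ℐ_D call in a fresh `namespace` dict would give the separate REPL environments described above.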
## 5 Experiments
| Model | Benchmark | ReAct \([2022](https://arxiv.org/html/2605.11388#bib.bib46)\) | CodeAct \([2024](https://arxiv.org/html/2605.11388#bib.bib47)\) | Deep Research \([2025b](https://arxiv.org/html/2605.11388#bib.bib54)\) | RLM \([2025](https://arxiv.org/html/2605.11388#bib.bib49)\) | Dolores \(ours\) |
|---|---|---|---|---|---|---|
| Qwen3 8B Thinking | SynthWorlds | 0.176 | 0.268 | 0.206 | 0.058 | 0.305 |
| Qwen3 8B Thinking | PhantomWiki | 0.144 | 0.153 | 0.120 | 0.043 | 0.172 |
| Qwen3 8B Thinking | DeepSearchQA | 0.124 | 0.149 | 0.142 | 0.139 | 0.161 |
| Qwen3 8B Thinking | Oolong-real | NA† | 0.045 | NA† | 0.065 | 0.076 |
| Qwen3 32B Thinking | SynthWorlds | 0.228 | 0.281 | 0.275 | 0.169 | 0.346 |
| Qwen3 32B Thinking | PhantomWiki | 0.167 | 0.252 | 0.212 | 0.160 | 0.369 |
| Qwen3 32B Thinking | DeepSearchQA | 0.177 | 0.213 | 0.230 | 0.190 | 0.241 |
| Qwen3 32B Thinking | Oolong-real | NA† | 0.060 | NA† | 0.077 | 0.132 |
| Llama-3.3 70B Instruct | SynthWorlds | 0.300 | 0.480 | 0.308 | 0.401 | 0.359 |
| Llama-3.3 70B Instruct | PhantomWiki | 0.244 | 0.381 | 0.160 | 0.270 | 0.512 |
| Llama-3.3 70B Instruct | DeepSearchQA | 0.155 | 0.184 | 0.127 | 0.175 | 0.187 |
| Llama-3.3 70B Instruct | Oolong-real | NA† | 0.079 | NA† | 0.064 | 0.151 |

Table 2: Scores \(0–1, ↑ higher is better\) across four reasoning benchmarks: SynthWorlds \([2026](https://arxiv.org/html/2605.11388#bib.bib52)\), PhantomWiki \([2025](https://arxiv.org/html/2605.11388#bib.bib53)\), DeepSearchQA \([2026](https://arxiv.org/html/2605.11388#bib.bib57)\), and Oolong-real \([2025](https://arxiv.org/html/2605.11388#bib.bib51)\)\. NA† indicates context-size limitations\. Dolores outperforms state-of-the-art methods across all benchmarks by an average of 24\.8% over the best-performing baseline\.

### 5.1 Setup
To empirically evaluate the utility of Deep Reasoning in agentic reasoning tasks, we use four difficult reasoning benchmarks with verifiable answers, each testing a different reasoning type\.
#### Benchmarks\.
The benchmarks include:

1. **SynthWorlds** \(Gu et al\., [2026](https://arxiv.org/html/2605.11388#bib.bib52)\) tests multi\-hop reasoning over synthetic knowledge\-graph–derived documents\. Because it isolates reasoning from memorization, it tests whether models ground reliably in external knowledge rather than relying solely on parametric knowledge\.
2. **PhantomWiki** \(Gong et al\., [2025](https://arxiv.org/html/2605.11388#bib.bib53)\) tests multi\-hop QA over a synthetic universe of configurable size, probing how models track intermediate states across long reasoning chains\.
3. **DeepSearchQA** \(Gupta et al\., [2026](https://arxiv.org/html/2605.11388#bib.bib57)\) is a verifiable deep\-research benchmark with complex, multi\-step information\-seeking tasks that test an agent’s ability to gather, filter, and organize information\.
4. **Oolong\-real** \(Bertsch et al\., [2025](https://arxiv.org/html/2605.11388#bib.bib51)\) tests multi\-step information aggregation over very long real\-world documents \(e\.g\., episode transcripts, news archives\), probing how agents handle documents that exceed a base model’s context window\.
#### Baselines\.
We compare our method against four open\-source baselines:

1. A ReAct \(Yao et al\., [2022](https://arxiv.org/html/2605.11388#bib.bib46)\) implementation from Roucher et al\. \([2025a](https://arxiv.org/html/2605.11388#bib.bib56)\)\.
2. A CodeAct \(Wang et al\., [2024](https://arxiv.org/html/2605.11388#bib.bib47)\) implementation from Roucher et al\. \([2025a](https://arxiv.org/html/2605.11388#bib.bib56)\)\.
3. The Deep Research agent from Roucher et al\. \([2025b](https://arxiv.org/html/2605.11388#bib.bib54)\)\.
4. The RLM agent from Zhang et al\. \([2025](https://arxiv.org/html/2605.11388#bib.bib49)\)\.
#### Models\.
We test the baselines and our proposed method on the benchmarks using three models from two model families: Qwen3\-8B Thinking and Qwen3\-32B Thinking \(Yang et al\., [2025](https://arxiv.org/html/2605.11388#bib.bib103)\), and Llama\-3\.3 70B Instruct \(Dubey et al\., [2024](https://arxiv.org/html/2605.11388#bib.bib104)\)\. See §LABEL:apn:exp\_details for more details\.
### 5.2 Results and Analyses
#### Dolores outperforms baselines across different benchmarks\.
As shown in Table [2](https://arxiv.org/html/2605.11388#S5.T2), Dolores outperforms all evaluated scaffolds in 11/12 settings, with average improvements of 36\.4% \(Qwen3\-32B\), 12\.8% \(Qwen3\-8B\), and 25\.4% \(Llama\-3\.3\-70B\) over the strongest baseline on each task\. The exception is CodeAct outperforming Dolores on SynthWorlds with Llama\-3\.3\-70B\. Notably, Dolores consistently punches above its model class: the 8B model outperforms the best evaluated 32B baseline on SynthWorlds and Oolong\-real, and the 32B model outperforms the best evaluated 70B baseline on PhantomWiki\.
#### Reducing cognitive load mitigates premature termination and hallucination\.
To better understand the gains of Dolores, we perform a multi\-label, topic\-based failure analysis on Qwen3\-32B traces from Dolores and the evaluated baselines \(see §LABEL:apn:failure\_mode for details\)\. The two dominant failure modes are *premature termination* \(78%\) and *hallucination* \(45%\)\. For premature termination, we observe ReAct and CodeAct attempting to resolve answers in one or two hops on SynthWorlds and PhantomWiki, often giving up when these high\-cognitive\-load reasoning steps fail\. For hallucination, we observe RLMs delegating large counting tasks to sub\-LLMs that count unreliably, while both RLMs and Deep Research fabricate mock tool outputs and entities in SynthWorlds and PhantomWiki\. Manual inspection of traces shows that Dolores avoids these pitfalls by decomposing the same tasks in more atomic ways, leading to correct solutions in cases where other scaffolds fail\. We provide qualitative trace comparisons in §LABEL:apn:qualitative\_examples\. Analyzing both total and per\-thread token counts reveals a clear pattern\. As expected, Dolores spawns many reasoning threads, resulting in 12\.9× more tokens on average\. However, on a per\-thread basis, Dolores reduces reasoning tokens by an average of 71% and non\-reasoning tokens by 76%\. These results are consistent with our hypothesis that reducing cognitive load mitigates premature termination and hallucination\. While these gains currently come at a high token cost, we hypothesize that \(1\) the high overlap of system prompts across many LLM threads, \(2\) the ability of Dolores to bridge scaling tiers by using cheaper models, and \(3\) selecting models based on the cognitive load of associative tasks could eventually lead to lower parameter\-adjusted token costs\.
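To make the total-versus-per-thread accounting concrete, here is a toy calculation with fabricated token counts (not the paper's measurements): spawning many small threads can multiply total tokens while sharply reducing the load each individual thread carries.

```python
# Toy illustration of total vs. per-thread token accounting.
# All counts are fabricated; only the aggregation logic matters.

def token_stats(threads):
    """threads: list of per-thread token counts for one task."""
    total = sum(threads)
    per_thread = total / len(threads)
    return total, per_thread

baseline = [8_000]          # one overloaded monolithic reasoning thread
dolores = [2_000] * 20      # many smaller, lower-load reasoning threads

b_total, b_per = token_stats(baseline)
d_total, d_per = token_stats(dolores)
print(d_total / b_total)    # 5.0  -> total tokens grow...
print(1 - d_per / b_per)    # 0.75 -> ...while per-thread load shrinks 75%
```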
#### Deep Reasoning performance is enabled by in\-context decomposition examples\.
A natural question is whether Dolores’ improvements come from the structured meta\-reasoning in\-context examples themselves, or simply from good engineering\. A follow\-up question is whether a natural\-language description of how to decompose tasks *atomically* is sufficient for current LLMs to operationalize the strategy effectively\. We ablate this on Qwen3\-32B by running two reduced variants\. The first, *no\-examples*, removes all decomposition examples from the system prompt\. The second, *with\-principles*, also removes the decomposition examples but replaces them with natural\-language instructions for decomposing tasks along the three dimensions listed in §[3\.1](https://arxiv.org/html/2605.11388#S3.SS1)\. The full ablation table is in Appendix Table LABEL:tab:ablation\-examples\. Removing the in\-context examples causes large drops on every benchmark \(70% on average\)\. Interestingly, *with\-principles* performs even worse than *no\-examples* on every benchmark \(0\.117 vs\. 0\.139 on DeepSearchQA; 0\.080 vs\. 0\.113 on PhantomWiki; 0\.036 vs\. 0\.041 on SynthWorlds; 0\.033 vs\. 0\.036 on Oolong\)\. These results suggest that current LLMs are not good at operationalizing structured meta\-reasoning and still require humans to meta\-reason about task decompositions: rather than structurally decomposing a task from principles, LLMs appear to reduce the problem to pattern matching\. We discuss limitations and future work in §[A](https://arxiv.org/html/2605.11388#A1)\.
## 6 Related work
#### Train\-Time Scaffold Generation\.
Concurrent work \(Lee et al\., [2026](https://arxiv.org/html/2605.11388#bib.bib16); Hao et al\., [2026](https://arxiv.org/html/2605.11388#bib.bib4)\) builds novel scaffolds automatically via an outer training loop that uses LLMs to search over the space of scaffolds, producing scaffolds that outperform manually curated ones on the tasks for which they were trained\. Train\-time scaffold generation is a promising direction, but it must contend with data\-collection and continual\-retraining costs as out\-of\-distribution tasks arise\. Dolores avoids this cost by learning in context from a small number of atomic decompositions\.
#### Test\-Time Scaffold Generation\.
Continual\-learning scaffold generators adapt over time by performing additional inference\-time LLM calls to reflect, summarize, and store experience between sessions, using this experience to improve subsequent scaffold generations per task \(Liu et al\., [2026](https://arxiv.org/html/2605.11388#bib.bib2); Xia et al\., [2025](https://arxiv.org/html/2605.11388#bib.bib1)\)\. Liu et al\. \([2026](https://arxiv.org/html/2605.11388#bib.bib2)\) learn to decompose tasks into different *multi\-agent interaction patterns*, which are then supervised and adapted by a watcher agent that interacts with an experience pool\. Xia et al\. \([2025](https://arxiv.org/html/2605.11388#bib.bib1)\) focus on coding tasks, starting with a mini\-SWE\-agent that has only bash tools and continually expanding the agent’s tool set through experience, without modifying the agentic loop\. Continual learning of decompositions is a promising direction complementary to our work\. While both approaches learn well from examples, it is unclear how to inject human expertise into the process\. Dolores directly translates human reasoning into the Deep Reasoning language, which makes it easily controllable via prompting with a small set of in\-context examples\. Moreover, instead of varying only tools and workflow patterns between tasks, Dolores also varies the scaffold during task execution, adapting to changes in the intermediate state of the task\.
## 7 Conclusions
Humans solve complex tasks by planning, executing, revising intermediate goals, resolving ambiguity, and switching between associative and formal reasoning\. Current LLM scaffolds lack this flexible, just\-in\-time meta\-reasoning, leaving them brittle on novel tasks\. This work operationalizes human meta\-reasoning in general\-purpose agents through Deep Reasoning, a formal language for structured meta\-reasoning\. Using Deep Reasoning, we construct Dolores, an agentic scaffold that evolves just\-in\-time based on the specific task and its intermediate reasoning steps\. With only a single\-digit number of atomic in\-context examples, based on human intuition, Dolores significantly outperforms strong baseline scaffolds on a collection of hard reasoning tasks\. Under the Deep Reasoning paradigm, LLMs should be viewed not as a collection of shallow experts managed by scaffolding glue, but as associative reasoning processes that interact flexibly with formal processes to create artificial minds that can meta\-reason robustly\. Even a basic implementation of Deep Reasoning can bridge a scaling gap \(8B vs\. 32B\)\.
## 8 Acknowledgements
This research was developed with funding from the Defense Advanced Research Projects Agency’s \(DARPA\) SciFy program \(Agreement No\. HR00112520300\)\. This material is based upon work supported in part by the Defense Advanced Research Projects Agency and the Air Force Research Laboratory, contract number\(s\): FA8650\-23\-C\-7316\. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U\.S\. Government\. This research was also supported by Coefficient Giving, Amazon Health, and the Meta AIM program, as well as by NSF III 2507117 and NSF IIS 2314527\. This work was also supported in part by the U\.S\. National Science Foundation \(NSF\) CAREER Award 2337877, a Schmidt Sciences Award on AI & Advanced Computing through the Science of Trustworthy AI program, and the University of Washington Tech Policy Lab\. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of NSF or Schmidt Sciences\.
## References
- R\. Ackerman and V\. A\. Thompson \(2017\)\. Meta-reasoning: monitoring and control of thinking and reasoning\. Trends in Cognitive Sciences 21\(8\), pp\. 607–617\.
- I\. Arcuschin, J\. Janiak, R\. Krzyzanowski, S\. Rajamanoharan, N\. Nanda, and A\. Conmy \(2025\)\. Chain-of-thought reasoning in the wild is not always faithful\. arXiv preprint arXiv:2503\.08679\.
- S\. C\. Bellini-Leite \(2022\)\. Dual process theory: embodied and predictive; symbolic and classical\. Frontiers in Psychology 13, pp\. 805386\.
- A\. Bertsch, A\. Pratapa, T\. Mitamura, G\. Neubig, and M\. R\. Gormley \(2025\)\. Oolong: evaluating long context reasoning and aggregation capabilities\. arXiv preprint arXiv:2511\.02817\.
- D\. E\. Broadbent \(1958\)\. Perception and Communication\. Pergamon Press, London\.
- Q\. Chen, L\. Qin, J\. Wang, J\. Zhou, and W\. Che \(2024\)\. Unlocking the capabilities of thought: a reasoning boundary framework to quantify and optimize chain-of-thought\. In Advances in Neural Information Processing Systems 38 \(NeurIPS 2024\)\.
- X\. Chen, J\. Xu, T\. Liang, Z\. He, J\. Pang, D\. Yu, L\. Song, Q\. Liu, M\. Zhou, Z\. Zhang, et al\. \(2025\)\. Do not think that much for 2\+3=? On the overthinking of long reasoning models\. In Forty-second International Conference on Machine Learning\.
- A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Yang, A\. Fan, et al\. \(2024\)\. The Llama 3 herd of models\. arXiv e-prints, arXiv–2407\.
- J\. H\. Flavell \(1979\)\. Metacognition and cognitive monitoring: a new area of cognitive–developmental inquiry\. American Psychologist 34\(10\), pp\. 906\.
- D\. Fu, K\. He, Y\. Wang, W\. Hong, Z\. GongQue, W\. Zeng, W\. Wang, J\. Wang, X\. Cai, and W\. Xu \(2025a\)\. AgentRefine: enhancing agent generalization through refinement tuning\. In The Thirteenth International Conference on Learning Representations\.
- Q\. Fu, Y\. Qin, R\. Huang, Y\. Chen, Y\. Zhou, and L\. Long \(2025b\)\. Exclusion of thought: mitigating cognitive load in large language models for enhanced reasoning in multiple-choice tasks\. In Proceedings of ACL 2025 \(Volume 1: Long Papers\), pp\. 21673–21686\.
- A\. Gong, K\. Stankevičiūtė, C\. Wan, A\. Kabra, R\. Thesmar, J\. Lee, J\. Klenke, C\. P\. Gomes, and K\. Q\. Weinberger \(2025\)\. PhantomWiki: on-demand datasets for reasoning and retrieval evaluation\. In International Conference on Machine Learning, pp\. 19964–19995\.
- T\. L\. Griffiths, F\. Callaway, M\. B\. Chang, E\. Grant, P\. M\. Krueger, and F\. Lieder \(2019\)\. Doing more with less: meta-reasoning and meta-learning in humans and machines\. Current Opinion in Behavioral Sciences 29, pp\. 24–30\.
- K\. Gu, A\. Bhat, M\. A\. Merrill, R\. West, X\. Liu, D\. McDuff, and T\. Althoff \(2026\)\. SynthWorlds: controlled parallel worlds for disentangling reasoning and knowledge in language models\. In The Fourteenth International Conference on Learning Representations\.
- N\. Gupta, R\. Chatterjee, L\. Haas, C\. Tao, A\. Wang, C\. Liu, H\. Oiwa, E\. Gribovskaya, J\. Ackermann, J\. Blitzer, et al\. \(2026\)\. DeepSearchQA: bridging the comprehensiveness gap for deep research agents\. arXiv preprint arXiv:2601\.20975\.
- Z\. Hao, H\. Wang, J\. Luo, J\. Zhang, Y\. Zhou, Q\. Lin, C\. Wang, H\. Dong, and J\. Chen \(2026\)\. ReCreate: reasoning and creating domain agents driven by experience\. CoRR abs/2601\.11100\.
- M\. Hassid, G\. Synnaeve, Y\. Adi, and R\. Schwartz \(2025\)\. Don’t overthink it: preferring shorter thinking chains for improved LLM reasoning\. arXiv preprint arXiv:2505\.17813\.
- A\. Jaech, A\. Kalai, A\. Lerer, A\. Richardson, A\. El-Kishky, A\. Low, A\. Helyar, A\. Madry, A\. Beutel, A\. Carney, et al\. \(2024\)\. OpenAI o1 system card\. arXiv preprint arXiv:2412\.16720\.
- P\. N\. Johnson-Laird and R\. M\. Byrne \(1991\)\. Deduction\. Lawrence Erlbaum Associates, Inc\.
- P\. Kargupta, S\. S\. Li, H\. Wang, J\. Lee, S\. Chen, O\. Ahia, D\. Light, T\. L\. Griffiths, M\. Kleiman-Weiner, J\. Han, et al\. \(2025\)\. Cognitive foundations for reasoning and their manifestation in LLMs\. arXiv preprint arXiv:2511\.16660\.
- T\. Lanham, A\. Chen, A\. Radhakrishnan, B\. Steiner, C\. Denison, D\. Hernandez, D\. Li, E\. Durmus, E\. Hubinger, J\. Kernion, et al\. \(2023\)\. Measuring faithfulness in chain-of-thought reasoning\. arXiv preprint arXiv:2307\.13702\.
- Y\. Lee, R\. Nair, Q\. Zhang, K\. Lee, O\. Khattab, and C\. Finn \(2026\)\. Meta-harness: end-to-end optimization of model harnesses\. CoRR abs/2603\.28052\.
- G\. Liu, H\. Lin, H\. Zeng, H\. Wang, and Q\. Yao \(2026\)\. MAS-on-the-fly: dynamic adaptation of LLM-based multi-agent systems at test time\. CoRR abs/2602\.13671\.
- Q\. Lyu, S\. Havaldar, A\. Stein, L\. Zhang, D\. Rao, E\. Wong, M\. Apidianaki, and C\. Callison-Burch \(2023\)\. Faithful chain-of-thought reasoning\. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 305–329\.
- A\. Malek, J\. Ge, N\. Lazic, C\. Jin, A\. György, and C\. Szepesvári \(2025\)\. Frontier LLMs still struggle with simple reasoning tasks\. CoRR abs/2507\.07313\.
- S\. Mednick \(1962\)\. The associative basis of the creative process\. Psychological Review 69\(3\), pp\. 220\.
- A\. Newell, J\. C\. Shaw, and H\. A\. Simon \(1959\)\. Report on a general problem solving program\. In IFIP Congress, Vol\. 256, pp\. 1959\.
- L\. J\. Rips \(1983\)\. Cognitive processes in propositional reasoning\. Psychological Review 90\(1\), pp\. 38–71\.
- E\. Rosch \(1978\)\. Principles of categorization\. In Cognition and Categorization, E\. Rosch and B\. B\. Lloyd \(Eds\.\), pp\. 27–48\.
- J\. Rosser and J\. N\. Foerster \(2026\)\. AgentBreeder: mitigating the AI safety risks of multi-agent scaffolds via self-improvement\. In The Thirty-ninth Annual Conference on Neural Information Processing Systems\.
- A\. Roucher, A\. V\. del Moral, T\. Wolf, L\. von Werra, and E\. Kaunismäki \(2025a\)\. Smolagents: a smol library to build great agentic systems\. [https://github\.com/huggingface/smolagents](https://github.com/huggingface/smolagents)\.
- A\. Roucher, A\. Villanova del Moral, Merve, T\. Wolf, and C\. Fourrier \(2025b\)\. Open-source DeepResearch – freeing our search agents\. [Blog post](https://huggingface.co/blog/open-deep-research)\. Accessed: 2026-04-13\.
- S\. A\. Sloman \(1996\)\. The empirical case for two systems of reasoning\. Psychological Bulletin 119\(1\), pp\. 3\.
- K\. Stanovich \(2011\)\. Rationality and the Reflective Mind\. Oxford University Press\.
- J\. Sweller, J\. J\. G\. van Merriënboer, and F\. Paas \(2019\)\. Cognitive architecture and instructional design: 20 years later\. Educational Psychology Review 31\(2\), pp\. 261–292\.
- A\. Treisman \(1964\)\. Monitoring and storage of irrelevant messages in selective attention\. Journal of Verbal Learning and Verbal Behavior 3\(6\), pp\. 449–459\.
- X\. Wang, Y\. Chen, L\. Yuan, Y\. Zhang, Y\. Li, H\. Peng, and H\. Ji \(2024\)\. Executable code actions elicit better LLM agents\. In Forty-first International Conference on Machine Learning\.
- P\. C\. Wason \(1968\)\. Reasoning about a rule\. Quarterly Journal of Experimental Psychology 20\(3\), pp\. 273–281\.
- C\. S\. Xia, Z\. Wang, Y\. Yang, Y\. Wei, and L\. Zhang \(2025\)\. Live-SWE-agent: can software engineering agents self-evolve on the fly? CoRR abs/2511\.13646\.
- A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv, et al\. \(2025\)\. Qwen3 technical report\. arXiv preprint arXiv:2505\.09388\.
- S\. Yao, J\. Zhao, D\. Yu, N\. Du, I\. Shafran, K\. R\. Narasimhan, and Y\. Cao \(2022\)\. ReAct: synergizing reasoning and acting in language models\. In The Eleventh International Conference on Learning Representations\.
- E\. Yee, A\. Li, C\. Tang, Y\. H\. Jung, R\. Paturi, and L\. Bergen \(2024\)\. Dissociation of faithful and unfaithful reasoning in LLMs\. arXiv preprint arXiv:2405\.15092\.
- A\. L\. Zhang, T\. Kraska, and O\. Khattab \(2025\)\. Recursive language models\. arXiv preprint arXiv:2512\.24601\.
- C\. Zhang, Y\. Jian, Z\. Ouyang, and S\. Vosoughi \(2024\)\. Working memory identifies reasoning limits in language models\. In Proceedings of EMNLP 2024, pp\. 16896–16922\.
## Appendix A Limitations, Future Work and Impact
#### Limitations and Future Work
Creating good Deep Reasoning decompositions requires a good understanding of the Deep Reasoning language, the abilities of modern LLMs, and agentic loop structures\. This may present a barrier to entry for experts in other domains who want to write decompositions for their use cases\. Additionally, complex tasks in domains like scientific or medical reasoning might require many atomic decompositions, which could overwhelm a single memory thread\. Future work includes methods for \(1\) handling large collections of decompositions, \(2\) reducing the token count and cost of Deep Reasoning agents, \(3\) sourcing decompositions automatically from natural\-language feedback and textbooks, \(4\) combining Deep Reasoning with continual learning from examples, and \(5\) extending the language of Deep Reasoning to capture subjective or ambiguous sentences\.
#### Impact
Dolores has the potential to lower the barrier for domain experts to build reliable reasoning agents in fields such as science, medicine, and law by encoding expertise as a handful of in\-context decompositions rather than as training data or predefined scaffolds\. However, it still inherits the standard risks of LLM agents, since LLMs remain a component of the system: hallucination, confidently wrong intermediate steps, and a higher token cost that concentrates capability in actors with large compute budgets\. We mitigate these risks by evaluating only on public benchmarks, releasing all decomposition examples and traces, and recommending human\-in\-the\-loop deployment for high\-stakes use\.
## Appendix B Decompositions as Prompts
In this appendix we show some of the decompositions generated from the running example in Figures [1](https://arxiv.org/html/2605.11388#S1.F1) and LABEL:fig:low\-overview\. Each few\-shot example shows a decomposition and comprises a namespace \(used to filter the set of in\-context examples rendered into the system prompt of a Dolores thread\), a task, and a series of "shallow reasoning" blocks, formal reasoning blocks, and observation blocks\. To keep the meta\-reasoning REPL blocks grounded in intermediate results, Dolores inspects intermediate results as they execute to make sure the model is executing as expected\.
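As a rough illustration of the namespace filtering described above, the sketch below selects only the decomposition examples whose namespace matches the current thread before rendering them into a system prompt. The field names, example contents, and rendering format are our assumptions, not the paper's released prompt format.

```python
# Hypothetical structure for namespaced decomposition examples;
# field names and contents are illustrative, not the released format.

EXAMPLES = [
    {"namespace": "multi_hop_qa",
     "task": "Which volleyball court in ...",
     "blocks": ["shallow: resolve 'City by the Bay'",
                "formal: argmax over tie-breaks",
                "observation: inspect the intermediate court list"]},
    {"namespace": "deep_search",
     "task": "Find all papers that ...",
     "blocks": ["shallow: enumerate search queries",
                "formal: deduplicate gathered results"]},
]

def render_system_prompt(namespace):
    # Keep only examples whose namespace matches the current thread.
    selected = [e for e in EXAMPLES if e["namespace"] == namespace]
    return "\n\n".join(
        f"Task: {e['task']}\n" + "\n".join(e["blocks"]) for e in selected
    )

print(render_system_prompt("multi_hop_qa"))
```

Filtering by namespace keeps each thread's system prompt small, which matches the paper's goal of low per-thread cognitive load.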