Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving
Summary
Xi’an Jiaotong University researchers propose DCM-Agent, a training-free LLM agent that organizes past optimization solutions into dual clusters to resolve structural ambiguity and boost accuracy 11-21% across benchmarks.
View Cached Full Text
Cached at: 04/23/26, 10:03 AM
# Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving
Source: [https://arxiv.org/html/2604.20183](https://arxiv.org/html/2604.20183)
Xinyu Zhang1,2, Yuchen Wan1,2∗, Boxuan Zhang1,2, Zesheng Yang1,2, Lingling Zhang1,2,Bifan Wei1,2,Jun Liu1,3 1School of Computer Science and Technology, Xi’an Jiaotong University 2Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China 3Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, China zhang1393869716@stu\.xjtu\.edu\.cn, \{zhanglling,liukeen\}@xjtu\.edu\.cn
###### Abstract
Large Language Models \(LLMs\) often struggle with structural ambiguity in optimization problems, where a single problem admits multiple related but conflicting modeling paradigms, hindering effective solution generation\. To address this, we proposeDual\-Cluster Memory Agent \(DCM\-Agent\)to enhance performance by leveraging historical solutions in a training\-free manner\. Central to this is Dual\-Cluster Memory Construction\. This agent assigns historical solutions to modeling and coding clusters, then distills each cluster’s content into three structured types:Approach,Checklist, andPitfall\. This process derives generalizable guidance knowledge\. Furthermore, this agent introducesMemory\-augmented Inferenceto dynamically navigate solution paths, detect and repair errors, and adaptively switch reasoning paths with structured knowledge\. The experiments across seven optimization benchmarks demonstrate that DCM\-Agent achieves an average performance improvement of 11%\- 21%\. Notably, our analysis reveals a “knowledge inheritance” phenomenon: memory constructed by larger models can guide smaller models toward superior performance, highlighting the framework’s scalability and efficiency\.
Dual\-Cluster Memory Agent: Resolving Multi\-Paradigm Ambiguity in Optimization Problem Solving
Xinyu Zhang1,2††thanks:These authors contributed equally to this work\., Yuchen Wan1,2∗, Boxuan Zhang1,2, Zesheng Yang1,2,Lingling Zhang1,2††thanks:Corresponding author,Bifan Wei1,2,Jun Liu1,31School of Computer Science and Technology, Xi’an Jiaotong University2Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China3Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, Chinazhang1393869716@stu\.xjtu\.edu\.cn, \{zhanglling,liukeen\}@xjtu\.edu\.cn
## 1Introduction
Optimization problems underpin operations research, supporting applications from supply chain logistics to economic forecastingLiuet al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib18),[2023](https://arxiv.org/html/2604.20183#bib.bib46)\); Belilet al\.\([2018](https://arxiv.org/html/2604.20183#bib.bib43)\)\. Traditionally, solving these problems requires an intensive processMeerschaert \([2013](https://arxiv.org/html/2604.20183#bib.bib47)\), where domain experts manually translate textual descriptions into formulations\. Recently, Large Language Models \(LLMs\) have demonstrated remarkable reasoning capabilities across diverse domains, including science reasoningZhanget al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib1),[d](https://arxiv.org/html/2604.20183#bib.bib3),[e](https://arxiv.org/html/2604.20183#bib.bib5)\), cognitive reasoningZhanget al\.\([2025c](https://arxiv.org/html/2604.20183#bib.bib6)\), and temporal analysisZhanget al\.\([2022](https://arxiv.org/html/2604.20183#bib.bib7)\)\. These advances have also reshaped optimization modeling, leveraging domain knowledge to automate the conversion of textual descriptions into optimization models, thereby mitigating dependency on human expertiseSinhaet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib35)\); Zhao and Cheong \([2025](https://arxiv.org/html/2604.20183#bib.bib16)\)\. However, harnessing LLMs for this purpose remains a challenge, leading to diverse preliminary explorationsWanget al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib17)\); Jianget al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib22)\); Chenet al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib11)\)\.
A fundamental bottleneck impeding current methods is cognitive interference from entangled modeling paradigms\. As shown in Figure[1](https://arxiv.org/html/2604.20183#S1.F1), this presents conflicting signals, where logical constraints \(e\.g\., exact multiples\) suggest Constraint Programming \(CP\), resource maximization implies Integer Linear Programming \(ILP\), and sequential stage dependencies signal Dynamic Programming \(DP\)\. While these features are relevant, their simultaneous presence acts as a confounding distractor\.
Figure 1:Illustration of a single production planning problem formalized via distinct algorithmic paradigms\.This complexity exposes deficiencies in current paradigms\. Fine\-tuning approaches are misled by this interference, mechanically applying memorized templates when subtle variations necessitate alternative strategiesWang and Li \([2025](https://arxiv.org/html/2604.20183#bib.bib49)\)\. Agentic frameworks struggle with these trade\-offs during verification; facing ambiguous algorithmic choices, they rely on the static prompts \(e\.g\., “Check if correct”\) that lack the granularity to detect paradigm\-specific pitfallsHuanget al\.\([2024](https://arxiv.org/html/2604.20183#bib.bib10)\), such as distinguishing between linearity gaps in ILP and recurrence validity in DP\(Zhang and Luo,[2025](https://arxiv.org/html/2604.20183#bib.bib50)\)\. This reveals a fundamental tension: the interference problem demands flexibility to navigate paradigm ambiguity, yet also requires targeted knowledge to verify paradigm\-specific correctness\.
To address this, we propose theDual\-Cluster Memory Agent \(DCM\-Agent\), a training\-free framework that balances flexibility and structure by externalizing reasoning patterns into historical archives\. Instead of relying on parameter updates, this framework decouples abstract modeling from precise coding by directly capitalizing on historical solution archives\. Central to this framework is theDual\-Cluster Construction, which resolves structural ambiguity by organizing historical data, ranging from canonical successes to persistent failures, into a bipartite graph that quantifies the relationships between the independentModeling Clusterand theCoding Cluster\. This process distills the raw experience nodes within each cluster into three cluster\-level structured knowledge tiers:Approach,Checklist, andPitfall\. By explicitly mapping the decision space, this design enables the agent to filter out interfering noise and translate isolated solutions into robust structural knowledge\.
Building upon the dual\-cluster memory, we introduceMemory\-Augmented Inference\. Unlike conventional static prompting, this mechanism effectively solves new problems by retrieving relevant clusters that offer corresponding structured knowledge to guide\. The inference is executed through an iterative generate\-verify\-repair\-backtrack pipeline that operates dynamically\. By leveraging the retrievedPitfallsandChecklists, this pipeline systematically steers the generation process, allowing the agent to detect method\-specific errors \(e\.g\., integrality gaps versus recurrence failures\) and autonomously backtrack to alternative reasoning paths when a chosen paradigm proves infeasible\.
We evaluate DCM\-Agent across seven diverse optimization benchmarks, where it achieves an average performance improvement of 11%–21% compared to standalone LLMs\. The results demonstrate that DCM\-Agent maintains consistent state\-of\-the\-art accuracy across various model scales without the computational overhead of training, offering a superior trade\-off between solution precision and efficiency\. Crucially, our analysis reveals a “knowledge inheritance” phenomenon: memory constructed by larger models can be effectively transferred to guide smaller models toward superior performance, verifying the framework’s scalability and the transferability of its training\-free logic\.
## 2Related Work
### 2\.1LLM\-based Optimization Modeling
LLM\-based optimization modeling reduces the expertise required for complex formulation via prompt\-based strategies or fine\-tuning methods\. Prompt\-based frameworks utilize multi\-agent workflowsXiaoet al\.\([2023](https://arxiv.org/html/2604.20183#bib.bib38)\); AhmadiTeshniziet al\.\([2024](https://arxiv.org/html/2604.20183#bib.bib39)\); Zhanget al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib2)\)or tree\-search algorithmsLiuet al\.\([2025c](https://arxiv.org/html/2604.20183#bib.bib24)\); Astorgaet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib23)\)to improve reasoning and code generation in general\-purpose LLMs\. Conversely, fine\-tuning methods like FOARLJianget al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib44)\), ORLMHuanget al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib20)\), and SIRLChenet al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib19)\)develop specialized models by training on domain\-specific operations research datasets to internalize modeling patterns\.
### 2\.2Retrieval\-Augmented Reasoning
Despite the progress of reasoning LLMsOpenAI \([2024](https://arxiv.org/html/2604.20183#bib.bib61)\); Guoet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib37)\), reliance on internal knowledge often leads to hallucinations that prompting alone cannot resolveHuanget al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib32)\); Weiet al\.\([2022](https://arxiv.org/html/2604.20183#bib.bib31)\)\. Retrieval\-Augmented Reasoning addresses this by integrating external verification into the reasoning processBarryet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib29)\)\. While frameworks like ReActYaoet al\.\([2022](https://arxiv.org/html/2604.20183#bib.bib27)\)and Retrieval\-Augmented ThoughtsWanget al\.\([2024](https://arxiv.org/html/2604.20183#bib.bib28)\)dynamically revise traces with retrieved data, this paradigm is also effective for optimization\. For example, OptiTreeLiuet al\.\([2025c](https://arxiv.org/html/2604.20183#bib.bib24)\)retrieves analogous subproblems to ground reasoning, ensuring complex modeling steps remain verifiable\.
Figure 2:Overview of the Dual\-Cluster Memory Agent \(DCM\-Agent\)\. This agent operates in two distinct phases: \(1\)Dual\-Cluster Memory Construction:Historical solutions are stratified into three types to distill structured knowledge𝒦\\mathcal\{K\}\. These are organized into decoupled Modeling Clusters and Coding Clusters, bridged by a weighted bipartite graph𝒢\\mathcal\{G\}\. \(2\)Memory\-Augmented Inference:For a new problem, the agent retrieves relevant cluster paths to guide the sequential generation of the mathematical formulationm^\\hat\{m\}and executable codec^\\hat\{c\}\.
## 3Methodology
### 3\.1Overview
We formalize optimization problem solving via LLMs as a composite structured process spanning the problem space𝒳\\mathcal\{X\}, modeling space𝕄\\mathbb\{M\}, and coding spaceℂ\\mathbb\{C\}\. The solutiony^\\hat\{y\}is derived via:
y^=ℰ\(hψ\(c∣m\)⏟Coding∘gϕ\(m∣x\)⏟Modeling\)\\hat\{y\}=\\mathcal\{E\}\(\\underbrace\{h\_\{\\psi\}\(c\\mid m\)\}\_\{\\text\{Coding\}\}\\circ\\underbrace\{g\_\{\\phi\}\(m\\mid x\)\}\_\{\\text\{Modeling\}\}\)\(1\)wheregϕg\_\{\\phi\}generates the modeling logicm^\\hat\{m\},hψh\_\{\\psi\}synthesizes the executable codec^\\hat\{c\}, andℰ\(⋅\)\\mathcal\{E\}\(\\cdot\)serves as the code executor\. The core challenge lies in the intrinsic one\-to\-many nature of thisx→m^→c^x\\to\\hat\{m\}\\to\\hat\{c\}mapping: as a single problem admits diverse valid modeling logics and coding implementation \(Figure[1](https://arxiv.org/html/2604.20183#S1.F1)\), the model must possess robust judgment capabilities to navigate these possibilities\.
We propose theDual\-Cluster Memory Agent \(DCM\-Agent\), designed to manage this complexity by explicitly decoupling modeling \(gϕg\_\{\\phi\}\) from coding \(hψh\_\{\\psi\}\), as shown in Figure[2](https://arxiv.org/html/2604.20183#S2.F2)\. DCM\-Agent operates in two phases: \(1\)Dual\-Cluster Memory Construction, which distills instance\-level experience nodes into generalized structured knowledge, and \(2\)Memory\-Augmented Inference, which applies the knowledge to solve novel problems\.
To support this, DCM\-Agent maintains adynamic memory𝒟\\mathcal\{D\}with a novel hierarchical structure: specific experience nodes are organized into decoupled Modeling Clusters and Coding Clusters\. While individual nodes store trajectory details, each cluster maintains a high\-level generalized knowledge \(𝒦\\mathcal\{K\}\) synthesized from its constituent nodes\. At the same time, DCM\-Agent constructs a specialized bipartite graph𝒢\\mathcal\{G\}to bridge these clusters, modeling the compatibility between abstract modeling logics and concrete coding strategies\.
### 3\.2Dual\-Cluster Memory Construction
This phase transforms raw problem\-solution trajectories into a structured, decoupled memory system\. The process follows a rigorous bottom\-up lifecycle, progressing from the discrete analysis of individual instances to collective knowledge synthesis\.
#### 3\.2\.1Node\-Level Construction
We first curate a dataset of distinct samples, disjoint from the evaluation benchmarks\. To ensure the quality of extracted knowledge, we classify the solutions of these samples into three categories:
- •Type A \(Always Correct\):Samples are consistently solved correctly across multiple attempts\. These represent canonical problem\-solving patterns and serve as standard references\.
- •Type B \(Recovered\):The samples that initially failed but yielded correct solutions upon re\-attempting\. These instances capture the boundary between failure and success, providing critical information on specific failure modes and their corresponding recovery reasoning\.
- •Type C \(Persistent Failure\):The solutions persist in failure despite exhaustive attempts \(e\.g\., exceeding 3 rounds\)\. These encode fundamental mismatches between problems and approaches\.
This stratification facilitates differential knowledge extraction: the successful instances \(Types A and B\) provide positive references and caveats, while failures \(Types B and C\) reveal critical pitfalls\.
To transform these raw samples into actionable memory, we first decompose each problem\-solution pair into distinct modeling logic and coding implementation components\. For each component, we generate embeddings \(𝐞m,𝐞c\\mathbf\{e\}\_\{m\},\\mathbf\{e\}\_\{c\}\) to enable semantic retrieval, and simultaneously extract a tuple of instance\-specific knowledge, denoted asΦn=⟨ϕnapproach,ϕnchecklist,ϕnpitfall⟩\\Phi\_\{n\}=\\langle\\phi\_\{n\}^\{\\text\{approach\}\},\\phi\_\{n\}^\{\\text\{checklist\}\},\\phi\_\{n\}^\{\\text\{pitfall\}\}\\rangle, as summarized in Table[1](https://arxiv.org/html/2604.20183#S3.T1)\. Here,Φn\\Phi\_\{n\}represents local insights specific to nodenn\. Specifically, we synthesize canonical approaches \(ϕnapproach\\phi\_\{n\}^\{\\text\{approach\}\}\) and verification checklists \(ϕnchecklist\\phi\_\{n\}^\{\\text\{checklist\}\}\) from successful nodes \(Types A and B\), while deriving explicit pitfall warnings \(ϕnpitfall\\phi\_\{n\}^\{\\text\{pitfall\}\}\) from failure trajectories \(Types B and C\)\.
Guidance TierSource TypeContent Definitionϕapproach\\phi^\{\\text\{approach\}\}Type A \+ Type BSolution templates\(How to solve\)\(Success\)and logical stepsϕchecklist\\phi^\{\\text\{checklist\}\}Type A \+ Type BValidity criteria and\(What to verify\)\(Success\)boundary checksϕpitfall\\phi^\{\\text\{pitfall\}\}Type B \+ Type CCommon errors and\(What to avoid\)\(Failure\)constraint violations
Table 1:Mapping from sample types to guidance tiers\.
#### 3\.2\.2Cluster\-Level Evolution
Once individual nodes are processed, DCM\-Agent organizes them to form generalized knowledge\. This involves clustering nodes, synthesizing knowledge, and establishing graph connectivity\.
Cluster Assignment\.Each experience node is integrated into the memory via a rigorous assignment process\. First, embedding\-based retrieval identifies top\-kkcandidate clusters by comparing the node’s embedding \(𝐞m\\mathbf\{e\}\_\{m\}or𝐞c\\mathbf\{e\}\_\{c\}\) with cluster centroids𝝁\\boldsymbol\{\\mu\}\. Second, an LLM\-based verifier checks semantic consistency to either merge the node into a matched candidate or initialize a new cluster\.
Knowledge Update\.To resolve the ambiguity between the specific examples and general patterns, we employ an incremental update mechanism\. Each cluster maintains a generalized knowledge𝒦=⟨𝒦approach,𝒦checklist,𝒦pitfall⟩\\mathcal\{K\}=\\langle\\mathcal\{K\}^\{\\text\{approach\}\},\\mathcal\{K\}^\{\\text\{checklist\}\},\\mathcal\{K\}^\{\\text\{pitfall\}\}\\rangle, which serves as the consolidated schema for that cluster \(distinct from the rawΦn\\Phi\_\{n\}of individual nodes\)\. When a cluster accumulates a threshold of new nodes \(such asN=5N=5\), we trigger a knowledge update step:
𝒦\(t\+1\)=LLMsynth\(𝒦\(t\)∪⋃j=1NΦnj\)\\mathcal\{K\}^\{\(t\+1\)\}=\\text\{LLM\}\_\{\\text\{synth\}\}\\left\(\\mathcal\{K\}^\{\(t\)\}\\ \\cup\\ \\bigcup\_\{j=1\}^\{N\}\\Phi\_\{n\_\{j\}\}\\right\)\(2\)Here,LLMsynth\\text\{LLM\}\_\{\\text\{synth\}\}is used to abstract generalized patterns from the new batch of instance knowledgeΦnj\\Phi\_\{n\_\{j\}\}and merge them into the generalized knowledge𝒦\(t\)\\mathcal\{K\}^\{\(t\)\}\. This ensures that𝒦\\mathcal\{K\}evolves to capture robust, non\-redundant insights while retaining specific pitfall warnings, and is not overly influenced by extreme samples, as shown in Figure[3](https://arxiv.org/html/2604.20183#S3.F3)\.
Figure 3:Two examples in our Dual\-Cluster Memory\.Bipartite Graph Construction\.At the same time, we introduce a bipartite graph𝒢\\mathcal\{G\}to model the associations between the these decoupled clusters\. Since each experience nodennnaturally maps to a pair of clusters\(CiM,CjC\)\(C^\{M\}\_\{i\},C^\{C\}\_\{j\}\), these linkages aggregate into a global structure\. We formalize this as a bipartite graph𝒢=\(VM,VC,E\)\\mathcal\{G\}=\(V\_\{M\},V\_\{C\},E\), where the edge weightwijw\_\{ij\}quantifies the co\-occurrence frequency of modeling logicCiMC^\{M\}\_\{i\}and coding strategyCjCC^\{C\}\_\{j\}\. The strong edges represent proven pathways, providing critical priors for subsequent usage\.
### 3\.3Memory\-Augmented Inference
#### 3\.3\.1Dual\-Retrieval
For a new problemxnewx\_\{\\text\{new\}\}, DCM\-Agent leverages the memory to efficiently navigate the solution space by retrieving relevant historical experiences\. We first encode the problem into the modeling logic embedding𝐞new\\mathbf\{e\}\_\{\\text\{new\}\}and employ two complementary retrieval mechanisms to balance the problem relevance with general algorithmic applicability:
Instance\-Level Retrievalcaptures the granular problem similarity by retrieving specific nodesℋ\\mathcal\{H\}closest to𝐞new\\mathbf\{e\}\_\{\\text\{new\}\}, thereby identifying relevant experience nodes that share detailed semantic features:
ℋ=argmaxK\{sim\(𝐞new,𝐞i\)∣xi∈𝒟\}\.\\mathcal\{H\}=\\arg\\max\_\{K\}\\\{\\text\{sim\}\(\\mathbf\{e\}\_\{\\text\{new\}\},\\mathbf\{e\}\_\{i\}\)\\mid x\_\{i\}\\in\\mathcal\{D\}\\\}\.\(3\)
Cluster\-Level Retrievaltargets abstract patterns by comparing𝐞new\\mathbf\{e\}\_\{\\text\{new\}\}directly with cluster centroids𝝁kM\\boldsymbol\{\\mu\}\_\{k\}^\{M\}, which ensures capturing the modeling logic beyond surface\-level textual matches:
𝒮cluster=argmaxK\{sim\(𝐞new,𝝁kM\)\}\\mathcal\{S\}\_\{\\text\{cluster\}\}=\\arg\\max\_\{K\}\\\{\\text\{sim\}\(\\mathbf\{e\}\_\{\\text\{new\}\},\\boldsymbol\{\\mu\}\_\{k\}^\{M\}\)\\\}\(4\)
These two sources are united to yield a robust final set of candidate modeling clusters, integrating both specific exemplars and general categories:
ℛ=\{CM\(xi\)∣xi∈ℋ\}∪𝒮cluster\.\\mathcal\{R\}=\\\{C^\{M\}\(x\_\{i\}\)\\mid x\_\{i\}\\in\\mathcal\{H\}\\\}\\cup\\mathcal\{S\}\_\{\\text\{cluster\}\}\.\(5\)
To effectively bridge modeling logic with coding implementation, we query the graph𝒢\\mathcal\{G\}to exploit learned associations\. For each identifiedCiM∈ℛC^\{M\}\_\{i\}\\in\\mathcal\{R\}, we retrieve the top\-KKcoding neighbors𝒩k\\mathcal\{N\}\_\{k\}with the highest edge weights to form a diverse trajectory pool𝒫\\mathcal\{P\}\. Finally, an LLM selector serves as a verifier to rank these combinations based on their logical alignment withxnewx\_\{\\text\{new\}\}, returning a prioritized queue𝒬\\mathcal\{Q\}of the most promisingMMsolution paths:
𝒬=Top\-M\(p\)∈𝒫\(LLMselect\(𝒫,xnew\)\)\\mathcal\{Q\}=\\mathop\{\\text\{Top\-\}M\}\_\{\(p\)\\in\\mathcal\{P\}\}\\left\(\\text\{LLM\}\_\{\\text\{select\}\}\(\\mathcal\{P\},x\_\{\\text\{new\}\}\)\\right\)\(6\)
#### 3\.3\.2Solving via Generalized Knowledge
DCM\-Agent processes the prioritized queue𝒬\\mathcal\{Q\}using a Generate\-Verify\-Repair\-Backtrack pipeline\. Crucially, all the steps are conditioned on the𝒦\\mathcal\{K\}of the selected clusters, rather than node knowledgeϕ\\phi, as shown in Figure[2](https://arxiv.org/html/2604.20183#S2.F2)\. For a pathpt=\(CiM,CjC\)p\_\{t\}=\(C^\{M\}\_\{i\},C^\{C\}\_\{j\}\):
1\. Generation & Verification\.The LLM generates the modeling logicm^raw\\hat\{m\}\_\{\\text\{raw\}\}using the cluster’s canonical approach𝒦iapproach\\mathcal\{K\}\_\{i\}^\{\\text\{approach\}\}\. Immediately, it verifiesm^raw\\hat\{m\}\_\{\\text\{raw\}\}against the cluster’s checklist𝒦ichecklist\\mathcal\{K\}\_\{i\}^\{\\text\{checklist\}\}:
m^raw\\displaystyle\\hat\{m\}\_\{\\text\{raw\}\}=LLMgen\(xnew∣𝒦iapproach\)\\displaystyle=\\text\{LLM\}\_\{\\text\{gen\}\}\(x\_\{\\text\{new\}\}\\mid\\mathcal\{K\}\_\{i\}^\{\\text\{approach\}\}\)\(7\)m^\\displaystyle\\hat\{m\}=LLMverify\(m^raw∣𝒦ichecklist\)\\displaystyle=\\text\{LLM\}\_\{\\text\{verify\}\}\(\\hat\{m\}\_\{\\text\{raw\}\}\\mid\\mathcal\{K\}\_\{i\}^\{\\text\{checklist\}\}\)\(8\)Once them^\\hat\{m\}is established, DCM\-Agent transitions to code generation\. The executable codec^\\hat\{c\}is generated using the coding cluster’s templates𝒦japproach\\mathcal\{K\}\_\{j\}^\{\\text\{approach\}\}and rigorously checked against guidelines𝒦jchecklist\\mathcal\{K\}\_\{j\}^\{\\text\{checklist\}\}, ensuring the robustness of the code structure
2\. Repair & Backtracking\.If the executionℰ\(c^\)\\mathcal\{E\}\(\\hat\{c\}\)fails, resulting in the runtime erroree, DCM\-Agent initiates a knowledge\-guided repair mechanism\. Instead of blind debugging, it systematically analyzeseein the context of the cluster’s specific pitfall warnings𝒦jpitfall\\mathcal\{K\}\_\{j\}^\{\\text\{pitfall\}\}, which contain common error patterns associated with this algorithm type:
c^fixed=LLMfix\(c^,e∣𝒦jpitfall,𝒦jchecklist\)\\hat\{c\}\_\{\\text\{fixed\}\}=\\text\{LLM\}\_\{\\text\{fix\}\}\(\\hat\{c\},e\\mid\\mathcal\{K\}\_\{j\}^\{\\text\{pitfall\}\},\\mathcal\{K\}\_\{j\}^\{\\text\{checklist\}\}\)\(9\)If repair attempts fail to yield a solution within a limit, the agent triggers a backtracking protocol\. It discards the path and activates the nextpt\+1p\_\{t\+1\}from𝒬\\mathcal\{Q\}, preventing the system from getting stuck in local optima and ensuring robust problem\-solving\.
MethodDatasetsAvg\.NL4OptComplexLPNLP4LPOptiBenchOptMATHIndORComplexORSize23021124260516610018\-Qwen3\-8BBaseline41\.7416\.1135\.1230\.5810\.2419\.0022\.2227\.99OptiMUS53\.4823\.2243\.3944\.1315\.0623\.0033\.3338\.04AF\-MCTS47\.3919\.4339\.2636\.5312\.6521\.0033\.3332\.70OptiTree55\.2225\.5946\.2845\.1216\.2625\.0038\.8939\.76DCM\-Agent64\.3532\.2362\.4055\.3721\.6930\.0050\.0049\.43Qwen3\-30BBaseline55\.2228\.4452\.0743\.6416\.8725\.0038\.8940\.52OptiMUS63\.4833\.6463\.6456\.2024\.7029\.0050\.0050\.25AF\-MCTS65\.6534\.6068\.1857\.6825\.9031\.0050\.0052\.22OptiTree70\.8836\.4970\.2559\.5027\.7130\.0055\.5654\.45DCM\-Agent77\.3941\.2376\.0364\.3032\.5334\.0061\.1159\.61Qwen3\-235BBaseline78\.1340\.7674\.3860\.8331\.3332\.0055\.5657\.74OptiMUS83\.4844\.5577\.2764\.3037\.9534\.0066\.6761\.77AF\-MCTS87\.3946\.9278\.9368\.2640\.3637\.0061\.1164\.82OptiTree89\.5648\.8280\.1670\.7441\.5736\.0066\.6766\.66DCM\-Agent93\.4854\.7684\.7175\.2146\.3940\.0072\.2271\.28Deepseek\-V3\.2Baseline82\.6138\.3971\.7358\.6836\.7534\.0061\.1157\.61OptiMUS86\.0943\.6075\.2162\.1539\.7636\.0066\.6761\.20AF\-MCTS89\.1346\.9278\.1067\.2743\.9839\.0066\.6765\.14OptiTree90\.8747\.8779\.3470\.0847\.5938\.0072\.2267\.18DCM\-Agent95\.2253\.5583\.0674\.0552\.7341\.0077\.7771\.47GPT5\.1Baseline87\.3955\.4584\.3068\.2643\.3844\.0066\.6767\.62OptiMUS90\.4358\.7786\.7871\.0745\.7846\.0072\.2270\.42AF\-MCTS94\.4760\.6688\.8474\.7151\.2051\.0077\.7773\.93OptiTree95\.6562\.5690\.0876\.8653\.6149\.0077\.7775\.51DCM\-Agent97\.8365\.4093\.3980\.1655\.4253\.0083\.3378\.50
Table 2:Solving accuracy \(%\) comparison across different datasets\. The best results are highlighted inbold\.MethodNLP4LPOptiBenchOptMATHBaseline8\.3s13\.7s21\.7sOptiMUS26\.5s46\.6s86\.3sAF\-MCTS85\.3s110\.8s205\.7sOptiTree17\.7s33\.5s61\.6sDCM\-Agent22\.1s41\.3s73\.4s
Table 3:Time cost statistics \(in seconds\) of the Qwen3\-235B model across selected benchmarks\.
## 4Experiment
### 4\.1Experiment Setups
Datasets\.To comprehensively evaluate our DCM\-Agent framework across a spectrum of complexities, we utilize a diverse suite of seven optimization benchmarks\. We employ NL4OptRamamonjisonet al\.\([2023](https://arxiv.org/html/2604.20183#bib.bib8)\)and NLP4LPAhmadiTeshniziet al\.\([2024](https://arxiv.org/html/2604.20183#bib.bib39)\)as standard baselines for linear and mixed\-integer programming\. To ensure rigorous testing on more demanding tasks, we incorporate OptiBenchYanget al\.\([2025b](https://arxiv.org/html/2604.20183#bib.bib42)\), OptMATHLuet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib41)\), and the ComplexLP subset of MAMOHuanget al\.\([2025c](https://arxiv.org/html/2604.20183#bib.bib9)\)\. Finally, we assess real\-world applicability using IndustryORHuanget al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib20)\)and ComplexORXiaoet al\.\([2023](https://arxiv.org/html/2604.20183#bib.bib38)\)datasets\. The memory is constructed using500500problems that do not intersect with the current benchmarks\.
Baselines\.We systematically compare DCM\-Agent with diverse methods on different LLMs of varying sizes, ranging from generic standard LLMs to advanced specialized optimization frameworks\. We first establish a fundamental baseline using LLMs of varying sizes, including Qwen3 series \(8B, 30B, 235B\)Yanget al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib15)\), DeepSeek\-V3\.2Liuet al\.\([2025a](https://arxiv.org/html/2604.20183#bib.bib12)\), and GPT\-5\.1OpenAI \([2025](https://arxiv.org/html/2604.20183#bib.bib13)\)\. Subsequently, we compare against representative specialized methods: \(1\) OptiMUSAhmadiTeshniziet al\.\([2024](https://arxiv.org/html/2604.20183#bib.bib39)\), a multi\-agent workflow that enhances reliability through structured input processing; \(2\) AF\-MCTSAstorgaet al\.\([2025](https://arxiv.org/html/2604.20183#bib.bib23)\), which employs Monte Carlo Tree Search to sequentially identify variables, constraints, and objectives; and \(3\) OptiTreeLiuet al\.\([2025c](https://arxiv.org/html/2604.20183#bib.bib24)\), which tackles high\-complexity tasks by adaptively decomposing problems into manageable sub\-problems\.
Evaluation\.Consistent with prior research, we adopt strictend\-to\-end solving accuracyas our primary evaluation metric\. Our protocol evaluates the complete resolution pipeline: given a problem, the model generates the codeccto produce execution outputo=Python\(c\)o=\\text\{Python\}\(c\), where the allowed libraries areGurobi,PuLP,OR\-Tools,SciPy, andNetworkX\. A problem is considered solved if the extracted numerical answers for both the requirement and the objective function match the ground truth\.
Ratio0%10%40%70%100%NLP4LP74\.3878\.5181\.4082\.6484\.71OptiBench60\.8366\.4571\.5774\.0575\.21OptMATH31\.3336\.7540\.3644\.5846\.39
Table 4:Performance comparison under different memory budgets \(10%, 40%, 70%, and 100%\) with Qwen3\-235B across selected datasets\. The results demonstrate the impact of memory constraints on solving accuracy\.MemoryNLP4LPOptiBenchOptMATHQwen3\-8BBaseline35\.1230\.5810\.24Qwen3\-8B62\.4055\.3721\.69Qwen3\-30B67\.3659\.3423\.49Qwen3\-235B69\.4260\.1725\.30DeepSeek\-V3\.266\.5358\.6824\.10GPT\-5\.165\.2857\.5222\.89Qwen3\-235BBaseline74\.3860\.8331\.33Qwen3\-8B78\.5169\.5940\.96Qwen3\-30B82\.6473\.2243\.98Qwen3\-235B84\.7175\.2146\.39DeepSeek\-V3\.285\.5476\.0347\.59GPT\-5\.186\.3677\.3648\.19
Table 5:Cross\-model knowledge transfer: Impact of memory construction model on performance \(%\)\.Figure 4:Statistical distribution of the number of two clusters obtained from LLMs of different sizes\.
### 4\.2Main Results
By carefully analyzing the accuracy results in Table[2](https://arxiv.org/html/2604.20183#S3.T2)alongside the detailed time cost statistics in Table[3](https://arxiv.org/html/2604.20183#S3.T3), we derive the following key insights:
Superiority of Our DCM\-Agent across Model Scales\.DCM\-Agent consistently achieves the highest accuracy across all evaluated model sizes, ranging from the 8B parameter scale to GPT5\.1\. This uniform success demonstrates the framework’s robustness and its capability to serve as a universal performance enhancer for optimization tasks regardless of the underlying backbone architecture\.
Lower Capability Requirements for Memory Construction\.The performance gains of DCM\-Agent are most pronounced on smaller LLMs because its structured memory construction process is relatively less demanding on the model’s inherent reasoning power\. By offloading constraint management to an external memory module, DCM\-Agent allows smaller models to overcome their parameter limitations and thereby achieve competitive results typically reserved for much larger models\.
Balanced Efficiency and Computational Overhead\.As shown in Table[3](https://arxiv.org/html/2604.20183#S3.T3), DCM\-Agent significantly reduces the cost time compared to heavy search\-based methods like AF\-MCTS while maintaining superior accuracy\. This suggests that DCM\-Agent’s memory\-driven reasoning is more computationally purposeful than exhaustive tree exploration, providing an optimal overall trade\-off between solving performance and time efficiency\.
Figure 5:Ablation study on NLP4LP, OptiBench, and OptMATH datasets using Qwen3\-235B\.SettingNLP4LPOptiBenchOptMATHRetrieval Top\-KKin construction and inferenceK=1K=179\.3470\.9140\.36K=3K=384\.7175\.2146\.39K=5K=583\.0674\.0543\.98Memory Update Threshold \(NN\) in constructionN=1N=184\.3074\.2144\.58N=5N=584\.7175\.2146\.39N=10N=1082\.6472\.7342\.77Planning Candidates \(MM\) in inferenceM=1M=182\.2372\.5642\.17M=3M=384\.7175\.2146\.39M=5M=585\.5475\.8746\.99
Table 6:Parameter sensitivity analysis on key hyperparameters \(KK,MM,NN\) across three benchmarks\.Figure 6:Comparison between the baseline and DCM\-Agent on a discrete optimization task\. DCM\-Agent correctly identifies integrality constraints, whereas the baseline produces a physically infeasible fractional solution\.
### 4\.3Dual\-Cluster Memory Analysis
Cluster Statistics across LLMs. We investigate the number of Modeling Clusters (MC) and Coding Clusters (CC) generated by different LLMs during the memory construction phase. As illustrated in Figure [4](https://arxiv.org/html/2604.20183#S4.F4), stronger models (e.g., GPT-5.1) generate significantly more Modeling Clusters than smaller models (e.g., Qwen3-8B). This suggests that advanced LLMs possess a more granular understanding of abstract problem structures, allowing them to disentangle subtle differences in mathematical formulations that smaller models would otherwise conflate.
Impact of Memory Nodes. We assess the system's robustness by varying the available memory budget from 0% to 100% using Qwen3-235B as the backbone. As shown in Table [4](https://arxiv.org/html/2604.20183#S4.T4), performance improves progressively as the number of memory nodes increases. This confirms that the breadth of historical experience directly correlates with the system's ability to generalize to novel problems.
Cross-Model Memory Transferability. A pivotal advantage of DCM-Agent lies in the architectural decoupling of memory construction from inference, enabling flexible cross-model synergy. As shown in Table [5](https://arxiv.org/html/2604.20183#S4.T5), we observe a compelling "knowledge inheritance" effect: structural priors generated by superior models significantly elevate the performance of smaller counterparts. However, performance eventually declines as the memory-generating LLM becomes exceptionally powerful; we hypothesize that the resulting high-density cluster information exceeds the processing capacity of smaller models. This confirms that weaker models can inherit superior structural reasoning from stronger ones, provided the complexity remains within the target model's processing threshold.
### 4.4 Ablation Studies
To assess the impact of DCM-Agent's components, we conduct ablation studies using Qwen3-235B by selectively removing the Modeling and Coding Clusters. As illustrated in Figure [5](https://arxiv.org/html/2604.20183#S4.F5), the complete framework consistently yields the highest accuracy, validating the synergy of the dual-cluster mechanism. We observe that the removal of Modeling Clusters results in a sharper performance drop than removing Coding Clusters (e.g., the blue bars are consistently lower than the orange ones). This result confirms that while executable code guidance is beneficial, decoupling and retrieving precise mathematical logic is the decisive factor for solving complex optimization problems.
### 4.5 Parameter Analysis
We systematically evaluate DCM-Agent's sensitivity to the retrieval size (K), update threshold (N), and number of planning candidates (M), summarized in Table [6](https://arxiv.org/html/2604.20183#S4.T6). Both K and N exhibit a distinct bell-shaped trend, peaking at K=3 and N=5. Lower values suffer from insufficient context or unstable generalization (overfitting to isolated samples), while higher values are hampered by excessive input context length or delayed knowledge consolidation. In contrast, increasing M yields consistent monotonic gains by broadening the search space. However, given the marginal improvement at M=5 relative to the incurred computational overhead, we adopt M=3 for the best cost-effectiveness.
### 4.6 Case Study
As shown in Figure [6](https://arxiv.org/html/2604.20183#S4.F6), DCM-Agent identifies implicit integrality constraints (e.g., discrete bag quantities) and selects an appropriate Mixed-Integer Programming (MIP) solver. Conversely, the baseline overlooks these constraints due to semantic misalignment, treating the problem as a continuous linear task. While the baseline reports a nominally lower cost ($60.80 vs. $70.00), its reliance on fractional quantities results in an infeasible solution. This "deceptive optimality" underscores that numerical performance is secondary to the foundational correctness of the modeling logic, such as the proper distinction between `IntVar` and `NumVar`.
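The gap between a relaxation's nominal optimum and the true integer optimum is easy to reproduce on a toy instance. The bag capacities, costs, and demand below are illustrative assumptions, not the paper's actual case-study numbers:

```python
# Toy "deceptive optimality" demo: the LP relaxation reports a lower cost
# than any physically realizable (integer) purchase plan.
# Illustrative data: two bag types with (units per bag, cost per bag),
# and a demand of at least 10 units.
BAGS = [(4, 6.0), (3, 5.0)]
DEMAND = 10

# LP relaxation of a single covering constraint: buy a fractional quantity
# of the bag with the best cost per unit.
lp_cost = DEMAND * min(cost / cap for cap, cost in BAGS)

# Integer (MIP-style) optimum by brute force over small counts.
best = None
for a in range(DEMAND + 1):
    for b in range(DEMAND + 1):
        if a * BAGS[0][0] + b * BAGS[1][0] >= DEMAND:
            cost = a * BAGS[0][1] + b * BAGS[1][1]
            if best is None or cost < best[0]:
                best = (cost, a, b)

print(f"LP relaxation cost: {lp_cost:.2f}")  # 15.00, via an infeasible 2.5 bags
print(f"Integer optimum:    {best[0]:.2f} with {best[1]}x typeA, {best[2]}x typeB")
```

The relaxation's "cheaper" plan requires 2.5 bags of type A, which cannot be purchased; the feasible integer optimum costs strictly more, mirroring the $60.80 vs. $70.00 discrepancy in the case study.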
## 5 Conclusion
We presented the Dual-Cluster Memory Agent (DCM-Agent), a novel framework that addresses structural ambiguity in optimization by decoupling abstract modeling logic from concrete code implementation. By leveraging Dual-Cluster Organization and Evolutionary Experience Stratification, DCM-Agent provides precise, algorithm-specific guidance through stratified historical insights. Empirical results across seven benchmarks confirm that DCM-Agent achieves consistent state-of-the-art accuracy across model scales while introducing only modest computational overhead. The framework effectively bridges the critical gap between mathematical formulation and executable solver code, substantially improving solution robustness. DCM-Agent thus offers a scalable and efficient approach to navigating the highly complex decision space of automated optimization problem solving.
## 6 Acknowledgements
This work was supported by Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China \(JYB2025XDXM116\), National Natural Science Foundation of China \(No\. 62137002, 62293550, 62293553, 62293554, 62437002, 62477036, 62477037, 62192781\), the Shaanxi Provincial Social Science Foundation Project \(No\. 2024P041\), the Youth Innovation Team of Shaanxi Universities "Multi\-modal Data Mining and Fusion", and Xi’an Jiaotong University City College Research Project \(No\. 2024Y01\)\.
## 7 Limitation
Despite the superior performance of the Dual\-Cluster Memory Agent \(DCM\-Agent\) across various benchmarks and model scales, we identify one primary limitation concerning the framework’s initialization process\. A notable bottleneck lies in the initialization latency during the memory construction phase\. Specifically, distilling structured knowledge requires collecting and classifying historical trajectories into distinct categories and progressively building the bipartite graph\. This process incurs a one\-time computational overhead that is non\-negligible compared to zero\-shot prompting\. However, it is crucial to emphasize that this represents a "sunk cost": once the Dual\-Cluster Memory is fully constructed and stabilized, the framework operates in a plug\-and\-play manner\. The subsequent Memory\-Augmented Inference phase is highly efficient, relying on fast embedding retrieval and path planning rather than exhaustive re\-generation\. This design effectively amortizes the initial construction cost over long\-term usage\. In future work, we aim to explore online learning mechanisms that enable the memory to evolve dynamically through user interactions, eliminating the need for full reconstruction cycles\.
## 8 Ethical Statement
In developing DCM\-Agent, we have rigorously considered the ethical implications of our research, particularly concerning data integrity, privacy, and computational sustainability\. Our experiments utilize established, publicly available optimization benchmarks \(including NL4Opt, OptiBench, and NLP4LP\) and our own collection of 500 questions, which consist solely of mathematical optimization problems and their solutions\. We have verified that these datasets do not contain personally identifying information, sensitive data, or offensive content\. The problems are purely technical in nature, involving mathematical formulations and programming tasks without any reference to individual identities or potentially harmful content\. We strictly adhere to the licensing agreements of these datasets and the large language models employed \(e\.g\., Qwen, DeepSeek, GPT\)\. By demonstrating that smaller models \(e\.g\., 8B parameters\) can achieve state\-of\-the\-art performance when augmented with our memory system—often surpassing unaugmented larger models—DCM\-Agent offers a pathway to reduce the substantial energy consumption typically associated with running massive foundation models for complex reasoning tasks\. This contribution aligns with the broader goal of developing more sustainable and accessible AI systems\.
## Appendix A Details of AI Assistants in Research or Writing
We used Claude\-4\.5\-Sonnet and Gemini\-3\.0\-Pro to help us write code and polish the paper\.
Algorithm 1: Dual-Cluster Memory Construction

```
Input:  historical problem-solution pairs D_raw
Output: modeling clusters C^M, coding clusters C^C, bipartite graph G = (V^M, V^C, E)

// Phase 1: Node-Level Construction
for each problem-solution pair (x_i, y_i) in D_raw do
    classify the pair into Type A/B/C based on its solving attempts
    decompose the solution into modeling logic m_i and coding implementation c_i
    generate embeddings: e_i^m <- Embed(m_i), e_i^c <- Embed(c_i)
    extract instance-level knowledge:
        phi_i^approach  <- Extract_approach(Type A/B)
        phi_i^checklist <- Extract_checklist(Type A/B)
        phi_i^pitfall   <- Extract_pitfall(Type B/C)
        Phi_i <- <phi_i^approach, phi_i^checklist, phi_i^pitfall>
end for

// Phase 2: Cluster-Level Evolution
for each node n_i with (e_i^m, e_i^c, Phi_i) do
    // modeling cluster assignment
    C_cand^M <- TopK-Retrieve(e_i^m, {mu_k^M})        // k candidate clusters
    if LLM_verify(m_i, C_cand^M) returns a match at cluster C_j^M then
        add n_i to C_j^M
    else
        create new cluster C_new^M with centroid mu_new^M = e_i^m
    end if
    // coding cluster assignment (analogous process)
    assign c_i to coding cluster C_k^C
    // knowledge update
    if |C_j^M|_new >= N then                          // threshold reached
        K_j^M <- LLM_synth(K_j^M(t) ∪ U_{n in new} Phi_n)
    end if
    // bipartite graph update
    update edge weight w_jk <- w_jk + 1 for (C_j^M, C_k^C)
end for
return C^M, C^C, G = (V^M, V^C, E)
```
Algorithm 2: Memory-Augmented Inference

```
Input:  new problem x_new, memory D, bipartite graph G
Output: solution y_hat (modeling m_hat and code c_hat)

// Phase 1: Dual-Retrieval
encode the problem: e_new <- Embed(x_new)
H <- argmax_K { sim(e_new, e_i) | x_i in D }          // instance-level retrieval
S_cluster <- argmax_K { sim(e_new, mu_k^M) }          // cluster-level retrieval
R <- { C^M(x_i) | x_i in H } ∪ S_cluster              // combine retrieval results
// generate trajectory pool via the graph
P <- {}
for each C_i^M in R do
    N_k <- TopK-Neighbors(C_i^M, G)                   // top-k coding clusters
    for each C_j^C in N_k do
        add path p = (C_i^M, C_j^C) to P
    end for
end for
// rank paths
Q <- TopM_{p in P}( LLM_select(p | x_new) )

// Phase 2: Solving via Generalized Knowledge
for each path p_t = (C_i^M, C_j^C) in Q do
    // modeling phase
    m_raw <- LLM_gen(x_new | K_i^approach)
    m_hat <- LLM_verify(m_raw | K_i^checklist)
    // coding phase
    c_raw <- LLM_gen(m_hat | K_j^approach)
    c_hat <- LLM_verify(c_raw | K_j^checklist)
    // execution & repair
    (result, error) <- E(c_hat)                       // execute code
    if error = None then
        return y_hat = (result, m_hat, c_hat)
    else
        c_fixed <- LLM_fix(c_hat, error | K_j^pitfall, K_j^checklist)
        (result, error) <- E(c_fixed)
        if error = None then
            return y_hat = (result, m_hat, c_fixed)
        end if
    end if
    // backtrack: continue to next path p_{t+1} in Q
end for
return Failure                                        // all paths exhausted
```
## Appendix B Detailed Algorithm Descriptions
### B.1 Dual-Cluster Memory Construction
Algorithm [1](https://arxiv.org/html/2604.20183#alg1) describes the process of constructing our dual-cluster memory system from historical problem-solution trajectories. The algorithm operates in two phases: node-level construction and cluster-level evolution. In the node-level phase, each historical problem-solution pair is first categorized into three types based on its solving trajectory: Type A (always correct), Type B (recovered), and Type C (persistent failures). We then decompose each solution into separate modeling logic m_i and coding implementation c_i components, generate their semantic embeddings (e_i^m, e_i^c), and extract structured knowledge Φ_i = ⟨φ_i^approach, φ_i^checklist, φ_i^pitfall⟩. The approach knowledge captures solution templates, the checklist defines verification criteria, and the pitfall documents common errors.
In the cluster-level phase, we organize nodes into coherent clusters using a hybrid approach combining embedding similarity with LLM-powered semantic verification. For each node, its modeling component is assigned to a cluster C_j^M by retrieving the top-k most similar clusters and verifying semantic alignment. The coding component undergoes identical clustering independently. When a cluster accumulates N new nodes, we trigger knowledge synthesis, in which an LLM consolidates instance-level knowledge into generalized cluster-level guidelines. Finally, the natural associations between modeling and coding clusters form a weighted bipartite graph G, where edge weights indicate co-occurrence frequencies and capture proven compatibility between modeling paradigms and coding strategies.
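The assignment step above can be sketched in a few lines. In this sketch, cosine similarity against a fixed threshold stands in for the paper's LLM-powered semantic verification, and the flat-list data structures are simplifying assumptions:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def assign_to_cluster(embedding, centroids, k=3, threshold=0.8):
    """Return the index of the matched cluster, or None to signal a new cluster.

    Retrieves the top-k most similar centroids, then accepts the best one only
    if it clears the threshold (a stand-in for the LLM verification step).
    """
    if not centroids:
        return None
    ranked = sorted(range(len(centroids)),
                    key=lambda j: cosine(embedding, centroids[j]),
                    reverse=True)[:k]
    best = ranked[0]
    return best if cosine(embedding, centroids[best]) >= threshold else None

# Usage: a node close to the first centroid joins it; a distant node
# triggers creation of a new cluster seeded with its own embedding.
centroids = [[1.0, 0.0], [0.0, 1.0]]
print(assign_to_cluster([0.9, 0.1], centroids))   # 0
print(assign_to_cluster([0.7, -0.7], centroids))  # None -> create new cluster
```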
### B.2 Memory-Augmented Inference
Algorithm [2](https://arxiv.org/html/2604.20183#alg2) presents our inference procedure for solving novel optimization problems using the constructed dual-cluster memory. The algorithm implements a systematic Generate-Verify-Repair-Backtrack pipeline guided by cluster knowledge.
The inference begins with a dual-retrieval phase. We first encode the new problem into an embedding e_new and retrieve the top-K most similar historical instances, capturing fine-grained problem patterns. Complementary to this, we retrieve the top-K modeling clusters by comparing e_new with cluster centroids, capturing general algorithmic paradigms. We combine both retrieval sources and, for each identified modeling cluster, query the bipartite graph to retrieve the top-k compatible coding clusters based on edge weights. This generates a pool of candidate solution paths P, each representing a complete modeling-to-coding pipeline. An LLM selector then ranks these paths by their alignment with the new problem, forming a prioritized queue Q.
The solving phase executes paths from Q sequentially. For each path, we first generate modeling logic m̂ guided by the modeling cluster's approach knowledge and verify it against the checklist. Once validated, we generate executable code ĉ using the coding cluster's knowledge and similarly verify it. The code is then executed by the solver engine E. On success, we return the solution. On failure, we initiate knowledge-guided repair by providing the repair LLM with the error message, cluster-specific pitfalls, and the verification checklist. This targeted repair substantially outperforms generic debugging by grounding fixes in paradigm-specific failure modes. If repair attempts fail within the predefined limit (default: 2 attempts), we backtrack to the next path in Q.
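The path-pool construction and ranking can be sketched as follows. Edge weights play the role of the bipartite co-occurrence counts, and a simple weight-based score stands in for the LLM selector; the cluster names are made up for illustration:

```python
def build_path_pool(retrieved_modeling, graph, k=2):
    """graph: dict mapping modeling-cluster id -> {coding-cluster id: weight}.

    For each retrieved modeling cluster, take its top-k coding neighbors by
    co-occurrence weight, yielding candidate (modeling, coding, weight) paths.
    """
    pool = []
    for mc in retrieved_modeling:
        neighbors = sorted(graph.get(mc, {}).items(),
                           key=lambda kv: kv[1], reverse=True)[:k]
        pool.extend((mc, cc, w) for cc, w in neighbors)
    return pool

def rank_paths(pool, m=3):
    # Stand-in for LLM_select: prefer historically stronger pairings.
    return [(mc, cc) for mc, cc, _ in
            sorted(pool, key=lambda p: p[2], reverse=True)[:m]]

# Toy bipartite graph: MIP-style modeling pairs most often with Gurobi code.
graph = {"MIP": {"gurobi": 9, "ortools": 4, "pulp": 1},
         "network-flow": {"networkx": 6, "ortools": 2}}
queue = rank_paths(build_path_pool(["MIP", "network-flow"], graph))
print(queue)  # [('MIP', 'gurobi'), ('network-flow', 'networkx'), ('MIP', 'ortools')]
```

The solving loop would then try each `(modeling, coding)` pair from `queue` in order, backtracking to the next pair when generation, verification, and repair all fail.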
## Appendix C Optimization Solvers and Libraries
Our framework supports multiple optimization solvers to handle diverse problem types\.
Gurobi is a state-of-the-art commercial solver (with free academic licenses) for linear programming (LP), mixed-integer linear programming (MILP), and quadratic programming (QP), providing exact solutions with provable optimality guarantees. Its branch-and-cut algorithm excels at large-scale combinatorial problems such as facility location, scheduling, and resource allocation.
PuLP is an open-source linear programming modeler offering a unified Python interface to various solvers, including CBC, GLPK, and CPLEX. It is particularly suitable for rapid prototyping and straightforward LP/MILP problems.
OR-Tools is Google's optimization suite featuring constraint programming (CP-SAT) and routing solvers. It handles combinatorial problems with complex logical constraints, such as scheduling with precedence relations, assignment problems, and vehicle routing, supporting both linear and non-linear constraints.
SciPy provides gradient-based methods (L-BFGS-B, SLSQP) and derivative-free algorithms (Nelder-Mead) for continuous optimization. It is effective for non-linear programming problems, including parameter tuning, curve fitting, and engineering design with non-convex objectives.
NetworkX is a graph analysis library providing efficient implementations of classic algorithms such as Dijkstra's shortest path, Ford-Fulkerson maximum flow, and minimum spanning tree. It serves as a specialized solver for network optimization problems with clear graph topology, such as transportation and communication networks.
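For illustration, the shortest-path computation that NetworkX's `dijkstra_path` performs can be sketched with the standard library alone; the toy transportation network below is made up:

```python
import heapq

def dijkstra(graph, src, dst):
    """graph: dict node -> {neighbor: nonnegative edge weight}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk predecessor links back from the destination.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

# Toy transportation network with asymmetric edge costs.
net = {"A": {"B": 4, "C": 2}, "B": {"D": 5}, "C": {"B": 1, "D": 8}, "D": {}}
print(dijkstra(net, "A", "D"))  # (['A', 'C', 'B', 'D'], 8.0)
```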
The bipartite graph in our dual\-cluster memory learns associations between modeling paradigms and solver choices through historical co\-occurrence patterns\. For instance, integer linear programs typically pair with Gurobi or OR\-Tools clusters, while continuous non\-linear problems align with SciPy clusters\. During inference, the framework automatically selects appropriate solvers by querying this learned graph \(Algorithm[2](https://arxiv.org/html/2604.20183#alg2)\), and supports fallback to alternative solvers if the primary choice fails, ensuring robustness across different settings\.
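The fallback behavior described above can be sketched as a simple priority loop. The solver names, the stub backends, and the dict-based interface are all illustrative assumptions, not the framework's actual API:

```python
def solve_with_fallback(problem, ranked_solvers, backends):
    """Try solvers in learned-preference order; return the first success.

    ranked_solvers: names ordered by bipartite edge weight (most-proven first).
    backends: dict name -> callable returning a result or raising on failure.
    """
    errors = {}
    for name in ranked_solvers:
        try:
            return name, backends[name](problem)
        except Exception as exc:  # solver failed; record and fall through
            errors[name] = str(exc)
    raise RuntimeError(f"all solvers failed: {errors}")

def _gurobi_stub(problem):
    # Stand-in for a primary solver that fails at runtime.
    raise RuntimeError("license error")

# Usage with stub backends: the preferred solver fails, the fallback succeeds.
backends = {
    "gurobi": _gurobi_stub,
    "ortools": lambda problem: {"status": "OPTIMAL", "objective": 42},
}
name, result = solve_with_fallback("toy-milp", ["gurobi", "ortools"], backends)
print(name, result["status"])  # ortools OPTIMAL
```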
## Appendix D Details of Human Annotators
For the collection, annotation, and verification of the 500 optimization problems in our dataset, we engage Ph.D. candidates and Master's students with expertise in Operations Research and Applied Mathematics, who are also co-authors of this paper. All annotators possess strong academic backgrounds, ensuring their qualifications to accurately formulate problems, verify solution correctness, and maintain the technical precision of domain-specific terminology and mathematical notations. Since the annotators are co-authors involved in this study, no formal external recruitment process or monetary compensation is required, and they are fully informed of the data collection and usage protocols. The annotation process focuses exclusively on creating and evaluating optimization problems, mathematical formulations, and solution approaches, without involving the collection of any personally identifying information or exposing annotators to potential risks. As this research centers on the development and analysis of mathematical optimization content rather than involving external human subjects or sensitive data, it is determined to be exempt from formal institutional review board approval.