Agri-SAGE: Simulation-Grounded Multi-Agent LLM for Context-Aware Agricultural Advisory Generation

arXiv cs.AI Papers

Summary

This paper introduces Agri-SAGE, a closed-loop framework that integrates multi-agent LLM reasoning with biophysical simulation (APSIM) to generate and validate context-aware agricultural advisories. The framework outperforms static baselines in retrospective analysis, with Tree-of-Thoughts achieving peak yields and Reflexion reducing computational cost via episodic memory.

arXiv:2607.00454v1 Announce Type: new Abstract: Agricultural advisory systems face a fundamental tension: static agronomic guidelines offer consistent, evidence-based recommendations, yet remain blind to in-season variability and dynamic uncertainties. Recent advisory systems powered by LLMs are liable for a different risk of generating recommendations that are agronomically credible but physiologically unconvincing. Agri-SAGE is a closed-loop framework designed to resolve the above two limitations by integrating retrieval-grounded multi-agent LLM reasoning with APSIM-based biophysical simulation, to generate and validate agronomic advisories. To assess this framework, we evaluate three reasoning approaches, namely Plan-and-Solve, Tree of Thoughts, and Reflexion, over a 10-year retrospective analysis. All three significantly outperform static PoP (Package-of-Practice) baselines, with Tree of Thoughts achieving impressive peak yields. At the same time, Reflexion achieves comparable agronomic outcomes at substantially lower computational cost by leveraging cross-seasonal episodic memory.
Original Article
View Cached Full Text

Cached at: 07/02/26, 05:40 AM

# Agri-SAGE: Simulation-Grounded Multi-Agent LLM for Context-Aware Agricultural Advisory Generation
Source: [https://arxiv.org/html/2607.00454](https://arxiv.org/html/2607.00454)
Vedant Balasubramaniam1, Geetha Charan1, Manojkumar Patil1, Rohit P Suresh1, V Priyanka1, Kodur Sai Vinay Sathvik2and Y\. Narahari11Indian Institute of Science, Bengaluru\{vedantb, geethacharan, pmanojkumar, rohitpsuresh, priyankav, narahari\}@iisc\.ac\.in2BNM Institute of Technology, Bengalurusathvik2004@gmail\.com

###### Abstract

Agricultural advisory systems face a fundamental tension: static agronomic guidelines offer consistent, evidence\-based recommendations, yet remain blind to in\-season variability and dynamic uncertainties\. Recent advisory systems powered by LLMs are liable for a different risk of generating recommendations that are agronomically credible but physiologically unconvincing\. Agri\-SAGE is a closed\-loop framework designed to resolve the above two limitations by integrating retrieval\-grounded multi\-agent LLM reasoning with APSIM\-based biophysical simulation, to generate and validate agronomic advisories\.\. To assess this framework, we evaluate three reasoning approaches, namelyPlan\-and\-Solve,Tree of Thoughts, andReflexion, over a 10\-year retrospective analysis\. All three significantly outperform static PoP \(Package\-of\-Practice\) baselines, with Tree of Thoughts achieving impressive peak yields\. At the same time, Reflexion achieves comparable agronomic outcomes at substantially lower computational cost by leveraging cross\-seasonal episodic memory\.

## IINTRODUCTION

Agriculture remains the cornerstone of food security and rural livelihoods across much of the developing world\. In India, where a substantial share of the population depends on farming for subsistence and income, the quality of agricultural guidance available to farmers has profound implications for productivity and economic stability\.

Traditionally, agricultural advisories in India are delivered throughPackages of Practices\(PoPs\)\. which contain agronomic guidelines derived from extensive field trials and agronomic research\. These recommendations typically include land preparation methods, fertilizer schedules, irrigation practices, and pest management strategies\.

While scientifically grounded, PoPs are inherently static as they are issued before the cropping season at the agro\-climatic zone level and updated infrequently\. As a result, they cannot adapt to season variability such as weather shocks, pest outbreaks, or localized soil heterogeneity\. Consequently, farmers receive recommendations that may not be suitable to their specific field conditions\.

To partially address this limitation, agricultural advisory systems also provideIn\-Season Advisories\. These are dynamic, time\-sensitive recommendations generated during the cropping season based on real\-time environmental signals, including weather forecasts, temperature patterns, pest incidence, and crop phenological stages\. In India, Agro\-Meteorological Field Units \(AMFUs\) are responsible for generating such advisories\. However, these advisories remain largely manual and expert\-driven, limiting scalability and the ability to personalize recommendations\.

Recent advances in LLMs have enabled agricultural advisory systems capable of synthesizing large agronomic knowledge bases and interacting with farmers through natural language interfaces\. While these systems significantly improve accessibility and knowledge dissemination, they remain constrained as they may generate plausible but incorrect recommendations\.

To address these limitations, we introduce Agri\-SAGE, a novel framework that reconceives the traditional Package of Practices as a dynamic, context\-aware agronomic plan\. At its core, Agri\-SAGE uses a multi\-agent LLM architecture, with specialized agents to generate, evaluate, and refine agricultural recommendations\. By combining agronomic knowledge retrieval with real\-time environmental inputs, the system produces season\-long management strategies that are coherent and adaptive\.

A central challenge in AI\-driven agronomic advice is ensuring that recommendations remain physically plausible and safe to act on\. To address this, Agri\-SAGE validates all generated advisory plans using the Agricultural Production Systems sIMulator \(APSIM\)\[[4](https://arxiv.org/html/2607.00454#bib.bib20)\], a widely adopted process\-based crop simulation model\. Unlike prior agricultural advisory systems that rely solely on textual reasoning, Agri\-SAGE grounds every recommendation in crop physiology through APSIM simulation, ensuring that generated advisories remain biophysically feasible\.

Contributions

- •We propose a novel, closed\-loop autonomous AI Agronomist framework that couples LLMs with the APSIM biophysical crop simulator, successfully grounding generative semantic reasoning
- •We conduct a rigorous 10\-year comparative analysis of advanced reasoning methodologies, namelyPlan\-and\-Solve \(PS\),Tree of Thoughts \(ToT\), andReflexion\. We demonstrate that all these approaches significantly outperform static PoP baselines withTree of Thoughtsachieving the highest yield\.
- •We observe that the proposed method discovers significantly better agronomic practices than static PoP\. In Mandya region, when maize is simulated under local conditions the optimized fertilizer, irrigation splits along with other operations resulted in better outcome\.

## IIRELATED WORK

### II\-ALLM\-Based Agricultural Advisory Systems

Recent work has explored the application of LLMs for automated agricultural advisory through retrieval\-augmented generation \(RAG\) and domain adaptation\. Farmer\.Chat\[[9](https://arxiv.org/html/2607.00454#bib.bib11)\]demonstrates large\-scale deployment of conversational agricultural advisory using curated agronomic knowledge bases combined with weather APIs\. Similar systems such as ShizishanGPT\[[16](https://arxiv.org/html/2607.00454#bib.bib12)\], AgriGPT\[[14](https://arxiv.org/html/2607.00454#bib.bib13)\], and AgriRegion\[[3](https://arxiv.org/html/2607.00454#bib.bib15)\]extend this paradigm through techniques including knowledge graphs, region\-aware retrieval, and domain\-specific fine\-tuning\.

These approaches significantly improve factual grounding compared to generic LLMs and reduce hallucinations through structured retrieval pipelines\[[10](https://arxiv.org/html/2607.00454#bib.bib16)\]\. However, they remain fundamentally limited by the knowledge available within their underlying corpora\. Consequently, they cannot reason about agronomic scenarios that are absent from the retrieved documents\.

Several studies highlight the reliability challenges associated with corpus\-grounded advisory systems\. Evaluations of GPT\-based systems for pest management show that performance improves substantially only when explicit domain context is injected\[[15](https://arxiv.org/html/2607.00454#bib.bib17)\]\. Broader analyses demonstrate that LLMs may generate confident but incorrect agronomic recommendations when operating outside their training distribution\[[2](https://arxiv.org/html/2607.00454#bib.bib18)\]\. Multi\-agent frameworks such as AgroAskAI attempt to mitigate these issues by introducing specialized agents and reviewer modules that critique generated responses\[[1](https://arxiv.org/html/2607.00454#bib.bib19)\]\.

### II\-BSimulation Models and Multi\-Agent Systems

Process\-based crop simulation models provide a complementary mechanism for evaluating management decisions\. Systems such as APSIM NextGen engine simulate crop growth, water balance, and nutrient dynamics at daily resolution and are widely used for agronomic scenario analysis\.

Recent studies have begun exploring the integration of language models with simulation environments\. Wu et al\.\[[12](https://arxiv.org/html/2607.00454#bib.bib21)\]demonstrate that language models can reason over DSSAT simulator outputs for crop management optimization\. MCP\-SIM\[[7](https://arxiv.org/html/2607.00454#bib.bib3)\]introduces a self\-correcting multi\-agent framework that transforms underspecified prompts into validated simulations through iterative plan–act–reflect–revise cycles, improving convergence efficiency across multiple task settings\.

More broadly, multi\-agent LLM frameworks have demonstrated effectiveness across complex reasoning tasks\. Jia et al\.\[[6](https://arxiv.org/html/2607.00454#bib.bib1)\]proposed a feedback\-driven multi\-agent framework that combines enhanced retrieval, reasoning modules, and environmental actions, achieving strong performance in power system optimization tasks\. Xia et al\.\[[13](https://arxiv.org/html/2607.00454#bib.bib2)\]presented specialized LLM agents for automated parametrization of simulation models in digital twins\. Similarly, Reflexion\[[8](https://arxiv.org/html/2607.00454#bib.bib4)\]introduces episodic memory\-based feedback mechanisms that enable language agents to iteratively refine solutions without updating model weights\. CodeSim\[[5](https://arxiv.org/html/2607.00454#bib.bib6)\]extends this paradigm to program synthesis through internal debugging via simulated input–output execution\.

Existing systems either rely solely on textual reasoning without physiological validation or optimize isolated decisions without producing complete season\-long plans\. Agri\-SAGE addresses both gaps by combining LLM\-based retrieval and reasoning with APSIM simulation, ensuring that full\-season agronomic advisories are grounded in crop physiology rather than solely in text\.

## IIIMETHODOLOGY

Figure[1](https://arxiv.org/html/2607.00454#S3.F1)shows the architecture of Agri\-SAGE\. It comprises three primary modules: 1\) Retrieval Agent, 2\) Generation Agent, and 3\) Verification and Feedback Agent\.

![Refer to caption](https://arxiv.org/html/2607.00454v1/simadv.drawio.png)Figure 1:Architecture of Agri\-SAGE### III\-AThe Retrieval Agent

The framework uses a Retrieval Agent to ground LLM outputs in a curated knowledge base of regional manuscripts and Package of Practices \(PoPs\)\. By seeding the action context with PoPs from local agricultural universities, it anchors recommendations to crop inputs, fertilizer types, and operational methods that farmers in the area can realistically access and afford\.

To address the high dimensionality of agricultural texts, we implemented semantic\-aware hierarchical chunking in combination with hybrid dense–sparse retrieval mechanisms\. The Retrieval Agent dynamically processes environmental inputs, such as real\-time weather forecasts, soil profiles, crop phenology, and solar radiation, and translates them into optimized vector search queries\. This ensures that the Generation Agent receives highly relevant, context\-specific agronomic literature prior to formulating its advisory outputs\.

![Refer to caption](https://arxiv.org/html/2607.00454v1/reasoning_methods.drawio.png)Figure 2:Reasoning Methodologies: Plan and Solve \(left\), Tree of Thought \(middle\), and Reflexion \(right\)\.
### III\-BThe Generation Agent

The Generation Agent serves as the central decision\-making component of the proposed framework\. It synthesizes the localized agronomic context retrieved by the Retrieval Agent with the real\-time environmental state to formulate detailed, actionable agricultural advisories\.

We formalize the agronomic advisory task as an iterative optimization problem where the APSIM simulator serves as a deterministic environment providing an observation after each iteration\. The LLM generates an optimized action using the observations from APSIM\. We evaluate three distinct reasoning methodologies namely, Plan\-and\-Solve \(PS\), Tree of Thoughts \(ToT\), and Reflexion illustrated in Figure[2](https://arxiv.org/html/2607.00454#S3.F2)\.

Plan\-and\-Solve \(PS\)

We adapt the PS prompting strategy\[[11](https://arxiv.org/html/2607.00454#bib.bib23)\]into a closed\-loop biophysical framework\. At each iteration, the agent receives the prior observation and processes it sequentially: first generating a textual critique diagnosing the specific biophysical failures of the previous action, then generating a revised plan, and finally executing the new action\.

The deterministic feedback from the APSIM simulator, along with baseline yields, guides the LLM to autonomously self\-correct\. Through this mechanism, the agent iteratively converges on optimized and improved agronomic actions\.

Tree of Thoughts \(ToT\)

Following Yao et al\.\[[17](https://arxiv.org/html/2607.00454#bib.bib7)\], ToT is implemented as an anticipatory lookahead search\. Before executing any action in the simulator, the agent generatesk=3distinct, mutually exclusive agronomic intervention pathways\. An internal LLM evaluation explicitly critiques the biophysical pros and cons of each path against the current weather forecast\. The agent selects the highest\-scoring path to translate into the final action\. Only this advisory is sent to the APSIM simulator\. By explicitly forcing the generation of three divergent branches prior to execution, ToT prevents the agent from aggressively optimizing a fundamentally flawed initial strategy\.

Reflexion

Our Reflexion implementation follows Shinn et al\.\[[8](https://arxiv.org/html/2607.00454#bib.bib4)\], extending the PS framework by integrating a persistent episodic memory module\. In this context, episodic memory is defined as a textual database that stores distilled heuristic rules derived from past simulation years \(e\.g\., logging that a mid\-June sowing date historically fails during drought years\)\. After each simulated growing season, the agent generates a reflection diagnosing the season’s macro\-level success or failure, which is appended to memory\. In subsequent years, the policy is conditioned on this historical context\. By caching biophysical dynamics into a textual memory bank, Reflexion enables zero\-shot inference\-time adaptation across diverse weather cycles\.

### III\-CThe Verification and Feedback Agent

Within the Verification and Feedback Agent, a translation layer converts the output of the Generation Agent into structured, executable tool calls for APSIM which include setting the sowing date, type of fertilizer \(ex Urea, DAP, etc\) and its amount and date of application, amount of irrigation\(mm\) and date of application, and more operations like surface mulch \(kg/ha\) and tillage events\.

APSIM then simulates the full crop life cycle under the prescribed advisories\. While the APSIM engine dynamically computes various biophysical variables \(e\.g\., Leaf Area Index, total above\-ground biomass, and carbon dynamics\), the Feedback Agent isolates and extracts three primary metrics to form the feedback:

1. 1\.Grain Yield \(kg/ha\\mathrm\{k\}\\mathrm\{g\}\\mathrm\{/\}\\mathrm\{h\}\\mathrm\{a\}\): The final quantitative output of the crop, serving as the primary economic objective function for optimization\.
2. 2\.Water Stress Factor: A daily physiological metric indicating hydration sufficiency\.
3. 3\.Nitrogen Stress Factor: A daily physiological metric indicating nutrient deficiency\.

We selected these three metrics because Grain Yield serves as the ultimate benchmark of agronomic success, while the stress factors map directly to the primary actionable components: irrigation scheduling and fertilizer application\.

By exposing the underlying physical drivers of crop failure or success \(e\.g\., nitrogen leaching due to premature application before heavy rainfall\), the LLM can iteratively critique and refine its previous advisories to maximize yield\.

## IVDATASETS AND EXPERIMENTAL SETUP

### IV\-AWeather, Region, and Crop Selection

- •Region and Climate:We selected Mandya, Karnataka, India, a semi\-arid zone known for high year\-to\-year weather variability\. Between 2015 and 2024, this region experienced severe droughts and extreme monsoons, providing a rigorous testing ground to evaluate how well LLM agents adapt to climate stress\.
- •Weather Data:10 years of continuous daily weather data \(2015–2024\) were sourced from the ERA5 dataset111https://climate\.copernicus\.eu, including solar radiation, temperature, and precipitation\.
- •Soil Profile:The simulator was parameterized using Mandya red sandy soil \(Alfisol\), defining the baseline water\-holding capacity, bulk density, and organic carbon content\.

We restrict our experiments to a single crop: Maize \(Zea mays\)\. Although paddy \(rice\) is the primary staple crop in Mandya region, localized paddy models in the APSIM framework are currently limited\. Maize, one of the predominantly cultivated crops in the region, is economically critical and requires precise management to maintain yields during weather extremes\. Using a single crop establishes a controlled baseline, allowing us to accurately measure the LLM agents’ reasoning capabilities without the added complexity of multi\-crop dynamics\.Though we focus on maize, the architecture is crop\-agnostic and works for any crop supported by APSIM\.

### IV\-BAgronomic Knowledge Base

The retrieval corpus comprised approximately 1,000 open\-access research papers on maize cultivation, nitrogen use efficiency, and drought mitigation, scraped from Semantic Scholar, and supplemented by official Karnataka agricultural extension materials covering recommended sowing windows and cultivar selection\.

### IV\-CBaseline Formulation

We establish a benchmark using the “Gold Standard” Package of Practices \(PoP\) for the region of study, which represents the static advisory schedule followed by local farmers\. This PoP was translated into fixed APSIM tool calls, specifying predetermined sowing dates and fertilizer splits, and simulated across the 10\-year weather dataset to generate annual baseline yields\. These yields are provided to the Generation Agent as an optimization threshold against which the proposed framework was evaluated\.

### IV\-DExperimental Setup

Experiments were conducted using a 4\-bit quantized DeepSeek\-R1 model with the generation temperature set to0\.30\.3\. This low temperature ensures that the model strictly adheres to JSON tool\-calling syntax and deterministic agronomic logic, while retaining sufficient stochasticity to explore valid biophysical interventions without hallucinating unrealistic constraints\.

To account for the inherent stochasticity of LLM generation, each configuration \(Architecture×\\timesFeedback Depth×\\timesYear\) was executed in 10 independent runs\. This sample size captures the distribution of exploratory actions and stabilizes statistical variance, balancing statistical robustness with the computational cost of APSIM simulations\. The metrics are reported as the mean across runs to ensure robust comparative baselines\.

To quantify the Simulator\-in\-the\-Loop impact, we performed an ablation study over feedback depths ofN∈\{0,2,3,4\}N\\in\\\{0,2,3,4\\\}\. The 0\-loop configuration establishes a strict zero\-shot baseline relying solely on RAG context\. The multi\-shot configurations permit up to 4 iterative feedback loops\. The upper bound was deliberately capped at 4 iterations because empirical data indicates agents reach a biophysical optimization plateau within 3 to 4 loops\.

## VEXPERIMENTAL RESULTS

### V\-AQuantitative Analysis

![Refer to caption](https://arxiv.org/html/2607.00454v1/quant_analysis.png)

Figure 3:Yield Performance of Plan\-and\-Solve, Reflexion, and Tree of Thoughts versus baseline over 10‑year horizon \(mean over 10 runs\)\.

As illustrated in Figure[3](https://arxiv.org/html/2607.00454#S5.F3), all three reasoning architectures consistently outperformed the static baseline across the entire decade\. The baseline achieved a 10\-year average yield of 8,110 kg/ha, but struggled significantly during extreme weather events, such as the 2019 drought \(7,022 kg/ha\) and the 2024 heavy monsoons \(7,206 kg/ha\)\.

Both Plan\-and\-Solve and Reflexion achieved similar yield ceilings, averaging 9,045 kg/ha \(\+935 kg/ha\) and 9,002 kg/ha \(\+892 kg/ha\), respectively\. Plan\-and\-Solve’s gains stem from its closed\-loop critique\-and\-revise structure\. By first generating a textual diagnosis of biophysical failures before issuing each new action, the agent avoids compounding earlier errors across iterations\. Reflexion reaches a comparable ceiling through its persistent episodic memory bank, accumulating heuristic rules across simulated growing seasons\. However, both methods ultimately converge on similar performance limits due to their partly similar methodologies\.

In contrast, Tree of Thoughts \(ToT\) consistently outperformed all other methods by achieving a 10\-year average maximum yield of 9,262 kg/ha \(\+1,152 kg/ha over the baseline\)\. ToT’s superiority arises from its anticipatory lookahead design\. Before any action reaches the APSIM simulator, the agent generates three mutually exclusive agronomic intervention pathways and explicitly critiques each against the current weather forecast\. By forcing deliberate divergence prior to execution, ToT avoids the failure mode common to reactive strategies\. This structural advantage is most pronounced during severe climatological stress; for instance, during the 2019 drought and the 2024 floods, the ToT agent outperformed the baseline by 1,384 kg/ha and 1,375 kg/ha, respectively\.

### V\-BConvergence Dynamics

![Refer to caption](https://arxiv.org/html/2607.00454v1/ablation.png)Figure 4:Convergence behavior under increasing feedback iterations\.To understand the computational efficiency of each reasoning methodology, we analyzed intra\-episode convergence trajectories, tracking the average maximum yield at each iterative feedback step \(Figure[4](https://arxiv.org/html/2607.00454#S5.F4)\)\.

We observe that while the yields of advisories generated using PS increase significantly in the initial iterations, they flatten out after 3 to 4 feedback loops as the agent exhausts the low\-hanging corrective gains\. ToT often achieves optimal yield in Iteration 1 due to its tree\-like exploration strategy\. Similarly, Reflexion’s episodic memory bank enables faster convergence\. This establishes that forcing these agents beyond 3 iterations yields diminishing returns\. We observe that while ToT achieves peak yields, it consumesktimes more tokens per decision stage because it exploreskparallel paths simultaneously\. In contrast, Reflexion and PS incrementally refine a single path, making them superior in execution efficiency\.

### V\-CQualitative Analysis

TABLE I:Qualitative Comparison of Agent Strategies During the 2019 DroughtTo understand exactly how the AI agents achieved higher yields than the static heuristics, we analyzed their decision\-making during the 2019 simulation\. This year featured a severe late\-season drought, making it a perfect case study to see how each architecture handles extreme weather\.

#### V\-C1Baseline

In 2019, the PoP dictated mid\-June as the sowing date\. Because this date is hard\-coded, the maize crop entered its most sensitive reproductive stage precisely when the late\-season drought was at its worst, resulting in severe water stress and a low yield of 7,022 kg/ha\.

#### V\-C2Plan\-and\-Solve \(PS\)

Plan\-and\-Solve starts each year with a blank slate\. On its first attempt, it followed standard timing rules and suffered the same drought penalties as the baseline\. However, when PS received high water stress as feedback from the APSIM simulator\. the agent’s<critique\>module diagnosed the problem\. In its next iteration, the agent actively changed its plan: it added a thick layer of surface mulch to prevent water from evaporating, and it switched its fertilizer to Nitrate, which dissolves easily even in dry soil\. By responding to the simulator’s feedback, PS successfully recovered from failure and reached 8,168 kg/ha\.

#### V\-C3Tree of Thoughts \(ToT\)

The Tree of Thoughts \(ToT\) agent achieved the best result \(8,825 kg/ha\) by evaluating different strategies before making a final decision\. The ToT agent generated three possible paths by comparing a “Conservative & Early Sowing” path against a “Delayed Sowing” path\. Then it realized that planting in mid\-June would ruin the crop\.

Instead of trying to fight the drought, the ToT agent simply avoided it\. It shifted the sowing date a full month earlier to May 15\. Because of this early start, the crop finished growing before the extreme heat arrived\. ToT caught the timing error that the baseline missed, requiring only a single attempt to find the optimal solution\.

#### V\-C4Reflexion

While ToT looked ahead, Reflexion looked back\. Because the Reflexion agent stores text\-based summaries of past farming seasons, it was better able to anticipate and prepare for drought conditions than any other agent\. In 2019, it proactively applied 5,000 kg/ha of organic manure and scheduled a deep tillage event\. Mixing heavy manure deep into the soil acts like a sponge, significantly increasing the soil’s water\-holding capacity\. The baseline missed this soil\-preparation step entirely\. By leveraging experience, Reflexion achieved a high yield of 7,988 kg/ha on its first try without running multiple simulator iterations\. One of our observations in Reflexion was that, as the years progressed, yields in the first iteration increased, with subsequent loops yielding only minimal gains\.

In the early years \(2015–2016\), Reflexion underwent multiple iterations as it populated its episodic memory bank\. By 2017 and 2018, it successfully collapsed its simulator reliance to a single iteration\. However, during the severe drought anomaly of 2019 and 2020, Reflexion’s performance momentarily deteriorated, spiking to 4 iterations\.

## VICONCLUSION AND FUTURE WORK

We presented Agri\-SAGE, a closed\-loop multi\-agent framework that couples LLM\-based reasoning with APSIM biophysical crop simulation to generate agronomically validated, context\-aware agricultural advisories\. Across the 10\-year retrospective evaluation, Agri\-SAGE consistently outperforms the static Package\-of\-Practice baseline under diverse climatic conditions\. Three key insights emerge from our experiments\. First, all reasoning\-based methods substantially improve crop yield compared to static agronomic schedules, demonstrating the effectiveness of simulation\-grounded adaptive advisory generation\. Second, Tree of Thoughts consistently achieves the highest yield improvements by exploring multiple agronomic intervention pathways before committing to an action\. Third, Reflexion captures cross\-seasonal agronomic insights via episodic memory, achieving competitive yields with fewer simulator interactions\. Together, these results demonstrate that coupling LLM reasoning with crop simulation can transform the state of the art in agronomic advisories\.

Several directions remain open for future work\. The optimization objective should be extended beyond yield to incorporate input costs, profit margins, and sustainability metrics such as nitrogen leaching and carbon footprint\. The effect of real\-time weather forecasts also needs to be analyzed\. Additionally, APSIM predictions serve as a proxy for agronomic outcomes, and field validation remains an important next step given real\-world variability in soil and microclimate\. Finally, evaluating the framework with agronomically fine\-tuned model variants, using synthetic APSIM\-generated data as a training signal, could unlock stronger reasoning capabilities for novel climate scenarios outside historical distributions\.

## ACKNOWLEDGMENT

The authors thank Prof\. Soma Dhawala for his valuable suggestions and agronomic insights\. The authors also thank Dr\. Richard Magala for his helpful discussions regarding the apsimNGpy framework\.

The authors acknowledge the use of various AI tools and assistants, such as GPT and Claude, to enhance the writing quality and provide summaries of source materials\. However, the authors diligently verified all information provided by these AI tools to ensure accuracy and factual integrity\.

## References

- \[1\]\(2025\)AgroAskAI: a multi\-agentic AI framework for supporting smallholder farmers’ enquiries globally\.arXiv preprint arXiv:2512\.14910\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p3.1)\.
- \[2\]D\. De Clercq, E\. Nehring, H\. Mayne, and A\. Mahdi\(2024\)Large language models can help boost food production, but be mindful of their risks\.Frontiers in Artificial Intelligence7,pp\. 1326153\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p3.1)\.
- \[3\]M\. Fanuel, M\. N\. Mahmoud, C\. C\. Marshal, V\. Lakhotia, B\. Dari, K\. Roy, and S\. Zhang\(2025\)AgriRegion: region\-aware retrieval for high\-fidelity agricultural advice\.arXiv preprint arXiv:2512\.10114\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p1.1)\.
- \[4\]D\. P\. Holzworth, N\. I\. Huth, P\. G\. deVoil, E\. J\. Zurcher, N\. I\. Herrmann, G\. McLean, K\. Chenu, E\. J\. van Oosterom, V\. Snow, C\. Murphy,et al\.\(2014\)APSIM\-evolution towards a new generation of agricultural systems simulation\.Environmental Modelling & Software62,pp\. 327–350\.Cited by:[§I](https://arxiv.org/html/2607.00454#S1.p7.1)\.
- \[5\]M\. A\. Islam, M\. E\. Ali, and M\. R\. Parvez\(2025\)Codesim: multi\-agent code generation and problem solving through simulation\-driven planning and debugging\.arXiv preprint arXiv:2502\.05664\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p3.1)\.
- \[6\]M\. Jia, Z\. Cui, and G\. Hug\(2025\)Enhancing LLMs for power system simulations: a feedback\-driven multi\-agent framework\.IEEE Transactions on Smart Grid\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p3.1)\.
- \[7\]D\. Park, H\. Moon, and S\. Ryu\(2026\)A self\-correcting multi\-agent LLM framework for language\-based physics simulation and explanation\.npj Artificial Intelligence2\(1\),pp\. 10\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p2.1)\.
- \[8\]N\. Shinn, F\. Cassano, A\. Gopinath, K\. Narasimhan, and S\. Yao\(2023\)Reflexion: language agents with verbal reinforcement learning\.Advances in Neural Information Processing Systems36,pp\. 8634–8652\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p3.1),[§III\-B](https://arxiv.org/html/2607.00454#S3.SS2.p9.1)\.
- \[9\]N\. Singh, J\. Wang’ombe, N\. Okanga, T\. Zelenska, J\. Repishti, S\. Mishra, R\. Manokaran, V\. Singh, M\. I\. Rafiq, R\. Gandhi,et al\.\(2024\)Farmer\. chat: scaling ai\-powered agricultural services for smallholder farmers\.arXiv preprint arXiv:2409\.08916\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p1.1)\.
- \[10\]S\. Singh, N\. Ganesh, V\. Singh, L\. Pedapudi, R\. Kumar, S\. Jyothi, A\. Karanam, C\. Yashoda, M\. V\. R\. Reddy, S\. P\. Debbesa,et al\.\(2026\)Fine\-tuning and evaluating conversational ai for agricultural advisory\.arXiv preprint arXiv:2603\.03294\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p2.1)\.
- \[11\]L\. Wang, W\. Xu, Y\. Lan, Z\. Hu, Y\. Lan, R\. K\. Lee, and E\. Lim\(2023\)Plan\-and\-solve prompting: improving zero\-shot chain\-of\-thought reasoning by large language models\.InProceedings of the 61st annual meeting of the association for computational linguistics \(volume 1: long papers\),pp\. 2609–2634\.Cited by:[§III\-B](https://arxiv.org/html/2607.00454#S3.SS2.p4.1)\.
- \[12\]J\. Wu, Z\. Lai, S\. Chen, R\. Tao, P\. Zhao, and N\. Hovakimyan\(2024\)The new agronomists: language models are experts in crop management\.InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,pp\. 5346–5356\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p2.1)\.
- \[13\]Y\. Xia, D\. Dittler, N\. Jazdi, H\. Chen, and M\. Weyrich\(2024\)LLM experiments with simulation: large language model multi\-agent system for simulation model parametrization in digital twins\.In2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation \(ETFA\),pp\. 1–4\.Cited by:[§II\-B](https://arxiv.org/html/2607.00454#S2.SS2.p3.1)\.
- \[14\]B\. Yang, Y\. Zhang, L\. Feng, Y\. Chen, J\. Zhang, X\. Xu, N\. Aierken, Y\. Li, Y\. Chen, G\. Yang,et al\.\(2025\)AgriGPT: a large language model ecosystem for agriculture\.arXiv preprint arXiv:2508\.08632\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p1.1)\.
- \[15\]S\. Yang, Z\. Yuan, S\. Li, R\. Peng, K\. Liu, and P\. Yang\(2024\)GPT\-4 as evaluator: evaluating large language models on pest management in agriculture\.arXiv preprint arXiv:2403\.11858\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p3.1)\.
- \[16\]S\. Yang, Z\. Liu, W\. Mayer, N\. Ding, Y\. Wang, Y\. Huang, P\. Wu, W\. Li, L\. Li, H\. Zhang,et al\.\(2024\)ShizishanGPT: an agricultural large language model integrating tools and resources\.InInternational Conference on Web Information Systems Engineering,pp\. 284–298\.Cited by:[§II\-A](https://arxiv.org/html/2607.00454#S2.SS1.p1.1)\.
- \[17\]S\. Yao, D\. Yu, J\. Zhao, I\. Shafran, T\. Griffiths, Y\. Cao, and K\. Narasimhan\(2023\)Tree of thoughts: deliberate problem solving with large language models\.Advances in neural information processing systems36,pp\. 11809–11822\.Cited by:[§III\-B](https://arxiv.org/html/2607.00454#S3.SS2.p7.1)\.

Similar Articles

SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection

arXiv cs.AI

Introduces SAGE, the first end-to-end LLM-driven multi-agent framework for fraud detection, using a Data Diagnostic Tree and Markov decision process with natural-language gradients to optimize models under class imbalance. Experiments show significant F1 improvements over baselines across five datasets.

HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

arXiv cs.CL

This paper introduces HawkesLLM, a framework that models semantic uncertainty propagation in multi-step agentic text simulations by combining a multivariate Hawkes process for temporal influence and memory selection with a language model for text generation. Evaluation on a GDELT news-cascade case study shows improved late-stage semantic alignment under compact prompt-memory constraints.