When Agents Meet Electric Bus Fleet Operations: Pricing Behavior, Trade-offs, and Policy Implications in an Aggregator Framework

arXiv cs.AI Papers

Summary

This paper proposes an agentic aggregator framework for coordinating electric bus fleet operations, integrating optimization-based scheduling with supervisory AI agents to handle disturbances, tariff adaptation, and value allocation, revealing trade-offs between operational efficiency and profit-oriented pricing.

arXiv:2606.26400v1 Announce Type: new

# When Agents Meet Electric Bus Fleet Operations: Pricing Behavior, Trade-offs, and Policy Implications in an Aggregator Framework
Source: [https://arxiv.org/html/2606.26400](https://arxiv.org/html/2606.26400)
Jônatas Augusto Manzolli \(1, 2, ∗\), Ali Eslami \(1, ∗\), Luis Miranda\-Moreno \(1\), Jiangbo Yu \(1, †\)

\(1\) Department of Civil Engineering, McGill University, Montreal, Quebec, Canada

\(2\) INESC Coimbra, University of Coimbra, Coimbra, Portugal

∗ These authors contributed equally to this work\.

† Corresponding author: jiangbo\.yu@mcgill\.ca

###### Abstract

Agentic systems are changing how complex operational tasks are coordinated, introducing a new paradigm for connecting heterogeneous data sources and automating processes\. Electric bus fleets provide a relevant test case for this paradigm\. Their operation requires continuous coordination between service reliability, battery state\-of\-charge, charger availability, electricity prices, route\-energy uncertainty, and vehicle\-to\-grid \(V2G\) opportunities\. This paper proposes an agentic aggregator framework that streamlines this decision environment by coupling an optimization\-based electric bus scheduling model with supervisory agents for disturbance detection, tariff adaptation, and schedule evaluation\. The optimization core enforces physical feasibility across routes, chargers, batteries, and V2G exchanges, while the agentic layer interprets changing operating conditions, triggers real\-time re\-optimization when needed, and defines how flexibility value is allocated between the aggregator and the public transport operator \(PTO\)\. A realistic depot case study evaluates day\-ahead and real\-time operations under profit\-based and operation\-based coordination modes, considering service delays, route\-energy deviations, electricity price shocks, and combined disturbances\. The results show that agentic aggregation can support adaptive fleet\-grid coordination by maintaining feasible schedules, activating re\-optimization selectively, and improving the use of charging and V2G flexibility\. However, they also reveal a critical trade\-off: the same agentic capability that reduces operational complexity can extract value from the PTO when configured around profit\-oriented pricing\. Prompt\-sensitivity experiments further show that agentic pricing behavior changes with prompt configuration, making prompt design a reproducibility and governance variable\. These findings suggest that agentic aggregators can become useful tools for managing electric bus V2G operations, but their deployment in public\-fleet contexts requires transparent coordination modes, auditable tariff\-setting, and explicit value\-sharing rules\.

Keywords: Electric bus fleets; Aggregators; Smart charging; Vehicle\-to\-grid; Energy market; Agentic AI

## 1 Introduction

Artificial intelligence \(AI\) is rapidly moving from passive decision support toward agentic systems capable of coordinating tasks, connecting heterogeneous data sources, invoking external tools, and automating structured workflows\. This evolution is particularly relevant for energy and transport systems, where operational decisions depend on fragmented information, physical constraints, market signals, and real\-time disturbances\. In such contexts, agentic AI should not be understood as a replacement for mathematical optimization or engineering models, but as a supervisory layer that can streamline the complex workflow around them: detecting when conditions change, preparing decision inputs, selecting the appropriate analytical tool, and interpreting the resulting outputs\.

Electric bus fleets provide a timely test case for this capability\. As public transport electrification expands, electric buses are becoming large, time\-varying loads connected to the power system\[[17](https://arxiv.org/html/2606.26400#bib.bib1)\]\. Their operation is already complex because public transport operators \(PTOs\) must coordinate battery state\-of\-charge \(SOC\), charger availability, route assignments, timetable reliability, charging duration, route\-energy uncertainty, and depot power limits\[[21](https://arxiv.org/html/2606.26400#bib.bib26)\]\. This complexity increases further when vehicle\-to\-grid \(V2G\) operation and energy trading are introduced\. The fleet must then decide not only when to charge, but also when to discharge, how much energy to export, and under which tariff or compensation structure participation remains attractive\. Previous optimization\-based studies have shown that coordinated charging, tariff\-aware scheduling, and V2G strategies can reduce costs and improve system performance, especially when battery aging, demand peaks, and uncertainty are explicitly considered\[[24](https://arxiv.org/html/2606.26400#bib.bib8),[29](https://arxiv.org/html/2606.26400#bib.bib5)\]\. However, deploying these strategies in practice remains challenging because the required information changes continuously during operation\.

In this setting, the aggregator becomes a natural coordination entity\[[15](https://arxiv.org/html/2606.26400#bib.bib29)\]\. In electricity systems, aggregators pool distributed flexibility, interact with markets, and translate system\-level signals into coordinated decisions across users and assets\[[5](https://arxiv.org/html/2606.26400#bib.bib6)\]\. For electric bus fleets, the aggregator can be understood as an intermediary between grid needs and fleet operations\. It receives market and grid signals, converts them into fleet\-feasible charging and V2G actions, and defines how the economic value of flexibility is shared between the aggregator and the PTO\. This role is broader than simple arbitrage because the value of flexibility cannot be separated from service reliability, terminal SOC, charger access, battery feasibility, and the willingness of the PTO to participate\.

Existing studies have modeled aggregator–fleet interaction through hierarchical or market\-oriented formulations, including leader–follower structures in which an aggregator defines trading conditions and the fleet operator responds with charging and discharging decisions\[[4](https://arxiv.org/html/2606.26400#bib.bib34),[1](https://arxiv.org/html/2606.26400#bib.bib35),[22](https://arxiv.org/html/2606.26400#bib.bib36)\]\. These models provide an important mathematical basis for aggregator\-supported electric bus operation\. However, they are usually built around pre\-structured inputs, fixed decision horizons, and predefined optimization calls\. As a result, they are useful for planning and scenario analysis, but less suited to real\-time environments where delays, route\-energy deviations, vehicle\-state uncertainty, and electricity\-price changes can rapidly make a previous schedule outdated\. Existing electric bus charging studies offer powerful formulations for cost minimization, infrastructure use, and fleet scheduling, but they place less emphasis on the orchestration layer that decides when and how these models should be used in operation\[[31](https://arxiv.org/html/2606.26400#bib.bib30),[39](https://arxiv.org/html/2606.26400#bib.bib31)\]\.

Agentic systems can address this gap by being embedded within the aggregator\. Instead of treating the aggregator only as a tariff\-setting optimization problem, an agentic aggregator can operate as a structured supervisory system around the charging optimizer\. A Trigger Agent can monitor operational and market deviations and decide whether re\-optimization is needed\. A Pricing Agent can translate the selected coordination strategy into buy and sell multipliers for charging and V2G exchange\. An Evaluator Agent can assess whether the resulting schedule is acceptable from economic and operational perspectives\. In this architecture, the optimizer remains responsible for enforcing feasibility, while the agentic layer simplifies the surrounding decision workflow\. The potential benefit is operational and economic: the system can coordinate charging and V2G events more adaptively, reduce manual intervention, and allocate flexibility value between the aggregator and the PTO\.

At the same time, this capability creates a critical concern\. If an agentic aggregator controls pricing guidance, its behavior determines who captures the value of V2G flexibility\. The same system that can streamline electric bus operation and protect the PTO can also be configured to extract value from it through profit\-oriented tariff behavior\. This concern is particularly important because agentic behavior may depend on prompt configuration and coordination\-mode instructions, which are not necessarily visible to the PTO or easy for regulators to audit\. Therefore, the central question is not only whether agentic systems can operate electric bus V2G, but also how their behavior affects coordination, pricing, and value allocation\.

Despite the rapid emergence of agentic AI in energy and infrastructure applications\[[36](https://arxiv.org/html/2606.26400#bib.bib42),[35](https://arxiv.org/html/2606.26400#bib.bib41),[11](https://arxiv.org/html/2606.26400#bib.bib33),[26](https://arxiv.org/html/2606.26400#bib.bib32)\], the literature still lacks a concrete evaluation of an agentic aggregator for electric bus fleet operation\. In particular, there is limited evidence on how such a system should be structured, how it should interact with an optimization\-based charging model, how it behaves under day\-ahead \(DA\) and real\-time \(RT\) operation, and how different pricing behaviors affect both the aggregator and the PTO\. There is also little quantitative evidence on whether agentic pricing configurations can shift value from public fleet operators to aggregators under otherwise feasible operating schedules\.

Motivated by this gap, this paper proposes an agentic aggregator framework for electric bus charging and V2G coordination\. The framework combines an optimization\-based PTO scheduling model with three supervisory agents for triggering, pricing, and schedule evaluation\. The system is tested under DA and RT conditions, including service\-timing disturbances, route\-energy deviations, electricity\-price shocks, and combined stress cases\. Two coordination modes are compared: a profit\-based mode that prioritizes aggregator revenue and an operational\-based mode that prioritizes PTO\-compatible flexibility participation\. The main contributions of this paper are:

- A traceable agentic\-AI architecture for adaptive EV aggregation that combines optimization with Trigger, Pricing, and Evaluator Agents into a unified pipeline separating disturbance detection, tariff adaptation, and schedule acceptance\.
- A controlled DA and RT evaluation comparing profit\-based and operational\-based aggregator coordination modes, showing how agentic pricing reallocates flexibility value between the aggregator and the PTO across delay, energy\-consumption, price, and combined disturbances\.
- An applied assessment of agentic aggregator pricing schemes and prompt configurations, deriving policy and market\-design implications for tariff transparency, value\-sharing, and PTO\-compatible participation in V2G flexibility markets\.

The remainder of this paper is organized as follows\. Section [2](https://arxiv.org/html/2606.26400#S2) reviews the relevant literature on electric bus charging, aggregation mechanisms, and emerging agentic AI applications in energy systems\. Section [3](https://arxiv.org/html/2606.26400#S3) presents the proposed methodology, including the framework architecture, agentic supervisory logic, and optimization interface\. Section [4](https://arxiv.org/html/2606.26400#S4) describes the case study used to instantiate the framework\. Section [5](https://arxiv.org/html/2606.26400#S5) reports and discusses the main results, including DA operation, RT disturbances, pricing behavior, and prompt sensitivity\. Section [6](https://arxiv.org/html/2606.26400#S6) concludes the paper, discusses policy implications, and outlines directions for future research\.

## 2 Literature Review

The transition toward electric bus systems represents a fundamental shift in public transport, extending beyond vehicle electrification to the coordinated management of fleets, infrastructure, and energy systems\. As electrification scales, operational decisions become increasingly coupled with electricity markets, grid constraints, and multi\-actor interactions, requiring integrated approaches that combine optimization, economic coordination, and system\-level intelligence\. This literature review examines three converging research streams: electric bus optimization\-based charging and pricing strategies, fleet–grid interaction through aggregation mechanisms, and emerging agentic AI frameworks for system orchestration\.

### 2\.1 Optimization\-Based Charging and Pricing Strategies

Optimization remains the dominant analytical paradigm for electric bus charging management\. A large part of the literature focuses on scheduling charging activities under battery constraints, charging windows, electricity tariffs, and infrastructure limits\. Representative examples include integrated models for charger deployment and fleet scheduling\[[33](https://arxiv.org/html/2606.26400#bib.bib4)\], general charging scheduling formulations under time\-of\-use tariffs and station capacity constraints\[[3](https://arxiv.org/html/2606.26400#bib.bib13)\], and facility planning approaches that explicitly consider uncertainty in travel times, battery degradation, and charger technology choices\[[41](https://arxiv.org/html/2606.26400#bib.bib14)\]\. Other studies extend the operational scope by incorporating seasonality, power matching, and infrastructure design\[[19](https://arxiv.org/html/2606.26400#bib.bib15)\], by jointly optimizing fleet composition and scheduling across multiple depots and charging technologies\[[34](https://arxiv.org/html/2606.26400#bib.bib20)\], or by handling large network\-scale charging schedules through branch\-and\-price and adaptive neighborhood search methods\[[40](https://arxiv.org/html/2606.26400#bib.bib21)\]\. Recent work also shows growing interest in scalable heuristics for real\-world agencies, with applications to fleet electrification and charger allocation under time\-varying electricity prices\[[27](https://arxiv.org/html/2606.26400#bib.bib24)\]\. Another important stream emphasizes operational realism beyond the minimization of daily charging costs\. Battery degradation has been incorporated through both dynamic programming approaches that match workloads to battery aging\[[32](https://arxiv.org/html/2606.26400#bib.bib22)\]and formulations that embed degradation directly into network\-scale scheduling\[[25](https://arxiv.org/html/2606.26400#bib.bib40)\]\. 
Related studies also explore coupling bus charging with local energy resources or grid services\. For example, battery charging and discharging, demand response, and renewable integration have been examined in public transport settings\[[18](https://arxiv.org/html/2606.26400#bib.bib23)\], while photovoltaic\-storage\-charging coordination has been proposed to reduce external grid purchases and improve local energy autonomy\[[16](https://arxiv.org/html/2606.26400#bib.bib16)\]\.

### 2\.2 Electric Bus Aggregation and Fleet–Grid Interaction

Lately, research on electric bus systems has increasingly shifted from vehicle\-level feasibility questions to fleet\-level coordination with charging infrastructure and electricity markets\. Early studies established that the operational viability of electric buses depends strongly on route characteristics, charging opportunities, infrastructure siting, and realistic estimates of energy consumption\[[28](https://arxiv.org/html/2606.26400#bib.bib2),[14](https://arxiv.org/html/2606.26400#bib.bib3)\]\. More recent review work confirms that electric bus scheduling has evolved into a multi\-layer problem involving vehicle assignment, charging management, infrastructure design, and robustness considerations rather than a simple charging\-timing exercise\[[38](https://arxiv.org/html/2606.26400#bib.bib19)\]\. As the scale of electrification increases, these issues become inseparable from grid interaction, since concentrated depot charging and opportunity charging can create large, time\-varying loads with direct implications for peak demand, flexibility provision, and local network planning\[[13](https://arxiv.org/html/2606.26400#bib.bib37)\]\. In this broader context, the aggregator serves as a useful intermediary, consolidating flexibility, translating market signals, and coordinating assets that would otherwise be difficult for a PTO to manage directly\[[7](https://arxiv.org/html/2606.26400#bib.bib11)\]\. For example, Cao et al\.\[[6](https://arxiv.org/html/2606.26400#bib.bib39)\]developed a robust optimization model for scheduling an EV aggregator under upstream market price uncertainty, showing that different charging and discharging strategies can protect the aggregator’s profits across optimistic, deterministic, and pessimistic price scenarios\. Clairand et al\.\[[9](https://arxiv.org/html/2606.26400#bib.bib12)\]showed that accounting for the aggregator in charging station planning can reduce energy costs while respecting grid constraints\. 
Chen and Strunz\[[8](https://arxiv.org/html/2606.26400#bib.bib38)\]proposed an aggregator\-based framework for coordinating multi\-area, multifunctional electric bus charging stations that support normal charging, fast charging, and battery swapping, while integrating renewable energy procurement and frequency\-control ancillary services to reduce operating costs\. More recently, the framework of Manzolli et al\.\[[23](https://arxiv.org/html/2606.26400#bib.bib7)\]explicitly modeled the interaction between an aggregator and a PTO through a hierarchical optimization structure\. In that formulation, the aggregator participates in energy trading and defines pricing signals, while the PTO schedules charging and discharging activities subject to service requirements\. Taken together, these studies support the idea that fleet\-grid interaction is not only a charging problem, but also an intermediation and coordination problem in which the aggregator plays a central role\. However, these studies usually assume that the relevant data are already available and that the decision to rerun the optimization is externally determined rather than reasoned about within the framework itself\. Moreover, these studies treat the aggregator’s pricing role as a technical design choice rather than a source of economic risk to the PTO, leaving the regulatory implications of aggregator tariff\-setting power largely unexamined\.

### 2\.3 Agentic Orchestration for Energy Systems

Recent advances in agentic AI, enabled largely by large language models \(LLMs\), have opened a new line of inquiry into their roles in energy and cyber\-physical systems\. Existing studies generally do not argue that these models should replace formal optimization or physics\-based models\. Instead, they suggest that they may be useful as semantic and decision\-support layers that can interpret heterogeneous information, support operator interaction, orchestrate tools, and improve monitoring workflows\. Majumder et al\.\[[20](https://arxiv.org/html/2606.26400#bib.bib9)\]discuss both the promise and the limitations of LLMs in the electric energy sector, emphasizing their usefulness for knowledge\-intensive tasks while also warning about hallucinations, reliability, and domain grounding\. Zhang et al\.\[[37](https://arxiv.org/html/2606.26400#bib.bib18)\]similarly frame language\-enabled AI as an enabling technology for information integration, decision support, and human–AI collaboration across energy applications\. Emerging review work on smart grids also highlights the relevance of agentic AI to grid operations, market analysis, security support, and adaptive energy services when deployed within carefully designed supervisory architectures\[[30](https://arxiv.org/html/2606.26400#bib.bib10),[2](https://arxiv.org/html/2606.26400#bib.bib25)\]\. Although this literature is still young, a common message is already emerging: the comparative advantage of agentic AI lies less in directly solving constrained optimization problems and more in orchestrating complex information flows around specialized analytical tools\. This is especially relevant for energy systems where data are fragmented across operational databases, market signals, textual procedures, alarms, and engineering models\. For EV aggregation, such capabilities could support state interpretation, pricing\-policy enforcement, and the triggering of optimization runs under changing system conditions\. 
However, concrete frameworks that connect multi\-agent supervisory reasoning to electric bus fleet optimization remain scarce in the peer\-reviewed literature\.

### 2\.4 Research Gap

The reviewed literature reveals five main observations\. First, studies on electric buses have already demonstrated the importance of fleet\-aware charging coordination, fleet planning, and realistic operational constraints\. Second, aggregator\-oriented research has shown that intermediary entities can improve the economic and operational integration of flexible loads with electricity markets\. Third, related optimization work has advanced considerably in areas such as uncertainty modeling, infrastructure design, battery degradation, and market\-responsive scheduling\. Fourth, the emerging AI\-for\-energy literature suggests that agentic supervisory systems may be useful for orchestration, interpretation, and decision support, but it remains mostly conceptual or broadly scoped at the grid level\. Fifth, neither the aggregator\-fleet literature nor the emerging agentic AI literature has examined the regulatory implications of deploying AI\-driven pricing agents in energy markets, systems whose pricing aggressiveness can be reconfigured through prompt design alone, without any structural change visible to regulators or PTOs\.

What remains is a framework that integrates these strands into a unified operational architecture\. In particular, the literature still lacks an approach that simultaneously integrates: i\) aggregator\-fleet optimization grounded in electric bus operational constraints; ii\) context\-aware orchestration of heterogeneous fleet and market information; iii\) event\-triggered re\-optimization based on deviations, disturbances, or opportunity signals; and iv\) coordination\-mode reasoning in the pricing layer, such as contrasting operational\-based and profit\-based aggregator strategies\. The literature also lacks any quantitative assessment of how aggregator coordination mode and prompt configuration affect PTO economic exposure, and what regulatory measures this implies\. This paper addresses these gaps by embedding a multi\-agent supervisory layer into the aggregator setting, while preserving the optimization engine as the constraint\-enforcing core of the decision process, and by drawing explicit policy implications from the quantitative comparison of aggregator coordination modes\.

## 3 Methodology

The proposed methodology is organized around the separation between agentic supervision and optimization\-based feasibility enforcement\. Section [3\.1](https://arxiv.org/html/2606.26400#S3.SS1) introduces the overall framework architecture and explains how the fleet, grid, and agentic aggregator layer interact\. Section [3\.2](https://arxiv.org/html/2606.26400#S3.SS2) details the agentic aggregator framework, including the supervisory agents, coordination modes, and event\-triggered re\-optimization logic\. Section [3\.3](https://arxiv.org/html/2606.26400#S3.SS3) then presents the compact PTO scheduling and optimization interface governed by the agentic layer\. [Appendix B](https://arxiv.org/html/2606.26400#A2) provides the full optimization formulation, the real\-time re\-optimization updates, and the formal supervisory flow that connects the agentic layer to the optimizer\.

### 3\.1 Overall Framework

As illustrated in Figure [1](https://arxiv.org/html/2606.26400#S3.F1), the proposed framework is organized around three coupled components: the fleet and charging system, the grid and market environment, and the multi\-agentic aggregator layer that coordinates the interaction between them\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/figures/1.png)

Figure 1: Proposed framework for electric bus fleet–grid interaction coordinated by the agentic aggregator\.

On the fleet side, the framework receives operational information, including bus state\-of\-charge \(SOC\), battery limits, charger availability, trip assignments, departure and arrival times, and disturbance information related to timetable delays and route\-energy deviations\. These data define the physical feasibility region of the problem, since charging and discharging decisions must preserve transport service while respecting battery and infrastructure constraints\. On the grid and market side, the framework receives electricity prices, power limits, and V2G flexibility opportunities\. These two information streams differ in format, timescale, and reliability: fleet states may change during service execution, whereas market signals may vary at hourly or sub\-hourly intervals\.

The aggregator layer acts as the coordination interface between the fleet and the grid\. In the day\-ahead \(DA\) workflow, the Pricing Agent translates the selected coordination mode into structured pricing guidance, and the Evaluator Agent assesses the resulting schedule against economic and operational criteria\. In the real\-time \(RT\) workflow, the Trigger Agent monitors deviations from the DA reference and decides whether operating conditions justify re\-optimization\. If re\-optimization is triggered, the Pricing Agent and Evaluator Agent are invoked again over the remaining horizon\.
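The RT branch described above can be pictured as a small dispatch step. The Python sketch below is illustrative only and is not the authors' implementation: `rt_step`, the dictionary\-based schedule, and the callables `trigger` and `replan` are hypothetical stand\-ins for the Trigger Agent and for the Pricing Agent, optimizer, and Evaluator Agent chain, respectively\.

```python
from __future__ import annotations

from typing import Callable, Dict, List

# Hypothetical representation (not from the paper): a schedule maps a period
# index to net charger power in kW (positive = charging, negative = V2G).
Schedule = Dict[int, float]


def rt_step(
    now: int,
    reference: Schedule,
    trigger: Callable[[int, Schedule], str],
    replan: Callable[[List[int]], Schedule],
) -> Schedule:
    """Return the schedule to follow from period `now` onward.

    `trigger` stands in for the Trigger Agent and must return "skip" or
    "optimize". `replan` stands in for the pricing, optimization, and
    evaluation chain applied to the remaining horizon.
    """
    if trigger(now, reference) == "skip":
        # Conditions still match the DA reference: keep it unchanged.
        return dict(reference)

    # Only periods that have not yet elapsed can be re-optimized.
    remaining = sorted(t for t in reference if t >= now)
    updated = replan(remaining)

    # Past periods keep their DA values; remaining ones take the new plan.
    merged = dict(reference)
    for t in remaining:
        if t in updated:
            merged[t] = updated[t]
    return merged
```

In this sketch the DA reference is never modified for elapsed periods, which mirrors the statement that re\-optimization applies only over the remaining horizon\.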

This architecture preserves the main strengths of optimization\-based fleet scheduling, namely explicit constraints, reproducible decisions, and feasibility guarantees, while adding a contextual reasoning layer for event\-driven operation\. The optimization engine remains the decision core for computing feasible charging and discharging schedules\. The multi\-agentic layer does not replace this optimizer; rather, it acts as a supervisory mechanism that determines when the optimizer should be called, what pricing posture should be used, and whether the resulting schedule should be accepted\.

### 3\.2 Agentic Aggregator Framework

In the proposed framework, the aggregator’s pricing and coordination function is performed by specialized supervisory agents rather than by a direct optimization over tariff variables\. The agents produce structured decisions around a constrained PTO scheduling interface, which is summarized in Section [3\.3](https://arxiv.org/html/2606.26400#S3.SS3) and fully stated in [Appendix B](https://arxiv.org/html/2606.26400#A2)\. This separation is important: the agents govern pricing guidance, triggering, and schedule acceptance, whereas the optimizer enforces the operational constraints of the bus fleet, chargers, batteries, and V2G exchanges\.

At the agentic level, the price\-guidance vector $\mathbf{u}_{\tau}$ contains bounded buy and sell multipliers for each tariff period\. Given this guidance and the current fleet–grid context, the optimizer returns a candidate schedule $\mathbf{s}_{\tau}^{*}$ and the associated KPIs, including PTO cost, aggregator revenue, V2G activity, feasibility status, and terminal SOC\. The tariff mapping and the compact PTO optimization model are introduced in Section [3\.3](https://arxiv.org/html/2606.26400#S3.SS3)\.
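To make the notion of bounded multipliers concrete, the following Python sketch clamps a proposed guidance vector to fixed bounds and converts it into PTO\-facing tariffs\. Everything specific here is an assumption for illustration: the bound values, the names `PeriodGuidance`, `bound_guidance`, and `to_tariffs`, and in particular the multiplicative mapping \(tariff = multiplier × market price\)\. The paper's actual tariff mapping is the one defined in Section 3\.3 and may differ\.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical bounds; the text only states that the multipliers are bounded.
BUY_BOUNDS: Tuple[float, float] = (0.8, 1.5)
SELL_BOUNDS: Tuple[float, float] = (0.5, 1.2)


@dataclass(frozen=True)
class PeriodGuidance:
    """Buy and sell multipliers for one tariff period."""
    buy: float
    sell: float


def clamp(value: float, bounds: Tuple[float, float]) -> float:
    """Project `value` onto the closed interval given by `bounds`."""
    low, high = bounds
    return max(low, min(high, value))


def bound_guidance(raw: Dict[str, PeriodGuidance]) -> Dict[str, PeriodGuidance]:
    """Clamp every multiplier so the optimizer never sees out-of-range guidance."""
    return {
        period: PeriodGuidance(
            buy=clamp(g.buy, BUY_BOUNDS),
            sell=clamp(g.sell, SELL_BOUNDS),
        )
        for period, g in raw.items()
    }


def to_tariffs(
    guidance: Dict[str, PeriodGuidance],
    market_price: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Assumed mapping: (buy tariff, sell tariff) = multiplier * market price."""
    return {
        period: (g.buy * market_price[period], g.sell * market_price[period])
        for period, g in guidance.items()
    }
```

Clamping before the optimizer is called is one simple way to keep an agent's proposal inside a known range regardless of how the prompt is configured\.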

As shown in Figure [2](https://arxiv.org/html/2606.26400#S3.F2), the implementation is divided into two event\-driven workflows\. The DA workflow creates the daily reference plan, while the RT workflow repeatedly evaluates whether that reference remains valid under updated market and fleet conditions\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x1.png)

\(a\) DA architecture\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x2.png)

\(b\) RT architecture\.

Figure 2: Implemented DA and RT workflow architectures for the multi\-agentic aggregator\.

Table [1](https://arxiv.org/html/2606.26400#S3.T1) summarizes the three supervisory agents used in the implemented workflows\. The implementation avoids unconstrained free\-text prompting by requiring domain\-bounded instructions and structured outputs that can be validated before they affect the optimization engine\.

Table 1: Agent roles in the implemented workflows\.

| Agent | Main prompt content | Expected structured output |
| --- | --- | --- |
| Trigger Agent | Current context, DA plan summary, disturbance indicators, price deviation, energy\-consumption deviation, service\-delay indicators, and feasibility risk flags | Trigger decision $\delta_{\tau} \in \{\texttt{skip}, \texttt{optimize}\}$ with rationale and confidence score |
| Pricing Agent | Coordination mode, market conditions, average prices, recent outcomes, best\-known multipliers, and pricing structures for aggregator–PTO trading | Price\-guidance vector $\mathbf{u}_{\tau}$ containing bounded buy/sell multipliers per tariff period |
| Evaluator Agent | Optimization result, feasibility status, PTO cost, aggregator revenue, V2G activity, disturbance indicators, rerun count, and acceptance criteria | Accept or rerun decision $a_{\tau}$, rationale $\ell_{\tau}$, confidence $\gamma_{\tau}$, and adjusted guidance request |

The agent outputs are restricted to structured actions such as `optimize`, `skip`, or `rerun`, bounded price-guidance vectors, confidence levels, and acceptance decisions. These restrictions are essential in a cyber-physical setting because they prevent the agentic aggregator from acting as an unconstrained controller. The framework is therefore best understood as a two-timescale orchestration layer that adds context awareness, adaptive triggering, and economic guidance around a formal scheduling model. The architecture was implemented as two complementary event-driven workflows using n8n as the orchestration environment. Persistent state variables encoding the last accepted decision, historical outcomes, and coordination-mode configuration are maintained across timesteps.
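
The restriction to structured actions can be illustrated with a small validation step placed between an agent and the optimizer. This is a minimal sketch, not the authors' n8n implementation: the field names (`decision`, `confidence`, `rationale`), the assumed `[0, 1]` confidence range, and the function names are hypothetical.

```python
# Hypothetical validators for structured agent outputs.
# Field names and the [0, 1] confidence range are assumptions for illustration.
ALLOWED_TRIGGER_ACTIONS = {"skip", "optimize"}   # Trigger Agent actions named in the text
ALLOWED_EVALUATOR_ACTIONS = {"accept", "rerun"}  # Evaluator Agent actions named in the text


def _validate(output: dict, allowed: set) -> dict:
    """Return a cleaned copy of the agent output, or raise ValueError."""
    decision = output.get("decision")
    if not isinstance(decision, str) or decision not in allowed:
        raise ValueError(f"decision {decision!r} not in {sorted(allowed)}")
    confidence = output.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be numeric")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return {
        "decision": decision,
        "confidence": float(confidence),
        "rationale": str(output.get("rationale", "")),
    }


def validate_trigger(output: dict) -> dict:
    """Accept only the Trigger Agent's two actions."""
    return _validate(output, ALLOWED_TRIGGER_ACTIONS)


def validate_evaluator(output: dict) -> dict:
    """Accept only the Evaluator Agent's two actions."""
    return _validate(output, ALLOWED_EVALUATOR_ACTIONS)


accepted = validate_trigger(
    {"decision": "optimize", "confidence": 0.8, "rationale": "price spike"}
)
print(accepted["decision"])

try:
    validate_trigger({"decision": "discharge everything", "confidence": 0.9})
except ValueError as err:
    print("rejected:", err)
```

Rejecting anything outside the allowed action set before it reaches the solver is one way to keep a language-model agent from acting as an unconstrained controller.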

#### 3\.2\.1Day\-Ahead and Real\-Time Workflows

The DA workflow constructs the daily reference plan. The Pricing Agent first proposes buy and sell multipliers for each tariff period according to the selected coordination mode. These multipliers are converted into PTO-facing buy and sell tariffs by the optimization interface described in Section [3.3](https://arxiv.org/html/2606.26400#S3.SS3). The optimizer then produces a fleet charging and V2G schedule, and the Evaluator Agent either accepts the result or requests a bounded revision with updated guidance. This loop repeats up to $R_{\max}$ times, and the best accepted plan is exported as the DA reference schedule $\mathbf{s}_{\mathrm{DA}}^{*}$. Algorithm [1](https://arxiv.org/html/2606.26400#alg1) details this sequence.

Algorithm 1: DA planning workflow

1. Fetch fleet, charger, trip, disturbance, and market data for day $d$
2. Select coordination mode $m \in \{\text{operational-based}, \text{profit-based}\}$
3. Build decision context $\mathbf{c}_{d}^{\mathrm{DA}} = \Psi(\mathbf{f}_{d}, \mathbf{g}_{d}, \mathbf{h}_{d}, m)$, loading historical memory $\mathbf{h}_{d}$ from prior accepted outcomes
4. **for** $r = 0, \ldots, R_{\max}$ **do**
5. Pricing Agent proposes $\mathbf{u}_{d,r}^{\mathrm{DA}} = \{\alpha_{p,r}^{+}, \alpha_{p,r}^{-}\}_{p \in \mathcal{P}}$
6. Convert $\mathbf{u}_{d,r}^{\mathrm{DA}}$ into tariffs $\{\rho_{p,r}^{+}, \rho_{p,r}^{-}\}_{p \in \mathcal{P}}$
7. Solve PTO optimization and return schedule $\mathbf{s}_{d,r}^{*}$
8. Compute KPIs: feasibility, PTO cost, aggregator revenue, V2G energy, and terminal SOC
9. Evaluator Agent returns $(a_{d,r}, \ell_{d,r}, \gamma_{d,r})$
10. **if** $a_{d,r} = \texttt{accept}$ **or** $r = R_{\max}$ **then**
11. Save $\mathbf{s}_{d,r}^{*}$, KPIs, tariffs, and agent rationale
12. Export accepted DA plan as $\mathbf{s}_{\mathrm{DA}}^{*}$
13. **break**
14. **else**
15. Update pricing memory and prepare constrained rerun guidance
16. **end if**
17. **end for**
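
The bounded propose, solve, evaluate loop of Algorithm 1 can be sketched with stubbed agents. This is an illustrative sketch only: `run_da_workflow`, the toy callables, the candidate multipliers, and the cost target of 115 are all assumptions, and the "optimizer" is a one-line stand-in rather than the PTO scheduling model.

```python
from typing import Callable, Optional


def run_da_workflow(
    propose: Callable[[int, Optional[str]], dict],
    solve: Callable[[dict], dict],
    evaluate: Callable[[dict], tuple],
    r_max: int,
) -> dict:
    """Bounded propose-solve-evaluate loop in the spirit of Algorithm 1."""
    if r_max < 0:
        raise ValueError("r_max must be non-negative")
    feedback = None
    for r in range(r_max + 1):                # r = 0, ..., R_max
        guidance = propose(r, feedback)       # Pricing Agent (stubbed)
        kpis = solve(guidance)                # PTO optimization (stubbed)
        decision, rationale = evaluate(kpis)  # Evaluator Agent (stubbed)
        if decision == "accept" or r == r_max:
            # Save the plan on acceptance or when the rerun budget is exhausted.
            return {"round": r, "guidance": guidance, "kpis": kpis, "decision": decision}
        feedback = rationale                  # constrained rerun guidance


# Toy stand-ins (hypothetical values, not taken from the paper).
BUY_CANDIDATES = [1.3, 1.2, 1.1]


def toy_propose(r: int, feedback: Optional[str]) -> dict:
    """Lower the buy multiplier on each rerun."""
    return {"buy": BUY_CANDIDATES[min(r, len(BUY_CANDIDATES) - 1)], "sell": 1.0}


def toy_solve(guidance: dict) -> dict:
    """Pretend PTO cost scales linearly with the buy multiplier."""
    return {"feasible": True, "pto_cost": 100.0 * guidance["buy"]}


def toy_evaluate(kpis: dict) -> tuple:
    """Accept feasible plans whose cost is at or below an assumed target."""
    if kpis["feasible"] and kpis["pto_cost"] <= 115.0:
        return "accept", "cost within target"
    return "rerun", "lower the buy multiplier"


result = run_da_workflow(toy_propose, toy_solve, toy_evaluate, r_max=3)
print(result["round"], result["decision"])
```

As in step 10 of Algorithm 1, the loop also exits when `r` reaches `r_max`, so a caller should inspect the returned `decision` to tell an accepted plan from one saved because the rerun budget ran out.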

The RT workflow updates this plan only when operating conditions justify intervention\. At each 30\-minute interval, the Trigger Agent compares the current fleet and market state against the active reference schedule\. If no material deviation is detected, the previous schedule is retained\. If re\-optimization is triggered, the Pricing Agent and Evaluator Agent are invoked under the same logic as in the DA workflow, but the optimizer runs only over the remaining horizon from the current timestep\. The updated optimization uses observed SOC, service state, electricity prices, and revised route\-energy requirements\. Algorithm[2](https://arxiv.org/html/2606.26400#alg2)details this sequence\.

Algorithm 2: RT adaptation workflow

1. Load last accepted plan $\mathbf{s}^{*}_{\tau-1}$ and DA reference $\mathbf{s}^{*}_{\mathrm{DA}}$
2. Fetch current fleet, charger, disturbance, and market data
3. Build current context $\mathbf{c}_{\tau}^{\mathrm{RT}} = \Psi(\mathbf{f}_{\tau}, \mathbf{g}_{\tau}, \mathbf{h}_{\tau}, m)$
4. Trigger Agent evaluates $\mathcal{T}_{\tau}$ and returns $\delta_{\tau}$
5. **if** $\delta_{\tau} = \texttt{skip}$ **then**
6. Set $\mathbf{s}^{*}_{\tau} \leftarrow \mathbf{s}^{*}_{\tau-1}$ and log trigger rationale
7. **else**
8. **for** $r = 0, \ldots, R_{\max}$ **do**
9. Pricing Agent proposes updated guidance $\mathbf{u}_{\tau,r}^{\mathrm{RT}}$
10. Solve PTO optimization from the current state and return $\mathbf{s}_{\tau,r}^{*}$
11. Compute RT KPIs and deviation from $\mathbf{s}^{*}_{\mathrm{DA}}$
12. Evaluator Agent returns $(a_{\tau,r}, \ell_{\tau,r}, \gamma_{\tau,r})$
13. **if** $a_{\tau,r} = \texttt{accept}$ **or** $r = R_{\max}$ **then**
14. Save $\mathbf{s}^{*}_{\tau} \leftarrow \mathbf{s}_{\tau,r}^{*}$ and update memory
15. **break**
16. **else**
17. Restrict the next guidance update using evaluator feedback
18. **end if**
19. **end for**
20. **end if**
21. Use $\mathbf{s}^{*}_{\tau}$ as the reference at timestep $\tau+1$
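
The skip/optimize branch of Algorithm 2 and the restriction of re-optimization to the remaining horizon can be sketched as follows. The representation of a plan as a list of per-interval actions, the function `rt_step`, and the toy re-optimizer are assumptions made for illustration; the pricing and evaluation loop inside the `optimize` branch is omitted.

```python
from typing import Callable, List, Tuple


def rt_step(
    prev_plan: List[str],
    decision: str,
    reoptimize: Callable[[int, int], List[str]],
    tau: int,
) -> Tuple[List[str], bool]:
    """Return (plan, reoptimized) for interval tau.

    On "skip" the previous plan is retained. On "optimize" only the
    intervals from tau onward are replaced; earlier intervals have
    already been executed and are kept as they were.
    """
    if decision == "skip":
        return list(prev_plan), False
    if decision != "optimize":
        raise ValueError(f"unexpected trigger decision: {decision!r}")
    remaining = len(prev_plan) - tau
    tail = reoptimize(tau, remaining)
    if len(tail) != remaining:
        raise ValueError("re-optimized tail must cover the remaining horizon")
    return list(prev_plan[:tau]) + list(tail), True


def toy_reoptimize(tau: int, remaining: int) -> List[str]:
    """Hypothetical stand-in that schedules charging for the rest of the day."""
    return ["charge"] * remaining


da_plan = ["idle"] * 48  # 48 half-hour intervals, as in the case study
plan, changed = rt_step(da_plan, "optimize", toy_reoptimize, tau=10)
print(changed, plan[9], plan[10])
```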

The formal mathematical definitions of the agent policies and their connection to the PTO scheduling problem are given in Appendix[B\.3](https://arxiv.org/html/2606.26400#A2.SS3)\.

#### 3\.2\.2Aggregator Coordination Modes: Profit\-based and Operational\-based

The agentic aggregator can be configured according to different coordination modes that define its economic posture when it proposes price guidance and evaluates candidate schedules\. Two modes are considered: profit\-based and operational\-based\.

- In the *profit-based* mode, the aggregator behaves as a commercially oriented intermediary. The Pricing Agent is allowed to propose higher charging-price multipliers when the PTO buys energy and lower V2G compensation multipliers when the PTO exports energy. This mode represents an upper-bound case for per-fleet value capture, since it reveals how much revenue the aggregator can obtain when market margin is the dominant coordination objective. It also exposes the associated PTO-side cost burden, which would need to be justified through contracts, reliability guarantees, or explicit value-sharing rules.
- In the *operational-based* mode, the aggregator behaves as a fleet-oriented coordinator. The objective is not to eliminate aggregator revenue, but to make grid participation compatible with public transport operation. The Pricing Agent therefore favors lower charging-price multipliers and higher V2G compensation multipliers, reducing the PTO's exposure to electricity-price markups and returning more value to the fleet when it provides flexibility. This mode reflects an aggregator operated by a public institution, a regulated platform, or a private provider whose business model depends on long-term fleet participation and portfolio scale.

For each tariff period $p$, the buy multiplier $\alpha_{p}^{+}$ and the sell multiplier $\alpha_{p}^{-}$ proposed by the Pricing Agent are bounded by mode-dependent ranges:

$$\underline{\alpha}_{m}^{+} \leq \alpha_{p}^{+} \leq \overline{\alpha}_{m}^{+}, \qquad \underline{\alpha}_{m}^{-} \leq \alpha_{p}^{-} \leq \overline{\alpha}_{m}^{-}, \qquad \forall p \in \mathcal{P}, \tag{1}$$

where $m$ denotes the selected coordination mode. In the operational-based mode, these bounds restrict the aggregator to PTO-compatible pricing ranges. In the profit-based mode, they allow wider margins for aggregator value capture. The bounds therefore make the aggregator's economic posture explicit while keeping the agent-generated tariffs within predefined and auditable limits. The conversion from these multipliers to PTO-facing tariffs is given in Section [3.3](https://arxiv.org/html/2606.26400#S3.SS3).
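
A bound check of the form in Eq. (1) might look like the sketch below. The numerical ranges in `MODE_BOUNDS` are invented for illustration (this section does not report the actual bounds), and the period labels and function name are likewise hypothetical.

```python
# Hypothetical mode-dependent multiplier ranges (lower, upper).
# These numbers are placeholders, not values from the paper.
MODE_BOUNDS = {
    "operational-based": {"buy": (0.95, 1.05), "sell": (1.00, 1.30)},
    "profit-based": {"buy": (0.95, 1.40), "sell": (0.60, 1.10)},
}


def bound_violations(mode: str, alpha_buy: dict, alpha_sell: dict) -> list:
    """List every (period, side, value) outside the mode's range in Eq. (1)."""
    bounds = MODE_BOUNDS[mode]
    violations = []
    for side, alphas in (("buy", alpha_buy), ("sell", alpha_sell)):
        lower, upper = bounds[side]
        for period, value in alphas.items():
            if not lower <= value <= upper:
                violations.append((period, side, value))
    return violations


proposed_buy = {"off-peak": 1.0, "peak": 1.2}
proposed_sell = {"off-peak": 1.0, "peak": 1.1}

# The same proposal is checked against both assumed ranges.
print(bound_violations("operational-based", proposed_buy, proposed_sell))
print(bound_violations("profit-based", proposed_buy, proposed_sell))
```

Returning the list of violations, rather than silently clipping, keeps a record of what the agent attempted, which fits the auditability argument made above.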

After a candidate schedule is produced by the optimizer, the Evaluator Agent assesses whether the result is consistent with the selected mode through a mode\-dependent supervisory score:

$$J_{m}(\mathbf{u}_{\tau}, \mathbf{s}_{\tau}^{*}) = \lambda_{m}^{A}\,\Pi_{\tau}^{\mathrm{Agg}} - \lambda_{m}^{P}\,C_{\tau}^{\mathrm{PTO}} - \lambda_{m}^{R}\,R_{\tau}, \tag{2}$$

where $\Pi_{\tau}^{\mathrm{Agg}}$ is aggregator revenue, $C_{\tau}^{\mathrm{PTO}}$ is PTO operating cost, and $R_{\tau}$ is a supervisory operational-risk score. (The term $R_{\tau}$ is not an additional constraint in the PTO optimization model. It is an evaluation metric used by the agentic layer to discourage schedules with weak service robustness, low terminal SOC margins, or unresolved operational warnings.) The revenue and cost terms are defined by the optimization interface in Section [3.3](https://arxiv.org/html/2606.26400#S3.SS3) and by the formal supervisory flow in Appendix [B.3](https://arxiv.org/html/2606.26400#A2.SS3). The decomposition of $R_{\tau}$ is given by

$$R_{\tau} = \omega_{\mathrm{SOC}} R_{\tau}^{\mathrm{SOC}} + \omega_{\mathrm{end}} R_{\tau}^{\mathrm{end}} + \omega_{\mathrm{serv}} R_{\tau}^{\mathrm{serv}}, \tag{3}$$

where $R_{\tau}^{\mathrm{SOC}}$ captures low-SOC exposure during the remaining horizon, $R_{\tau}^{\mathrm{end}}$ captures end-of-day readiness risk, and $R_{\tau}^{\mathrm{serv}}$ captures service-feasibility warnings. In the operational-based mode, $\lambda_{m}^{P}$ and $\lambda_{m}^{R}$ are larger relative to $\lambda_{m}^{A}$, so the supervisory layer favors lower PTO cost and more robust schedules. In the profit-based mode, $\lambda_{m}^{A}$ is larger, so higher aggregator revenue receives greater weight, provided that feasibility remains acceptable.
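
To see how the weights in Eqs. (2) and (3) change the ranking of candidate schedules, the sketch below scores the two agentic DA outcomes reported later in Table 6 (S3 and S4) under two weight sets. All weights and the risk components are hypothetical placeholders chosen for illustration; only the revenue and cost figures come from the paper.

```python
def risk_score(r_soc, r_end, r_serv, w_soc, w_end, w_serv):
    """Eq. (3): weighted sum of the three risk components."""
    return w_soc * r_soc + w_end * r_end + w_serv * r_serv


def supervisory_score(agg_revenue, pto_cost, risk, lam_a, lam_p, lam_r):
    """Eq. (2): revenue is rewarded, PTO cost and risk are penalized."""
    return lam_a * agg_revenue - lam_p * pto_cost - lam_r * risk


# Hypothetical weights (lam_a, lam_p, lam_r) for each mode.
WEIGHTS = {
    "operational-based": (0.2, 1.0, 1.0),
    "profit-based": (2.0, 0.5, 1.0),
}

# Aggregator revenue and PTO cost in EUR/day, from Table 6.
CANDIDATES = {
    "S3": {"agg_revenue": 20.30, "pto_cost": 140.59},
    "S4": {"agg_revenue": 2.39, "pto_cost": 118.91},
}

# Assumed identical risk for both candidates so that only the economics differ.
RISK = risk_score(0.1, 0.1, 0.0, w_soc=0.4, w_end=0.4, w_serv=0.2)


def best_candidate(mode: str) -> str:
    """Return the candidate with the highest supervisory score under a mode."""
    lam_a, lam_p, lam_r = WEIGHTS[mode]
    scores = {
        name: supervisory_score(c["agg_revenue"], c["pto_cost"], RISK, lam_a, lam_p, lam_r)
        for name, c in CANDIDATES.items()
    }
    return max(scores, key=scores.get)


for mode in WEIGHTS:
    print(mode, "prefers", best_candidate(mode))
```

With these placeholder weights the operational-based score ranks S4 first and the profit-based score ranks S3 first, which mirrors the qualitative behavior described above.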

#### 3\.2\.3Event\-Triggered Re\-Optimization Logic

The event-triggered logic determines when the RT workflow should rerun the PTO optimization model. At each RT interval $\tau$, the Trigger Agent compares the observed operating context with the assumptions embedded in the active reference plan. Three disturbance classes are monitored. The first class includes electricity-price deviations, which may change the economic value of charging or V2G export. The second class includes route-energy deviations, which affect SOC trajectories and may reduce the energy margin available for service or flexibility. The third class includes service-timing disturbances, such as early or late arrivals, which modify depot dwell times and charging opportunities.

These signals are converted into normalized indicators and summarized by a compact trigger score:

$$\mathcal{T}_{\tau} = \omega_{p}\,\Delta^{p}_{\tau} + \omega_{e}\,\Delta^{e}_{\tau} + \omega_{d}\,\Delta^{d}_{\tau}, \tag{4}$$

where $\Delta^{p}_{\tau}$ measures the normalized price deviation, $\Delta^{e}_{\tau}$ measures the normalized route-energy deviation, and $\Delta^{d}_{\tau}$ measures service-delay severity. The weights $\omega_{p}$, $\omega_{e}$, and $\omega_{d}$ reflect the relative importance of market, energy, and service-timing deviations in the triggering logic. A new RT optimization run is activated when the aggregate trigger score exceeds a predefined threshold or when a hard feasibility warning is detected:

$$\mathcal{T}_{\tau} \geq \Theta \quad \text{or} \quad \mathrm{FR}_{\tau} = 1. \tag{5}$$

Here, $\Theta$ is the trigger threshold and $\mathrm{FR}_{\tau}$ is a binary flag identifying cases in which the current reference plan should not continue without intervention. This flag is defined as

$$\mathrm{FR}_{\tau} = \mathbb{I}\big[\exists k \in \mathcal{K}: e_{k,\tau}^{\mathrm{obs}} < E_{k}^{\min} \ \lor\ e_{k,T}^{\mathrm{pred}} < E_{k}^{\mathrm{end}} \ \lor\ \mathrm{TR}_{k,\tau} = 1\big], \tag{6}$$

where $\mathbb{I}[\cdot]$ is an indicator function, $e_{k,\tau}^{\mathrm{obs}}$ is the observed stored energy of bus $k$ at interval $\tau$, $e_{k,T}^{\mathrm{pred}}$ is the terminal stored energy expected if the current reference schedule is maintained, and $\mathrm{TR}_{k,\tau}$ indicates whether bus $k$ is at risk of missing an active or upcoming service assignment.

If neither condition in Eq\. \([5](https://arxiv.org/html/2606.26400#S3.E5)\) is satisfied, the RT workflow keeps the current reference schedule and no optimization is solved\. If the trigger condition is satisfied, the PTO optimization model is rerun from the current interval to the end of the operating day using the observed SOC, service state, updated electricity prices, and observed route\-level energy requirements\. The accepted RT solution then becomes the new reference schedule for subsequent intervals\. Figure[3](https://arxiv.org/html/2606.26400#S4.F3)exemplifies this dynamic\.
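
The trigger rule in Eqs. (4) to (6) reduces to a weighted sum, a threshold test, and an any-bus feasibility check. The sketch below is illustrative: the weights, the threshold, the terminal-energy requirement of 80 kWh, and the dictionary keys are assumptions, while the 73 kWh minimum corresponds to the 20% SOC bound on a 365 kWh battery used in the case study.

```python
def trigger_score(d_price, d_energy, d_delay, w_p, w_e, w_d):
    """Eq. (4): weighted sum of normalized deviation indicators."""
    return w_p * d_price + w_e * d_energy + w_d * d_delay


def feasibility_flag(buses) -> int:
    """Eq. (6): 1 if any bus violates an energy limit or is at service risk."""
    for bus in buses:
        if (
            bus["e_obs"] < bus["e_min"]
            or bus["e_pred_end"] < bus["e_end"]
            or bus["service_risk"]
        ):
            return 1
    return 0


def should_reoptimize(score: float, theta: float, fr: int) -> bool:
    """Eq. (5): re-optimize on a high score or a hard feasibility warning."""
    return score >= theta or fr == 1


# Hypothetical weights and threshold.
W_P, W_E, W_D, THETA = 0.4, 0.4, 0.2, 0.3

healthy_fleet = [
    {"e_obs": 150.0, "e_min": 73.0, "e_pred_end": 90.0, "e_end": 80.0, "service_risk": False}
]
low_soc_fleet = [
    {"e_obs": 70.0, "e_min": 73.0, "e_pred_end": 90.0, "e_end": 80.0, "service_risk": False}
]

mild = trigger_score(0.25, 0.10, 0.0, W_P, W_E, W_D)
strong = trigger_score(0.50, 0.50, 0.0, W_P, W_E, W_D)

print(should_reoptimize(mild, THETA, feasibility_flag(healthy_fleet)))
print(should_reoptimize(strong, THETA, feasibility_flag(healthy_fleet)))
print(should_reoptimize(mild, THETA, feasibility_flag(low_soc_fleet)))
```

The third call shows why the flag is combined with `or`: a mild aggregate deviation still forces a rerun when a single bus falls below its minimum stored energy.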

### 3\.3PTO Scheduling and Optimization Interface

The agents described above govern a public transport operator \(PTO\) charging\-scheduling model\. This subsection states the compact optimization interface used in the body of the paper\. The complete constraint set, the RT re\-optimization variant, and the formal connection between the supervisory layer and the optimizer are provided in[Appendix B](https://arxiv.org/html/2606.26400#A2)\.

Let $p \in \mathcal{P}$ index tariff periods, $t \in \mathcal{T}$ time intervals, $k \in \mathcal{K}$ buses, $n \in \mathcal{N}$ chargers, and $i \in \mathcal{I}$ routes. The PTO decides route assignments $b_{k,i,t}$, charging $x_{k,n,t}$, V2G discharging $y_{k,n,t}$, stored energy $e_{k,t}$, and the aggregate depot power exchanged with the grid, $w_{t}^{+}$ for import and $w_{t}^{-}$ for export.

The Pricing Agent proposes period multipliers $\alpha_{p}^{+}$ and $\alpha_{p}^{-}$, which are mapped into the PTO-facing buy and sell tariffs by scaling the average grid price $\bar{\pi}_{p}^{g}$:

$$\rho_{p}^{+} = \alpha_{p}^{+}\,\bar{\pi}_{p}^{g}, \qquad \rho_{p}^{-} = \alpha_{p}^{-}\,\bar{\pi}_{p}^{g}, \qquad \forall p \in \mathcal{P}. \tag{7}$$

Given these tariffs, the PTO selects a schedule $\mathbf{s}^{*}$ that minimizes its net energy-trading cost:

$$\mathbf{s}^{*} \in \arg\min_{\mathbf{s} \in \mathcal{S}} C^{\mathrm{PTO}}(\mathbf{s}), \tag{8}$$

where

$$C^{\mathrm{PTO}}(\mathbf{s}) = \sum_{t \in \mathcal{T}} \left[\rho_{p(t)}^{+}\, w_{t}^{+} - \rho_{p(t)}^{-}\, w_{t}^{-}\right] \Delta t, \tag{9}$$

and $\mathcal{S}$ is the feasible set defined by the operational constraints.
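
Eqs. (7) and (9) can be evaluated directly for a given power profile. The sketch below uses a toy four-interval profile with two tariff periods; the period labels, multipliers, prices, and power values are hypothetical (the prices were merely chosen to fall inside the spot range reported in the case study), and the function names are illustrative.

```python
def tariffs(alpha_buy: dict, alpha_sell: dict, avg_grid_price: dict) -> dict:
    """Eq. (7): scale the average grid price per period into (buy, sell) tariffs."""
    return {
        p: (alpha_buy[p] * avg_grid_price[p], alpha_sell[p] * avg_grid_price[p])
        for p in avg_grid_price
    }


def pto_cost(w_import, w_export, period_of, rho, dt_h=0.5) -> float:
    """Eq. (9): net energy-trading cost over the horizon (power in kW, dt in hours)."""
    total = 0.0
    for t, (w_in, w_out) in enumerate(zip(w_import, w_export)):
        buy, sell = rho[period_of[t]]
        total += (buy * w_in - sell * w_out) * dt_h
    return total


# Toy inputs (all hypothetical).
avg_price = {"off-peak": 0.07, "peak": 0.12}   # EUR/kWh
alpha_buy = {"off-peak": 1.0, "peak": 1.1}
alpha_sell = {"off-peak": 0.9, "peak": 1.0}
period_of = ["off-peak", "off-peak", "peak", "peak"]
w_import = [400.0, 400.0, 0.0, 0.0]            # kW drawn from the grid
w_export = [0.0, 0.0, 200.0, 0.0]              # kW exported via V2G

rho = tariffs(alpha_buy, alpha_sell, avg_price)
print(round(pto_cost(w_import, w_export, period_of, rho), 2))
```

In this toy case the PTO pays for 400 kWh of off-peak import and is credited for 100 kWh of peak export, so the export term enters the cost with a negative sign exactly as in Eq. (9).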

The binding energy dynamics are given by the per-bus SOC balance:

$$e_{k,t+1} = e_{k,t} + \sum_{n \in \mathcal{N}} \eta_{n}^{\mathrm{ch}} P_{n}^{\mathrm{ch}} x_{k,n,t}\,\Delta t - \sum_{n \in \mathcal{N}} \tfrac{1}{\eta_{n}^{\mathrm{dis}}} P_{n}^{\mathrm{dis}} y_{k,n,t}\,\Delta t - \sum_{i \in \mathcal{I}} \xi_{i}\, b_{k,i,t}, \tag{10}$$

subject to battery limits and terminal-energy requirements,

$$E_{k}^{\min} \leq e_{k,t} \leq E_{k}^{\max}, \qquad e_{k,0} = E_{k}^{0}, \qquad e_{k,T} \geq E_{k}^{\mathrm{end}}. \tag{11}$$

The feasible set also includes route–charger exclusivity, so that a bus cannot charge and serve a route in the same interval; per-charger occupancy constraints; depot-power aggregation constraints,

$$w_{t}^{+} = \sum_{k \in \mathcal{K}} \sum_{n \in \mathcal{N}} P_{n}^{\mathrm{ch}} x_{k,n,t}, \qquad w_{t}^{-} = \sum_{k \in \mathcal{K}} \sum_{n \in \mathcal{N}} P_{n}^{\mathrm{dis}} y_{k,n,t}; \tag{12}$$

and depot exchange limits,

$$0 \leq w_{t}^{+} \leq \overline{W}, \qquad 0 \leq w_{t}^{-} \leq \overline{W}. \tag{13}$$

The full mathematical formulation is deferred to [Appendix B](https://arxiv.org/html/2606.26400#A2) to keep the body of the methodology focused on the agentic coordination logic.
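
A single-bus step of the SOC balance in Eq. (10), together with the battery limits of Eq. (11) and the depot aggregation of Eq. (12), can be simulated as below. The 0.95 charging and discharging efficiencies are assumed values for illustration, and the function names are hypothetical; the 365 kWh capacity, 20% lower bound, and 200 kW charger rating come from the case study.

```python
def soc_step(e_kwh, charge_kw, discharge_kw, route_kwh, eta_ch, eta_dis, dt_h=0.5):
    """Eq. (10) for one bus and one interval.

    charge_kw stands for P^ch * x and discharge_kw for P^dis * y, so
    charging adds less than the metered energy and discharging removes more.
    """
    return e_kwh + eta_ch * charge_kw * dt_h - discharge_kw * dt_h / eta_dis - route_kwh


def within_battery_limits(e_kwh, e_min, e_max) -> bool:
    """Eq. (11): stored energy must stay inside the battery window."""
    return e_min <= e_kwh <= e_max


def depot_power(per_bus_kw) -> float:
    """Eq. (12): aggregate charger-side power across the fleet."""
    return sum(per_bus_kw)


E_MAX = 365.0          # kWh, battery capacity in the case study
E_MIN = 73.0           # kWh, 20% of 365 kWh
ETA_CH = ETA_DIS = 0.95  # assumed efficiencies

e1 = soc_step(E_MIN, charge_kw=200.0, discharge_kw=0.0, route_kwh=0.0,
              eta_ch=ETA_CH, eta_dis=ETA_DIS)
e2 = soc_step(e1, charge_kw=0.0, discharge_kw=200.0, route_kwh=0.0,
              eta_ch=ETA_CH, eta_dis=ETA_DIS)

print(round(e1, 2), within_battery_limits(e1, E_MIN, E_MAX))
print(round(e2, 2), within_battery_limits(e2, E_MIN, E_MAX))
print(depot_power([200.0, 200.0, 0.0]))
```

Under these assumed efficiencies, charging for one interval and then discharging at the same power for one interval leaves the bus below its starting energy, which is why a round trip through V2G is only worthwhile when the tariff spread covers the losses.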

## 4Case Study Description

The case study evaluates the proposed agentic aggregator in a controlled electric\-bus depot setting\. The system includes the main elements required for fleet\-grid coordination: buses, battery limits, depot chargers, route\-service windows, route\-level energy requirements, and time\-varying electricity prices\. The analysis is organized in two stages\. First, the DA evaluation defines the reference schedule under nominal conditions\. Second, the RT evaluation introduces controlled disturbances in service timing, route energy consumption, and electricity prices\. In the following, we describe the bus\-system characteristics and the DA and RT evaluation scenarios, respectively\.

### 4\.1Fleet and Charging Configuration

The fleet and infrastructure configuration is summarized in Table [2](https://arxiv.org/html/2606.26400#S4.T2). The case study involves a depot-based electric bus system with 8 buses, each with a battery capacity of 365 kWh and an initial SOC of 20%, served by 8 chargers rated at 200 kW. At the depot level, the charging process is coordinated over 48 half-hour intervals, and the case-study instance is evaluated under DA conditions before moving to RT adjustments. This setting remains representative of a realistic fleet depot in which charger ratings, aggregate energy availability, and service schedules jointly constrain operations.
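
A few aggregate quantities follow directly from these parameters. The short calculation below is a back-of-the-envelope sketch: it ignores charging efficiency, service windows, and any depot exchange limit, so the derived figures are charger-side upper bounds rather than results from the paper.

```python
# Case-study parameters taken from the text and Table 2.
N_BUSES = 8
BATTERY_KWH = 365.0
INITIAL_SOC = 0.20
N_CHARGERS = 8
CHARGER_KW = 200.0
INTERVAL_H = 0.5
N_INTERVALS = 48

# Derived quantities (simple arithmetic, not reported in the paper).
fleet_capacity_kwh = N_BUSES * BATTERY_KWH
initial_energy_kwh = INITIAL_SOC * fleet_capacity_kwh
max_depot_kw = N_CHARGERS * CHARGER_KW
max_kwh_per_interval = max_depot_kw * INTERVAL_H
horizon_h = N_INTERVALS * INTERVAL_H

print(f"fleet capacity: {fleet_capacity_kwh:.0f} kWh")
print(f"initial stored energy: {initial_energy_kwh:.0f} kWh")
print(f"max charger-side power: {max_depot_kw:.0f} kW")
print(f"max charger-side energy per interval: {max_kwh_per_interval:.0f} kWh")
print(f"horizon: {horizon_h:.0f} h")
```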

The operational side is defined through 8 route blocks with heterogeneous service windows. According to the trip-time sheet, route activity starts between 05:30 and 07:00 and finishes between 18:30 and 23:00, leaving different charging opportunities before, between, and after service blocks. Figure [4](https://arxiv.org/html/2606.26400#S4.F4) shows these windows for each vehicle. The corresponding nominal route energy-consumption values range from 0.859 to 0.923 kWh/km, with an average of approximately 0.891 kWh/km, which is sufficiently demanding to make both dwell-time compression and route-energy deviations operationally consequential.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x3.png)

Figure 3: Real-time optimization logic.

Table 2: Main fleet, infrastructure, and horizon parameters in the case study.

| Parameter | Value |
|---|---|
| Number of buses | 8 |
| Battery capacity per bus | 365 kWh |
| Initial SOC per bus | 20% |
| Number of chargers | 8 |
| Charger rating | 200 kW |
| Routes | 8 |
| Scheduling horizon | 24 h |
| Time discretization | 30 min (48 intervals) |

![Refer to caption](https://arxiv.org/html/2606.26400v1/x4.png)

Figure 4: Daily service windows extracted from the trip-time input sheet.

The electricity-market inputs are summarized in Table [3](https://arxiv.org/html/2606.26400#S4.T3). Spot-market prices range from approximately 0.067 to 0.122 EUR/kWh, with a daily mean of about 0.090 EUR/kWh. In the DA analysis, this profile is the common benchmark price seen by the non-agentic scenarios and the reference against which the multi-agentic aggregator defines PTO buy and sell tariffs. In the RT analysis, the same base profile serves as the anchor for determining whether the system remained on the DA tariff or switched to a revised RT tariff after an accepted re-optimization.

Table 3: Electricity-price characteristics of the case-study instance.

| Price parameter | Value |
|---|---|
| Spot price range | 0.067–0.122 EUR/kWh |
| Average spot price | 0.090 EUR/kWh |
| Number of intervals | 48 |
| Interval duration | 30 min |

### 4\.2Day\-Ahead Planning Strategies

The DA analysis is organized around four strategies that progressively move from non\-agentic charging benchmarks toward agent\-guided aggregator coordination\. These strategies are summarized in Table[4](https://arxiv.org/html/2606.26400#S5.T4)\.

The first two strategies provide operational baselines: a rule\-based "dumb charging" policy and an optimization\-based smart\-charging policy without V2G participation\. The last two strategies activate the pricing role of the multi\-agentic aggregator under profit\-based and operational\-based coordination\. They aim to show how different aggregator postures affect the balance between grid\-facing flexibility value and PTO\-facing operating needs\.

### 4\.3Real\-Time Evaluation Scenarios

The RT evaluation examines how the multi-agentic aggregator responds when observed operating conditions diverge from the accepted DA reference plan, $\mathbf{s}_{\mathrm{DA}}^{*}$. Controlled perturbations are introduced in service timing, route energy consumption, electricity prices, and selected combined cases, representing typical mismatches between planning assumptions and realized electric-bus operation. Table [5](https://arxiv.org/html/2606.26400#S5.T5) summarizes the evaluated RT disturbance scenarios.

The experiment uses structured input files to emulate 30-minute operational updates. At each RT interval, the workflow builds the observed system context and applies the trigger logic introduced in Section [3.2.3](https://arxiv.org/html/2606.26400#S3.SS2.SSS3). If $\delta_{\tau} = \texttt{skip}$, the active schedule is maintained and the monitoring record is updated. If $\delta_{\tau} = \texttt{optimize}$, the optimization model is solved from the current observed state, and the resulting schedule is evaluated, accepted, or sent to a bounded rerun.

## 5Results and Discussion

### 5\.1Day\-Ahead Optimization Results

Table[6](https://arxiv.org/html/2606.26400#S5.T6)compares the four DA scenarios and shows a clear progression from conservative charging to flexibility\-based operation\.

Table 4: DA strategies considered in the evaluation design.

| ID | Strategy | V2G | Main purpose |
|---|---|---|---|
| S1 | Dumb charging | Off | Rule-based benchmark that charges buses as early as possible without market-responsive optimization. |
| S2 | Smart charging (no V2G) | Off | Deterministic cost-minimizing charging benchmark using spot-market prices directly. |
| S3 | Profit-based aggregator | On | Agent-guided pricing case in which the aggregator prioritizes per-fleet revenue and stronger tariff margins while preserving feasibility. |
| S4 | Operational-based aggregator | On | Agent-guided pricing case in which the aggregator prioritizes PTO-compatible flexibility provision and lower tariff exposure. |

Table 5: RT scenarios in the evaluation design. Each scenario is analyzed using profit-based (S3) and operational-based (S4) strategies.

| Scenario | Family | Disturbance | Time | Description |
|---|---|---|---|---|
| D+30 beg. | Service timing | +30 min delay | 04:30–09:00 | Early-day delay compresses depot charging windows and reduces V2G export opportunities. |
| D-30 beg. | Service timing | −30 min early return | 04:30–09:00 | Early-day early return creates unplanned dwell time; tests whether the schedule recovers flexibility. |
| D+30 end | Service timing | +30 min delay | 17:30–24:00 | Late-day delay shortens the end-of-day charging window and threatens terminal SOC reserve. |
| D-30 end | Service timing | −30 min early return | 17:30–24:00 | Late-day early return frees time near the end of the horizon; tests V2G export response. |
| E+50 | Route energy | +50% kWh/km, all buses | 06:00–20:00 | Higher traction demand raises grid purchases, depletes SOC reserve, and suppresses V2G. |
| E-50 | Route energy | −50% kWh/km, all buses | 06:00–20:00 | Lower traction demand frees surplus SOC; tests whether the aggregator expands V2G export. |
| P+25 | Electricity price | +25% spot price | 02:30–05:00 | Moderate positive price shock; tests tariff-guided reduction of charging during the window. |
| P-25 | Electricity price | −25% spot price | 02:30–05:00 | Moderate negative price shock; tests whether the aggregator shifts charging into the low-price window. |
| P+50 | Electricity price | +50% spot price | 02:30–05:00 | Strong positive price shock; tests the upper bound of tariff exposure and V2G suppression. |
| P-50 | Electricity price | −50% spot price | 02:30–05:00 | Strong negative price shock; tests maximum charging shift and V2G compensation response. |
| C-Seq | Combined | P+50 then E+50 then D+30 | 02:30–05:00, 12:30–15:00, 17:30–24:00 | Sequential arrival of market, energy, and timing stressors; each disturbance resolved before the next arrives. |
| C-All 5–48 | Combined | P+50, E+50, D+30 simultaneous | 02:30–24:00 | Full-day simultaneous stress; all three disturbance types active across most of the operating horizon. |
| C-All 5–25 | Combined | P+50, E+50, D+30 simultaneous | 02:30–12:30 | Early/midday simultaneous stress; disturbances resolve shortly after midday, leaving a clean late horizon. |
| C-All 20–48 | Combined | P+50, E+50, D+30 simultaneous | 10:00–24:00 | Late-morning-to-end-of-day simultaneous stress; disturbances begin after the morning period with limited remaining horizon to recover. |

Table 6: DA results across the four evaluated scenarios using SOC-derived energy accounting.

| Scenario | PTO cost (EUR/day) | Aggregator revenue (EUR/day) | Bought (kWh/day) | Sold (kWh/day) | Avg. buy price (EUR/kWh) | Avg. sell price (EUR/kWh) | Min. SOC (%) | End avg. SOC (%) |
|---|---|---|---|---|---|---|---|---|
| Dumb charging (S1) | 218.10 | 0.00 | 2400.0 | 0.0 | 0.0909 | – | 20.00 | 52.38 |
| Smart charging, no V2G (S2) | 130.47 | 0.00 | 1600.0 | 0.0 | 0.0815 | – | 20.00 | 25.61 |
| Profit-based aggregator (S3) | 140.59 | 20.30 | 1900.0 | 300.0 | 0.0874 | 0.0847 | 20.00 | 23.44 |
| Operational-based aggregator (S4) | 118.91 | 2.39 | 2000.0 | 400.0 | 0.0826 | 0.1158 | 20.00 | 22.72 |

Dumb charging \(S1\) provides the conservative baseline, with the highest PTO cost \(218\.10 EUR/day\), the largest grid purchase \(2400 kWh/day\), and the highest average terminal SOC \(52\.38%\)\. Smart charging without V2G \(S2\) reduces the PTO cost by 40\.2%, to 130\.47 EUR/day, by shifting charging to lower\-cost periods and reducing purchases to 1600 kWh/day, although the terminal SOC falls to 25\.61%\. The agentic cases \(S3 and S4\) add bidirectional grid exchange, but they express different reconciliation policies through both tariffs and dispatch\. The profit\-based mode \(S3\) buys 1900 kWh/day and sells 300 kWh/day, raising aggregator revenue to 20\.30 EUR/day but increasing PTO cost to 140\.59 EUR/day\. The operational\-based mode \(S4\) buys 2000 kWh/day and sells 400 kWh/day, reaching the lowest PTO cost, 118\.91 EUR/day, while reducing aggregator revenue to 2\.39 EUR/day\.
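
The relative changes quoted above can be recomputed from the Table 6 entries. The snippet below is a small arithmetic check; the dictionary layout and helper name are illustrative, while the numbers are copied from Table 6.

```python
# PTO cost and aggregator revenue in EUR/day, energy in kWh/day (Table 6).
DA_RESULTS = {
    "S1": {"pto_cost": 218.10, "agg_revenue": 0.00, "bought": 2400.0, "sold": 0.0},
    "S2": {"pto_cost": 130.47, "agg_revenue": 0.00, "bought": 1600.0, "sold": 0.0},
    "S3": {"pto_cost": 140.59, "agg_revenue": 20.30, "bought": 1900.0, "sold": 300.0},
    "S4": {"pto_cost": 118.91, "agg_revenue": 2.39, "bought": 2000.0, "sold": 400.0},
}


def pct_change(new: float, ref: float) -> float:
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


def cost_change(scenario: str, reference: str) -> float:
    """Percent change in PTO cost between two scenarios."""
    return pct_change(DA_RESULTS[scenario]["pto_cost"], DA_RESULTS[reference]["pto_cost"])


for scenario, reference in (("S2", "S1"), ("S3", "S2"), ("S4", "S2")):
    print(f"{scenario} vs {reference}: {cost_change(scenario, reference):+.1f}% PTO cost")

for name, row in DA_RESULTS.items():
    print(f"{name}: net grid energy {row['bought'] - row['sold']:.0f} kWh/day")
```

Relative to the smart-charging benchmark S2, the profit-based case raises PTO cost by roughly 8% and the operational-based case lowers it by roughly 9%, even though S3 and S4 import the same net energy from the grid.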

Figure[5](https://arxiv.org/html/2606.26400#S5.F5)explains how the two agentic cases create different financial outcomes through the tariff vectors selected by the Pricing Agent\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x5.png)
(a) Time-varying buy and sell tariffs.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x6.png)
(b) SOC-weighted average tariffs.

Figure 5: DA tariff comparison for the agentic scenarios. Panel (a) shows the time-varying buy and sell price curves. Panel (b) reports the corresponding SOC-weighted average buy and sell price levels.

The time-varying tariff trajectories show that the profit-based mode maintains a pricing structure that is more favorable to the aggregator, while the operational-based mode keeps the charging tariff close to the spot price and increases the V2G compensation. The weighted average prices make this contrast clearer: the operational-based case combines a lower average buy tariff (0.0826 EUR/kWh) with a substantially higher sell-back tariff (0.1158 EUR/kWh), whereas the profit-based case combines a higher buy tariff (0.0874 EUR/kWh) with a lower sell-back tariff (0.0847 EUR/kWh). The agentic layer therefore modifies how the economic value of flexibility is distributed between the PTO and the aggregator. The power and energy profiles in Figure [6](https://arxiv.org/html/2606.26400#S5.F6) show the operational behavior behind these results.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x7.png)
(a) Aggregated charging and discharging power profiles.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x8.png)
(b) Average fleet SOC trajectories.

Figure 6: DA operating profiles. Panel (a) compares aggregate power profiles. Panel (b) compares the corresponding average fleet SOC trajectories.

In S1, charging is concentrated early in the horizon and remains purely unidirectional. This explains the large energy purchase and the high terminal SOC observed. In S2, charging is concentrated in a narrower time window, indicating that the optimizer selects more favorable charging periods instead of charging immediately whenever buses are available. The agentic cases, S3 and S4, introduce discharge events near the end of the horizon, but the operational-based case exports more energy than the profit-based case. The SOC trajectories provide an additional operational interpretation. All scenarios respect the 20% lower SOC bound, but they use the available battery reserve differently. S1 keeps the largest energy buffer, ending at 52.38% average SOC. S2 reduces this buffer to 25.61%, which reflects a more efficient use of the battery capacity for cost minimization. S3 ends at 23.44% average SOC after 300 kWh/day of V2G export, whereas S4 ends at 22.72% after 400 kWh/day of export. The operational-based DA plan therefore gives the PTO both lower tariff exposure and a larger sell-back opportunity, while the profit-based plan retains more margin for the aggregator from a smaller export volume.

These results indicate that the main economic gain comes first from optimized charging, which reduces unnecessary energy purchases and shifts charging toward lower\-cost periods\. V2G adds a second layer of value by allowing the fleet to export energy, but the benefit depends strongly on the coordination mode selected by the aggregator\. The profit\-based mode favors aggregator revenue through a larger tariff spread, while the operational\-based mode favors PTO\-compatible participation through lower buy prices, higher sell\-back compensation, and greater export\. This point is important for the RT analysis, because the accepted DA plan establishes the expected cost, the aggregator revenue target, and the SOC margin available to absorb RT deviations\.

### 5\.2Real\-Time Disturbance Results

For consistency, RT deltas are computed against the DA reference of the corresponding coordination mode, and the applied tariffs are reconstructed at the timestep level: when no accepted RT tariff update is active, the DA multiplier vector is used; once the RT workflow accepts an updated vector, the executed tariff applies from that timestep onward\. For price\-disturbance cases the market price is additionally scaled during the disturbance window\. The weighted buy and sell tariffs reported below follow Eq\. \([14](https://arxiv.org/html/2606.26400#S5.E14)\):

$$\bar{\rho}^{+} = \frac{\sum_{t} w_{t}^{+}\, \rho_{p(t)}^{+}\, \Delta t}{\sum_{t} w_{t}^{+}\, \Delta t}, \qquad \bar{\rho}^{-} = \frac{\sum_{t} w_{t}^{-}\, \rho_{p(t)}^{-}\, \Delta t}{\sum_{t} w_{t}^{-}\, \Delta t}. \tag{14}$$

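
The energy-weighted averages in Eq. (14) can be computed as follows. The function name and the toy profile are assumptions for illustration; returning `None` when no energy is traded is a design choice of this sketch, analogous to the "–" entries for the sell price of S1 and S2 in Table 6.

```python
from typing import Optional, Sequence


def weighted_tariff(
    power_kw: Sequence[float], tariff: Sequence[float], dt_h: float = 0.5
) -> Optional[float]:
    """Eq. (14): energy-weighted average tariff.

    power_kw[t] is w_t^+ (or w_t^-) and tariff[t] is the matching buy
    (or sell) tariff applied in interval t. Returns None if no energy
    is traded, since the average is then undefined.
    """
    if len(power_kw) != len(tariff):
        raise ValueError("power and tariff series must have equal length")
    energy = sum(w * dt_h for w in power_kw)
    if energy == 0:
        return None
    return sum(w * rho * dt_h for w, rho in zip(power_kw, tariff)) / energy


# Toy profile (hypothetical): most energy is bought in the cheapest interval.
w_plus = [400.0, 0.0, 200.0]     # kW
rho_plus = [0.07, 0.12, 0.10]    # EUR/kWh

print(weighted_tariff(w_plus, rho_plus))
print(weighted_tariff([0.0, 0.0, 0.0], rho_plus))
```

Because intervals with zero power carry zero weight, the 0.12 EUR/kWh interval in the toy profile has no influence on the average, which is why the weighted tariff can differ noticeably from the plain mean of the tariff curve.
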
We organize the analysis by disturbance family and report both coordination modes together, since the central result is the difference between them\. Per\-scenario outcomes for the two modes are collected in Tables[7](https://arxiv.org/html/2606.26400#S5.T7)and[8](https://arxiv.org/html/2606.26400#S5.T8), and the matched mode difference \(operational\-based minus profit\-based\) in Table[9](https://arxiv.org/html/2606.26400#S5.T9); the prose highlights the governing mechanism rather than restating individual values\.

Delay disturbances\. Timing shifts raise PTO cost in both modes by compressing the windows available for charging and export, but the modes recover differently\. Profit\-based operation holds sell tariffs low, so the aggregator recoups margin from the charging side and the fleet retains the unexported energy as elevated terminal SOC \(about 28–31%\)\. Operational\-based operation instead pairs a lower buy tariff with a substantially higher sell tariff, which contains PTO cost exposure even when discharge windows are reduced; every matched delay case is cheaper for the PTO than its profit\-based counterpart\.

Energy\-consumption disturbances\. The two energy cases act in opposite directions\. Under higher consumption \(E\+50\), grid purchases rise and export is suppressed \(profit\-based\) or only partially preserved \(operational\-based\), and PTO cost increases in both modes\. The lower\-consumption case \(E\-50\) is more revealing\. In profit\-based mode it produces a counterintuitive outcome: despite reduced traction demand, PTO cost stays above the DA reference\. This follows from the two\-step re\-optimization \(an early update exports the temporary SOC surplus, while a later update repurchases energy to restore feasible battery margins once consumption returns to nominal\), combined with a low V2G compensation tariff that leaves most of the export value with the aggregator\. Operational\-based mode turns the same disturbance into expanded flexibility: it exports 900 kWh \(versus 400 kWh\) at a much higher sell tariff, leaving PTO cost essentially unchanged from the DA reference while raising aggregator revenue only modestly\. The contrast shows that the coordination mode, not the disturbance, governs who captures the value released by a favorable deviation\.
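The E-50 contrast can be checked with a short arithmetic sketch using the E-50 rows of Tables 7 and 8. It assumes that the PTO cost equals tariff-weighted charging payments minus tariff-weighted V2G compensation, which is consistent with the weighted-tariff definition in Eq. (14) but is an accounting assumption of this sketch rather than a formula stated in this section. Small gaps to the reported costs are expected because the tabulated tariffs are rounded to four decimals.

```python
# Values copied from the E-50 rows of Tables 7 (profit-based) and 8 (operational-based).
E50 = {
    "profit-based": {
        "buy_kwh": 2000, "sell_kwh": 400,
        "buy_tariff": 0.0925, "sell_tariff": 0.0583,
        "reported_pto": 161.67,
    },
    "operational-based": {
        "buy_kwh": 2494, "sell_kwh": 900,
        "buy_tariff": 0.0877, "sell_tariff": 0.1111,
        "reported_pto": 118.67,
    },
}


def export_payment(row: dict) -> float:
    """V2G compensation paid to the PTO (EUR)."""
    return row["sell_kwh"] * row["sell_tariff"]


def pto_net_cost(row: dict) -> float:
    """Assumed accounting: charging payments minus V2G compensation (EUR)."""
    return row["buy_kwh"] * row["buy_tariff"] - export_payment(row)


for mode, row in E50.items():
    print(f"{mode}: export payment {export_payment(row):.2f} EUR, "
          f"estimated PTO cost {pto_net_cost(row):.2f} EUR, "
          f"reported {row['reported_pto']:.2f} EUR")
```

Under this assumption, the compensation received by the PTO for export is roughly four times larger in operational-based mode, which is the mechanism behind the near-zero cost change relative to the DA reference.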

Price disturbances\. Price shocks produce the largest economic spread and the sharpest mode separation\. Under positive shocks the profit\-based aggregator raises buy tariffs aggressively and suppresses export, driving PTO cost to the maximum among price\-shock scenarios \(247\.89 EUR at P\+50\) while maximizing its own revenue \(73\.70 EUR\)\. The operational\-based aggregator instead holds buy tariffs near their DA level and preserves full V2G compensation, roughly halving PTO cost at the same disturbance \(131\.81 EUR\) at the expense of its margin\. Negative shocks move both modes in the same direction \(lower buy tariffs, preserved export, reduced PTO cost\), so the two modes converge when the market itself relieves PTO exposure\.

Combined disturbances\. Compound stress exposes a structural, not merely quantitative, difference\. Profit\-based operation eliminates V2G export entirely across all four combined cases, so its economic response is driven solely by charging cost and tariff margin\. Operational\-based operation preserves export in three of the four cases \(all except C\-All 20–48\), keeping flexibility exchange alive under stress\. The exception, C\-All 20–48, is a relaxed\-feasibility outcome in both modes: late\-day disturbances leave insufficient time to restore the 20% reserve while meeting remaining service commitments, so the SOC floor drops below bound\. The sequential case C\-Seq is less severe than the simultaneous cases in both modes, because the system can re\-optimize between separated disturbances\.

Table 7: Profit\-based RT scenario summary using SOC\-derived grid exchange, including weighted applied tariffs\. Deltas are relative to the profit\-based DA reference\.

| Scenario | Family | RT PTO (EUR) | ΔPTO (EUR) | RT agg. (EUR) | Δagg. (EUR) | Buy (kWh) | Sell (kWh) | Buy tariff (EUR/kWh) | Sell tariff (EUR/kWh) | Final SOC (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| D-30 beg. | Delay | 177.88 | +37.30 | 25.83 | +5.52 | 2000 | 100 | 0.0925 | 0.0708 | 30.84 |
| D+30 beg. | Delay | 180.08 | +39.49 | 40.24 | +19.93 | 2000 | 200 | 0.0980 | 0.0794 | 30.33 |
| D-30 end | Delay | 169.82 | +29.24 | 29.98 | +9.67 | 2000 | 200 | 0.0925 | 0.0757 | 28.49 |
| D+30 end | Delay | 177.88 | +37.30 | 25.83 | +5.52 | 2000 | 100 | 0.0925 | 0.0708 | 29.37 |
| E+50 | Energy | 224.99 | +84.40 | 28.18 | +7.88 | 2388 | 0 | 0.0942 | – | 27.32 |
| E-50 | Energy | 161.67 | +21.08 | 45.94 | +25.63 | 2000 | 400 | 0.0925 | 0.0583 | 34.35 |
| P+25 | Price | 185.89 | +45.31 | 16.90 | -3.41 | 1600 | 0 | 0.1162 | – | 25.61 |
| P+50 | Price | 247.89 | +107.30 | 73.70 | +53.40 | 1700 | 81 | 0.1476 | 0.0377 | 28.70 |
| P-25 | Price | 120.95 | -19.64 | 38.90 | +18.60 | 1900 | 300 | 0.0760 | 0.0780 | 23.44 |
| P-50 | Price | 70.17 | -70.41 | 33.65 | +13.35 | 2000 | 400 | 0.0507 | 0.0780 | 22.72 |
| C-Seq | Combined | 225.63 | +85.04 | 57.50 | +37.20 | 1900 | 0 | 0.1188 | – | 27.54 |
| C-All 5–48 | Combined | 245.37 | +104.78 | 49.07 | +28.77 | 2400 | 0 | 0.1022 | – | 28.42 |
| C-All 5–25 | Combined | 257.02 | +116.43 | 58.95 | +38.64 | 2400 | 0 | 0.1071 | – | 42.29 |
| C-All 20–48 | Combined | 250.09 | +109.50 | 39.30 | +19.00 | 2400 | 0 | 0.1042 | – | 27.48 |

Table 8: Operational\-based RT scenario summary using SOC\-derived grid exchange, including weighted applied tariffs\. Deltas are relative to the operational\-based DA reference\.

| Scenario | Family | RT PTO (EUR) | ΔPTO (EUR) | RT agg. (EUR) | Δagg. (EUR) | Buy (kWh) | Sell (kWh) | Buy tariff (EUR/kWh) | Sell tariff (EUR/kWh) | Final SOC (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| D-30 beg. | Delay | 157.51 | +38.60 | 8.91 | +6.52 | 2100 | 200 | 0.0863 | 0.1180 | 30.48 |
| D+30 beg. | Delay | 137.03 | +18.12 | 9.91 | +7.52 | 2000 | 300 | 0.0858 | 0.1155 | 26.53 |
| D-30 end | Delay | 152.56 | +33.65 | 13.38 | +10.99 | 2000 | 200 | 0.0879 | 0.1160 | 28.87 |
| D+30 end | Delay | 164.16 | +45.25 | 12.77 | +10.38 | 2000 | 100 | 0.0879 | 0.1160 | 29.74 |
| E+50 | Energy | 211.09 | +92.18 | 17.22 | +14.83 | 2494 | 100 | 0.0877 | 0.0757 | 29.76 |
| E-50 | Energy | 118.67 | -0.24 | 17.85 | +15.46 | 2494 | 900 | 0.0877 | 0.1111 | 31.11 |
| P+25 | Price | 130.85 | +11.94 | 3.28 | +0.89 | 1900 | 300 | 0.0856 | 0.1059 | 23.44 |
| P+50 | Price | 131.81 | +12.90 | 3.95 | +1.56 | 2000 | 400 | 0.0871 | 0.1059 | 22.72 |
| P-25 | Price | 80.52 | -38.39 | 3.45 | +1.06 | 2000 | 400 | 0.0614 | 0.1059 | 22.72 |
| P-50 | Price | 39.56 | -79.35 | 3.04 | +0.65 | 2000 | 400 | 0.0410 | 0.1059 | 22.72 |
| C-Seq | Combined | 170.28 | +51.37 | 2.41 | +0.02 | 2088 | 100 | 0.0871 | 0.1160 | 29.71 |
| C-All 5–48 | Combined | 196.32 | +77.41 | 3.39 | +1.00 | 2600 | 200 | 0.0844 | 0.1153 | 26.97 |
| C-All 5–25 | Combined | 194.53 | +75.62 | 3.38 | +0.99 | 2600 | 200 | 0.0837 | 0.1160 | 40.84 |
| C-All 20–48 | Combined | 222.74 | +103.84 | 12.62 | +10.23 | 2400 | 0 | 0.0928 | – | 27.86 |

Table 9: Matched profit\-based–operational\-based RT comparison using SOC\-derived grid exchange \(operational\-based minus profit\-based\)\.

| Scenario | Family | ΔPTO (EUR) | Δagg. (EUR) | Δbuy (kWh) | Δsell (kWh) | Δfinal SOC (pp) |
|---|---|---|---|---|---|---|
| D-30 beg. | Delay | -20.37 | -16.91 | +100 | +100 | -0.36 |
| D+30 beg. | Delay | -43.05 | -30.33 | 0 | +100 | -3.81 |
| D-30 end | Delay | -17.26 | -16.60 | 0 | 0 | +0.37 |
| D+30 end | Delay | -13.72 | -13.06 | 0 | 0 | +0.37 |
| E+50 | Energy | -13.90 | -10.96 | +106 | +100 | +2.44 |
| E-50 | Energy | -43.00 | -28.09 | +494 | +500 | -3.24 |
| P+25 | Price | -55.05 | -13.62 | +300 | +300 | -2.17 |
| P+50 | Price | -116.07 | -69.75 | +300 | +319 | -5.98 |
| P-25 | Price | -40.43 | -35.46 | +100 | +100 | -0.72 |
| P-50 | Price | -30.61 | -30.61 | 0 | 0 | 0.00 |
| C-Seq | Combined | -55.35 | -55.09 | +188 | +100 | +2.18 |
| C-All 5–48 | Combined | -49.05 | -45.69 | +200 | +200 | -1.45 |
| C-All 5–25 | Combined | -62.48 | -55.57 | +200 | +200 | -1.45 |
| C-All 20–48 | Combined | -27.35 | -26.68 | 0 | 0 | +0.37 |
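The matched differences in Table 9 follow directly from the per-mode values in Tables 7 and 8. The sketch below recomputes them for three scenarios as a consistency check. The data layout and function name are illustrative assumptions. Differences of about 0.01 against the published table can occur, presumably because the table was computed from unrounded values.

```python
# Columns: PTO cost (EUR), aggregator revenue (EUR), buy (kWh), sell (kWh), final SOC (%).
# Values copied from Table 7 (profit-based) and Table 8 (operational-based).
PROFIT = {
    "P+50": (247.89, 73.70, 1700, 81, 28.70),
    "E-50": (161.67, 45.94, 2000, 400, 34.35),
    "C-Seq": (225.63, 57.50, 1900, 0, 27.54),
}
OPERATIONAL = {
    "P+50": (131.81, 3.95, 2000, 400, 22.72),
    "E-50": (118.67, 17.85, 2494, 900, 31.11),
    "C-Seq": (170.28, 2.41, 2088, 100, 29.71),
}


def matched_difference(scenario: str) -> tuple:
    """Operational-based minus profit-based, column by column."""
    pairs = zip(OPERATIONAL[scenario], PROFIT[scenario])
    return tuple(round(ob - pb, 2) for ob, pb in pairs)


for name in PROFIT:
    d_pto, d_agg, d_buy, d_sell, d_soc = matched_difference(name)
    print(f"{name}: dPTO={d_pto:+.2f} EUR, dAgg={d_agg:+.2f} EUR, "
          f"dBuy={d_buy:+d} kWh, dSell={d_sell:+d} kWh, dSOC={d_soc:+.2f} pp")
```

A negative PTO difference together with a negative aggregator difference in the same row indicates value moving from the aggregator to the PTO in that scenario.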

The physical trajectories confirm these economic outcomes\. Figures [7](https://arxiv.org/html/2606.26400#S5.F7) and [8](https://arxiv.org/html/2606.26400#S5.F8) show that operational\-based operation sustains larger terminal reserves and broader export, most visibly in E\-50 and the price case, whereas profit\-based operation concentrates charging and curtails discharge, particularly under combined stress\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x9.png) \(a\) Average fleet SOC\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x10.png) \(b\) SOC\-derived average net power\.

Figure 7: Profit\-based RT operating heatmaps\. Panel \(a\) reports average fleet SOC\. Panel \(b\) reports average net power inferred from SOC variation\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x11.png) \(a\) Average fleet SOC\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x12.png) \(b\) SOC\-derived average net power\.

Figure 8: Operational\-based RT operating heatmaps\. Panel \(a\) reports average fleet SOC\. Panel \(b\) reports average net power inferred from SOC variation\.

In both modes the workflow intervenes selectively rather than continuously: accepted re\-optimizations cluster around disturbance\-relevant windows \(Figure [9](https://arxiv.org/html/2606.26400#S5.F9)\), so each update remains traceable to a specific event and its resulting SOC and tariff trajectory\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x13.png) \(a\) Profit\-based\.

![Refer to caption](https://arxiv.org/html/2606.26400v1/x14.png) \(b\) Operational\-based\.

Figure 9: RT trigger timelines\. Accepted re\-optimizations concentrate around disturbance\-relevant windows in both coordination modes\.

Mode comparison\. Across all fourteen matched scenarios, operational\-based operation reduces PTO cost without exception, with the largest reductions where profit\-based tariffs are most aggressive \(P\+50 and the combined cases\)\. This advantage is mirrored by a fall in aggregator revenue in all fourteen cases\. The E\-50 scenario shows why revenue cannot be read from export volume alone: operational\-based mode sells 500 kWh more, yet earns less, because the higher compensation paid to the PTO outweighs the larger volume\. The results therefore identify no universally dominant mode but a reconciliation frontier between grid\-facing value capture and fleet\-facing participation: operational\-based control suits PTO cost containment, service robustness, and contract acceptability, while profit\-based control applies when aggregator revenue is the objective and the PTO has explicitly accepted the higher tariff exposure\.

Beyond characterizing this frontier, the matched comparison quantifies the economic exposure a PTO faces when the aggregator is profit\-oriented and unconstrained\. The gap between modes is not marginal: under the P\+50 price shock the PTO pays 116\.07 EUR more per day under profit\-based coordination than under operational\-based coordination, and double\-digit daily gaps persist across the combined\-disturbance cases\. Because the agentic layer can shift between these modes through prompt configuration alone, with no change to the optimizer, the contract, or any externally visible system parameter, this exposure is neither observable to the PTO nor auditable by a regulator under current market arrangements\. The same mechanism that lets a cooperative aggregator protect the PTO also lets a profit\-oriented one extract value from it, and nothing in the technical architecture distinguishes the two from the outside\. This is the empirical basis for the policy argument developed in Section [5\.5](https://arxiv.org/html/2606.26400#S5.SS5): the aggregator domain requires explicit regulatory structure, because the operational\-based outcome cannot be assumed to arise on its own\.

### 5\.3 Operational interpretation and decision\-support implications

The results lead to three main operational conclusions that extend beyond the specific scenarios evaluated\. \(i\) DA optimization and V2G pricing are separable decisions with distinct risk profiles\. Optimized no\-V2G charging reduces PTO cost by 40\.2% relative to dumb charging, showing that fleet operators can capture substantial savings before entering bidirectional flexibility markets\. V2G adds further value, but its practical attractiveness depends on how the coordination mode distributes cost savings, export revenue, and terminal reserve between the PTO and the aggregator\. \(ii\) Operational\-based aggregation is the most PTO\-compatible V2G policy in the evaluated cases\. Within the selected FS\+CoT configuration used for the main DA comparison, it produces the lowest PTO cost among the four main DA strategies, lowers cost in all fourteen matched RT scenarios, and offers lower buy tariffs with higher sell\-back compensation\. \(iii\) Operational\-based aggregation is not economically irrational from the aggregator’s perspective; it reflects a scale\-oriented revenue model in which lower margins per fleet support broader participation, larger flexibility volumes, and more durable contracts\. The E\-50 case illustrates this trade\-off: the operational\-based plan exports 500 kWh more than the profit\-based plan, while higher sell compensation reduces per\-fleet aggregator revenue; terminal SOC in this case ends 3\.24 pp lower than under profit\-based operation \(Table [9](https://arxiv.org/html/2606.26400#S5.T9)\)\.

These conclusions also imply a disturbance\-dependent acceptance logic for the PTO when an RT re\-optimization is proposed\. For delay disturbances, the PTO should accept an update when the delay threatens terminal reserve, charger access, or service continuity, rather than only when it improves aggregator revenue\. For energy\-consumption disturbances, a cost\-increasing update can still be acceptable when it restores feasibility or preserves reserve, whereas a lower\-consumption update should be checked for unnecessary battery cycling before acceptance\. For price disturbances, which affect tariff exposure without disrupting service, a price\-only re\-optimization should be accepted only when it lowers the PTO bill or follows a pre\-agreed market\-sharing rule\. Across all three families, the acceptance criterion is the PTO’s own operational and cost exposure, not the aggregator’s revenue, which reinforces the case for the coordination\-mode and transparency safeguards discussed in Section [5\.5](https://arxiv.org/html/2606.26400#S5.SS5)\.
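The acceptance logic described above can be written as a small rule set. The following Python sketch is one literal reading of the paragraph, not the paper's Evaluator Agent: the class, field names, and boolean flags are hypothetical, and how each flag would be derived from schedules and SOC trajectories is left open.

```python
from dataclasses import dataclass


@dataclass
class UpdateProposal:
    """Hypothetical summary of a proposed RT re-optimization, seen by the PTO."""
    family: str                          # "delay", "energy", or "price"
    pto_cost_delta: float                # EUR, proposed plan minus current plan
    threatens_service: bool = False      # reserve, charger access, or continuity at risk
    restores_feasibility: bool = False   # restores feasibility or preserves reserve
    lower_consumption: bool = False      # update follows a lower-consumption event
    adds_battery_cycling: bool = False   # unnecessary extra cycling detected
    follows_sharing_rule: bool = False   # pre-agreed market-sharing rule applies


def pto_accepts(p: UpdateProposal) -> bool:
    """Decide from the PTO's own exposure, never from aggregator revenue."""
    if p.family == "delay":
        return p.threatens_service
    if p.family == "energy":
        if p.lower_consumption:
            return not p.adds_battery_cycling
        # A cost-increasing update is acceptable only if it restores feasibility.
        return p.pto_cost_delta <= 0 or p.restores_feasibility
    if p.family == "price":
        return p.pto_cost_delta < 0 or p.follows_sharing_rule
    raise ValueError(f"unknown disturbance family: {p.family}")


print(pto_accepts(UpdateProposal("price", pto_cost_delta=3.0)))
print(pto_accepts(UpdateProposal("energy", 10.0, restores_feasibility=True)))
```

Aggregator revenue does not appear as an input in this sketch, which mirrors the criterion that acceptance depends on the PTO's own operational and cost exposure.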

### 5\.4 Prompt\-sensitivity analysis of the Pricing Agent

The preceding DA and RT analyses evaluate the complete agentic optimization workflow\. To isolate the behavior of the agentic layer itself, an additional DA experiment was conducted on the Pricing Agent under four prompt paradigms\. The optimization model, fleet input data, price profile, and evaluator structure were kept fixed; only the prompt used to generate the DA buy and sell multipliers was changed\. The four prompt scenarios are zero\-shot \(ZS\), chain\-of\-thought prompting \(CoT\), few\-shot prompting \(FS\), and few\-shot plus chain\-of\-thought prompting \(FS\+CoT\)\. Here, ZS provides the role, task, economic context, and output requirements without worked examples; CoT adds an explicit reasoning scaffold; FS adds example\-guided pricing behavior; and FS\+CoT combines examples with structured reasoning\. Since the RT experiments use zero\-shot role prompting with constrained output formatting, this sensitivity analysis focuses on the DA Pricing Agent, where the prompt variants were explicitly tested\.

Table [10](https://arxiv.org/html/2606.26400#S5.T10) reports the accepted DA outcomes from the prompt\-experiment workbooks using the same SOC\-derived accounting as Table [6](https://arxiv.org/html/2606.26400#S5.T6)\. The FS\+CoT rows therefore correspond to the accepted profit\-based and operational\-based DA files used in the main DA analysis\. [Appendix C](https://arxiv.org/html/2606.26400#A3) provides compact versions of the tested prompt templates\.

Under profit\-based behavior, prompt design strongly affects the agent’s ability to express the intended aggregator objective\. Aggregator revenue increases from 6\.66 EUR/day under ZS to 10\.61 EUR/day under FS, 15\.96 EUR/day under CoT, and 20\.30 EUR/day under FS\+CoT\. This ordering indicates that examples help the agent identify useful tariff patterns, while the reasoning scaffold helps connect tariff spread, V2G volume, and aggregator margin\. The operational\-based results show a different pattern: PTO costs are tightly clustered, spanning only 0\.71 EUR/day between the best and worst cases\. FS gives the lowest PTO cost, 118\.51 EUR/day, while FS\+CoT gives the lowest aggregator revenue, 2\.39 EUR/day, and remains close to the best PTO outcome\. CoT alone performs worst for operational\-based pricing, increasing both PTO cost and aggregator revenue\. This suggests that reasoning without examples can lead the agent to over\-emphasize economically plausible tariff spread rather than the PTO\-facing objective, while examples provide the main alignment signal in the operational\-based case\.

These experiments support the interpretation that the proposed framework is not only an optimization wrapper, but an agentic decision system whose behavior depends on how economic objectives are communicated to the agent\. The optimizer enforces feasibility once tariffs are provided, but the prompt controls the tariff hypothesis evaluated by the optimizer\. Prompting is therefore part of the experimental design, and prompt paradigms should be reported as experimental scenarios and treated as a reproducibility variable in agentic energy\-management studies\.

Table 10: Pricing Agent prompt\-sensitivity results\.

| Mode | Prompt (a) | PTO cost (EUR/day) | Aggregator revenue (EUR/day) | Rank (b) |
|---|---|---|---|---|
| PB | ZS | 137.13 | 6.66 | 4 |
| PB | FS | 137.31 | 10.61 | 3 |
| PB | CoT | 135.23 | 15.96 | 2 |
| PB | FS+CoT | 140.59 | 20.30 | 1 |
| OB | ZS | 118.83 | 2.75 | 2 |
| OB | FS | 118.51 | 2.43 | 1 |
| OB | CoT | 119.22 | 3.15 | 4 |
| OB | FS+CoT | 118.91 | 2.39 | 3 |

- \(a\) PB: profit\-based; OB: operational\-based; ZS: zero\-shot; FS: few\-shot; CoT: chain\-of\-thought; FS\+CoT: few\-shot \+ CoT\.
- \(b\) Rank 1 = best under mode objective: max aggregator revenue \(PB\) or min PTO cost \(OB\)\.
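The ranking rule in note (b) can be reproduced from the table values. The sketch below is a minimal illustration with hypothetical function names: it sorts prompts by aggregator revenue (descending) for the profit-based mode and by PTO cost (ascending) for the operational-based mode, and it also computes the spread of operational-based PTO costs quoted in the text.

```python
# (mode, prompt, PTO cost EUR/day, aggregator revenue EUR/day), copied from Table 10.
TABLE_10 = [
    ("PB", "ZS", 137.13, 6.66),
    ("PB", "FS", 137.31, 10.61),
    ("PB", "CoT", 135.23, 15.96),
    ("PB", "FS+CoT", 140.59, 20.30),
    ("OB", "ZS", 118.83, 2.75),
    ("OB", "FS", 118.51, 2.43),
    ("OB", "CoT", 119.22, 3.15),
    ("OB", "FS+CoT", 118.91, 2.39),
]


def rank_prompts(mode: str) -> list:
    """Prompts ordered best-first under the mode objective."""
    rows = [r for r in TABLE_10 if r[0] == mode]
    if mode == "PB":
        ordered = sorted(rows, key=lambda r: r[3], reverse=True)  # max revenue
    else:
        ordered = sorted(rows, key=lambda r: r[2])                # min PTO cost
    return [r[1] for r in ordered]


def pto_cost_spread(mode: str) -> float:
    """Gap between worst and best PTO cost within one mode (EUR/day)."""
    costs = [r[2] for r in TABLE_10 if r[0] == mode]
    return round(max(costs) - min(costs), 2)


print(rank_prompts("PB"))
print(rank_prompts("OB"))
print(pto_cost_spread("OB"))
```

The two orderings differ, which illustrates that the best prompt paradigm depends on the objective it is asked to express.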

### 5\.5 Policy Implications

The preceding results are operational, but their strongest implication is regulatory: the aggregator domain cannot be left to unconstrained market behavior if electric bus fleets are to participate in V2G markets on sustainable terms\. Three findings established above support this conclusion\.

First, the coordination mode, not the disturbance, governs how flexibility value is split between the aggregator and the PTO, and the gap is large: up to 116\.07 EUR/day under the P\+50 shock\. A profit\-maximizing aggregator that raises PTO cost is not behaving anomalously; it is doing exactly what an unconstrained revenue objective implies\. Absent a rule that bounds tariff margins or mandates value\-sharing, the PTO, a public entity operating on public budgets, absorbs this exposure directly\.

Second, for an agentic aggregator this pricing aggressiveness is set by prompt configuration alone, which is invisible from outside the system \(Section [5\.4](https://arxiv.org/html/2606.26400#S5.SS4)\)\. This is a qualitatively different regulatory problem from classical aggregator design: a rule\-based pricing scheme can be audited from its parameters, whereas an LLM\-based one can be re\-tuned through prompt edits that leave no structural trace and that disclosure rules written for conventional aggregators would not capture\.

Third, the PTO\-protective operational\-based outcome is not the aggregator’s default incentive, since it reduces per\-fleet revenue in every matched scenario\. A profit\-oriented operator has no self\-interested reason to adopt it unless its revenue model is explicitly scale\-based or its margins are externally constrained\. The PTO\-favorable equilibrium must therefore be induced, not assumed\.

These findings motivate three regulatory measures for V2G markets involving public fleets: \(i\) bounds on aggregator tariff margins, or mandated value\-sharing rules, so that PTO cost exposure cannot exceed a defined level regardless of the aggregator’s objective; \(ii\) transparency requirements covering the coordination mode and, for agentic systems, the prompt configuration and constrained\-output specification that govern pricing behavior, so that the chosen objective is auditable; and \(iii\) reporting standards requiring disclosure of realized tariff vectors, trigger decisions, and value allocation, extending existing market\-conduct oversight to the agentic setting\.

A related implication concerns the role of human stakeholders in governing agentic aggregators\. Human oversight should operate primarily at the governance level rather than through manual approval of every charging or V2G action\. In practice, human control should be placed at the level of objectives, constraints, audit, and intervention authority\. PTOs, aggregators, and regulators should define the admissible coordination mode, tariff\-margin bounds, SOC and service\-reliability safeguards, and conditions under which the system may re\-optimize or switch pricing behavior\. Human decision\-makers should also have access to interpretable records of prompt configuration, tariff vectors, trigger decisions, accepted schedules, and value allocation, so that agentic decisions can be reviewed and overridden when they conflict with public\-service priorities\. In this framing, the agentic aggregator remains an automated operational layer, but its economic posture and safety envelope remain human\-governed\.
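As an illustration of what an interpretable record could contain, the sketch below bundles the items listed above into one structure and fingerprints the prompt configuration so that a later prompt edit becomes detectable. It is an assumption-laden example rather than part of the proposed framework: the record fields, the use of SHA-256, the spread check, and all example values are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical per-interval record for PTO or regulator review."""
    interval: int
    coordination_mode: str
    prompt_sha256: str
    buy_tariffs: Tuple[float, ...]    # EUR/kWh per tariff period
    sell_tariffs: Tuple[float, ...]   # EUR/kWh per tariff period
    trigger_fired: bool
    schedule_accepted: bool


def fingerprint_prompt(prompt_text: str) -> str:
    """Stable digest of the prompt text; any edit changes the digest."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def max_tariff_spread(record: AuditRecord) -> float:
    """Largest buy-minus-sell spread across tariff periods (EUR/kWh)."""
    return max(b - s for b, s in zip(record.buy_tariffs, record.sell_tariffs))


# Hypothetical values for illustration only.
record = AuditRecord(
    interval=12,
    coordination_mode="operational-based",
    prompt_sha256=fingerprint_prompt("hypothetical pricing prompt v1"),
    buy_tariffs=(0.088, 0.091),
    sell_tariffs=(0.105, 0.106),
    trigger_fired=True,
    schedule_accepted=True,
)
print(json.dumps(asdict(record), indent=2))
print(max_tariff_spread(record) <= 0.02)  # check against an assumed margin bound
```

Storing a digest rather than the full prompt is one possible design choice: it allows a counterparty to verify that the configuration has not changed without requiring disclosure of the prompt text itself.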

## 6 Conclusions

This paper proposed a multi\-agentic framework for electric bus fleet\-grid coordination under DA and RT operation\. The aggregator is framed as a reconciliation entity that converts grid\-side flexibility signals into fleet\-feasible charging and V2G decisions while allocating value between the aggregator and the PTO\. The framework combines an optimization\-based charging model with three supervisory agents: a Trigger Agent that determines whether updated conditions justify intervention, a Pricing Agent that revises tariff guidance, and an Evaluator Agent that accepts or rejects the resulting operating plan\. In this architecture, the DA layer defines the nominal charging and V2G reference, while the RT layer receives updated information in 30\-minute blocks and revises the plan only when disturbances become operationally or economically relevant\. The contribution is therefore a structured decision pipeline that separates disturbance detection, tariff adaptation, and schedule acceptance while preserving the transparency of an optimization\-based scheduler\.

The DA results establish the economic baseline\. Optimized no\-V2G charging reduces PTO cost by 40\.2% relative to dumb charging, showing that substantial fleet savings are available even before bidirectional flexibility is considered\. The V2G cases then show that the aggregator coordination mode determines how the additional value is distributed\. The profit\-based mode produces the highest DA aggregator revenue, 20\.30 EUR/day, by maintaining a larger spread between PTO buy and sell\-back tariffs\. Within the selected FS\+CoT configuration used for the main DA comparison, the operational\-based mode produces the lowest PTO cost among the four main DA strategies, 118\.91 EUR/day, by lowering the weighted buy tariff and increasing the sell\-back tariff\. The prompt\-sensitivity analysis further shows that the FS operational\-based prompt reduces this cost to 118\.51 EUR/day\. Thus, the aggregator objective is not a neutral modeling detail; it directly controls the allocation of cost savings, V2G revenue, and terminal reserve\.

The RT results strengthen this conclusion under disturbance conditions\. Across the fourteen matched RT scenarios, operational\-based operation lowers PTO cost in every case relative to profit\-based operation\. The largest PTO reductions occur under P\+50 \(\-116\.07 EUR\), C\-All 5–25 \(\-62\.48 EUR\), C\-Seq \(\-55\.35 EUR\), and P\+25 \(\-55\.05 EUR\), showing that operational\-based pricing is especially important when positive price shocks, early service disruptions, or compound disturbances would otherwise expose the PTO to high tariff margins or reduced V2G value\. The PTO benefit is not free from the aggregator’s perspective: operational\-based operation reduces aggregator revenue in all fourteen matched scenarios\. In E\-50, the lower energy\-consumption disturbance creates enough additional stored energy for operational\-based dispatch to sell 900 kWh, but the higher compensation paid to the PTO keeps aggregator revenue below the profit\-based case\. The RT results therefore show that aggregator revenue is governed by both tariff margin and available flexibility volume, not by tariff margin alone\.

The prompt\-sensitivity experiment confirms that the agentic layer is itself an experimental object, not only an interface to the optimizer\. For profit\-based pricing, the objective ranking is FS\+CoT, CoT, FS, then ZS, with SOC\-derived aggregator revenue increasing from 6\.66 EUR/day under ZS to 20\.30 EUR/day under FS\+CoT\. For operational\-based pricing, examples are more important than reasoning alone: FS gives the lowest SOC\-derived PTO cost, 118\.51 EUR/day, while FS\+CoT gives the lowest aggregator revenue and remains within 0\.40 EUR/day of the best PTO\-cost result\. CoT alone performs worst because it raises PTO cost and aggregator revenue relative to the example\-guided prompts\. These results show that prompt design affects whether the Pricing Agent expresses the intended economic behavior\. Future agentic energy\-management studies should therefore report prompt paradigms and constrained\-output assumptions alongside optimization inputs and results\.

From the fleet perspective, relying only on the no\-V2G optimum can be economically beneficial, but it leaves bidirectional flexibility value unused\. If V2G is adopted, the operational\-based coordination mode is the PTO\-facing policy that consistently makes sense in the evaluated scenarios because it lowers PTO cost in all matched RT cases and offers more favorable tariff exposure\. From the aggregator perspective, operational\-based operation is more compatible with durable fleet participation and contract acceptance, but it generally reduces revenue per fleet\. Large aggregator revenues should therefore come primarily from increasing the number, size, and diversity of aggregated fleets, and from mobilizing larger volumes of flexibility, rather than from extracting high margins from one PTO fleet\. This is the central operational implication of the reconciliation framing: scalable grid flexibility from electric bus fleets depends on participation terms that remain acceptable to the PTO\.

Beyond these operational lessons, the results carry a regulatory implication that is, in our view, the broader contribution of the study\. Because the coordination mode alone moves PTO cost by up to 116\.07 EUR/day in the evaluated cases, and because an agentic aggregator can switch between modes through prompt configuration that leaves no externally visible trace, the PTO\-favorable outcome cannot be assumed to arise on its own and cannot be verified by a counterparty under current market arrangements\. The aggregator domain therefore warrants explicit regulatory structure when public transport fleets are involved: bounds on tariff margins or mandated value\-sharing to cap PTO cost exposure, transparency requirements covering the coordination mode and the prompt and constrained\-output specifications that govern agentic pricing, and reporting standards that extend market\-conduct oversight to realized tariff vectors, trigger decisions, and value allocation\. As agentic AI is adopted in energy\-market intermediation, the framework proposed here doubles as a diagnostic instrument that makes the mode\- and prompt\-dependent allocation of value explicit, and thereby identifies where such oversight is most needed\.

The results should be interpreted within the limits of the present study\. The RT evaluation uses controlled isolated and combined disturbance cases, and it compares accepted RT responses against the DA coordination references without adding no\-action, rule\-based trigger, or always\-reoptimize baselines\. The experiments also rely on a single depot configuration, charger setting, price profile, and initial SOC structure\. The policy implications drawn above follow from this single\-setting evaluation and should be read as evidence\-based motivation for regulatory attention rather than as calibrated threshold values\. Future work should extend the evaluation to larger and more heterogeneous depots, different charger\-to\-bus ratios, alternative market conditions, richer combined\-disturbance designs, stochastic repeated runs, prompt\-sensitivity analysis, solver\-runtime measurement, and formal service\-reliability metrics\. Furthermore, cybersecurity issues of agentic systems, as mentioned in \[[12](https://arxiv.org/html/2606.26400#bib.bib43)\] and \[[10](https://arxiv.org/html/2606.26400#bib.bib44)\], will also be investigated\. The main methodological lesson is that RT fleet\-grid coordination studies should report not only aggregate cost and revenue, but also the trigger timing, accepted tariff vectors, SOC trajectories, and V2G exchange pathways through which disturbances reshape operational outcomes\.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper\.

## Data Availability Statement

## Acknowledgment

This project was made possible through financial support from the Government of Canada and the Fonds de recherche du Québec – Nature et technologies \(FRQNT\)\. Funding from the Government of Canada’s Environmental Damages Fund was provided under its Climate Action and Awareness Fund\. At the same time, the FRQNT contributed through the Programme de recherche en partenariat \- Réduction des GES \- Mobilité Durable\.

## Appendix A Nomenclature

Indices

iiRoute or service\-block index\.
τ\\tauCurrent operating interval in the RT workflow\.
rrRerun index in the DA or RT workflow\.
Sets

𝒫\\mathcal\{P\}Set of tariff periods\.
𝒯\\mathcal\{T\}Set of time intervals in the DA optimization horizon\.
𝒯τRT\\mathcal\{T\}\_\{\\tau\}^\{\\mathrm\{RT\}\}Set of RT optimization intervals fromτ\\tauto the end of the operating day\.
𝒦\\mathcal\{K\}Set of electric buses\.
𝒩\\mathcal\{N\}Set of chargers\.
ℐ\\mathcal\{I\}Set of routes or service blocks\.
Time and workflow parameters

TTFinal interval of the operating horizon\.
Δ​t\\Delta tDuration of one time interval\.
p​\(t\)p\(t\)Tariff period associated with time intervaltt\.
RmaxR\_\{\\max\}Maximum number of allowed optimization reruns\.
Tariff and market parameters

πtg\\pi\_\{t\}^\{g\}Grid energy price at time intervaltt\.
πtg,obs\\pi\_\{t\}^\{g,\\mathrm\{obs\}\}Observed or updated grid energy price used in RT at intervaltt\.
𝝅τ:Tg,obs\\boldsymbol\{\\pi\}\_\{\\tau:T\}^\{g,\\mathrm\{obs\}\}Updated grid\-price profile used from RT intervalτ\\tautoTT\.
π¯pg\\bar\{\\pi\}\_\{p\}^\{g\}Average grid energy price in tariff periodpp\.
ρp\+\\rho\_\{p\}^\{\+\}Tariff charged by the aggregator to the PTO for charging energy in tariff periodpp\.
ρp−\\rho\_\{p\}^\{\-\}Tariff paid by the aggregator to the PTO for V2G energy in tariff periodpp\.
𝝆\+\\boldsymbol\{\\rho\}^\{\+\}Vector of charging\-energy tariffs\.
𝝆−\\boldsymbol\{\\rho\}^\{\-\}Vector of V2G compensation tariffs\.
αp\+\\alpha\_\{p\}^\{\+\}Charging\-energy tariff multiplier proposed by the Pricing Agent in tariff periodpp\.
αp−\\alpha\_\{p\}^\{\-\}V2G compensation tariff multiplier proposed by the Pricing Agent in tariff periodpp\.
α¯m\+,α¯m\+\\underline\{\\alpha\}\_\{m\}^\{\+\},\\overline\{\\alpha\}\_\{m\}^\{\+\}Lower and upper bounds on charging\-energy tariff multipliers under modemm\.
α¯m−,α¯m−\\underline\{\\alpha\}\_\{m\}^\{\-\},\\overline\{\\alpha\}\_\{m\}^\{\-\}Lower and upper bounds on V2G compensation tariff multipliers under modemm\.
Fleet, battery, and charger parameters

$P_{n}^{\mathrm{ch}}$: Charging power rating of charger $n$.
$P_{n}^{\mathrm{dis}}$: V2G discharging power rating of charger $n$.
$\eta_{n}^{\mathrm{ch}}$: Charging efficiency of charger $n$.
$\eta_{n}^{\mathrm{dis}}$: Discharging efficiency of charger $n$.
$E_{k}^{\min}$: Minimum allowable stored energy of bus $k$.
$E_{k}^{\max}$: Maximum allowable stored energy of bus $k$.
$E_{k}^{0}$: Initial stored energy of bus $k$ in the DA optimization.
$E_{k}^{\mathrm{end}}$: Required end-of-day stored energy of bus $k$.
$\xi_{i}$: Route-level energy requirement for route or service block $i$.
$\xi_{i}^{\mathrm{obs}}$: Observed route-level energy requirement used in RT operation.
$\xi_{i}^{\mathrm{RT}}$: Updated RT route-level energy requirement.
$\boldsymbol{\xi}^{\mathrm{obs}}$: Vector of observed route-level energy requirements used in RT operation.
$\overline{W}$: Maximum admissible aggregate depot power exchange.
Decision variables

$x_{k,n,t}$: Binary variable equal to 1 if bus $k$ charges at charger $n$ during interval $t$.
$y_{k,n,t}$: Binary variable equal to 1 if bus $k$ discharges through charger $n$ during interval $t$.
$b_{k,i,t}$: Binary variable equal to 1 if bus $k$ is assigned to route or service block $i$ during interval $t$.
$\bar{b}_{k,i,t}^{\mathrm{active}}$: Indicator fixing route assignments already underway in RT operation.
$e_{k,t}$: Stored energy of bus $k$ at time interval $t$.
$w_{t}^{+}$: Aggregate charging power purchased by the PTO from the aggregator at interval $t$.
$w_{t}^{-}$: Aggregate V2G discharging power sold by the PTO to the aggregator at interval $t$.
$\mathbf{s}$: PTO scheduling decision vector.
$\mathbf{s}^{*}$: Optimal or accepted PTO schedule.
$\mathbf{s}_{d,r}^{*}$: Candidate DA schedule at operating day $d$ and rerun $r$.
$\mathbf{s}_{\mathrm{DA}}^{*}$: Accepted DA reference schedule.
$\mathbf{s}_{\tau-1}^{*}$: Last accepted schedule before RT interval $\tau$.
$\mathbf{s}_{\tau}^{*}$: Accepted schedule at RT interval $\tau$.
$\mathbf{s}_{\tau,r}^{*}$: Candidate RT schedule at interval $\tau$ and rerun $r$.
$\mathbf{s}_{\tau}^{\mathrm{RT},*}$: Accepted RT schedule computed from interval $\tau$ onward.
$\mathbf{s}_{\mathrm{ref}}$: Active reference schedule used during RT operation.

## Appendix B Optimization Core

Having established the supervisory agent roles in Section [3.2](https://arxiv.org/html/2606.26400#S3.SS2), this appendix presents the mathematical problem they govern. The aggregator pricing function is fully assumed by the agentic layer; what remains here is the PTO scheduling model that the agents orchestrate, the real-time re-optimization logic that the agents trigger, and the formal agentic supervisory flow that connects the two. The PTO scheduling problem is optimization-based throughout, since route service, battery feasibility, and charger availability must satisfy explicit physical constraints regardless of how the surrounding decision process is orchestrated.

### B.1 PTO Scheduling Model: Full Formulation

Let $p\in\mathcal{P}$ denote tariff periods, $t\in\mathcal{T}$ time intervals, $k\in\mathcal{K}$ buses, $n\in\mathcal{N}$ chargers, and $i\in\mathcal{I}$ routes. At each DA or RT invocation, the Pricing Agent supplies two tariff vectors: $\boldsymbol{\rho}^{+}$, the price charged to the PTO for charging energy, and $\boldsymbol{\rho}^{-}$, the price paid to the PTO for V2G energy. The relationship between these tariff vectors and the agent-proposed multipliers is defined in Appendix [B.3](https://arxiv.org/html/2606.26400#A2.SS3). Given these tariffs, the PTO selects a schedule $\mathbf{s}^{*}$ that minimizes its net energy-trading cost:

$$\mathbf{s}^{*}\in\arg\min_{\mathbf{s}\in\mathcal{S}(\boldsymbol{\rho}^{+},\boldsymbol{\rho}^{-})}C^{\mathrm{PTO}}=\sum_{t\in\mathcal{T}}\left[\rho_{p(t)}^{+}w_{t}^{+}\Delta t-\rho_{p(t)}^{-}w_{t}^{-}\Delta t\right],\tag{B.1}$$

where $\mathcal{S}(\boldsymbol{\rho}^{+},\boldsymbol{\rho}^{-})$ is the feasible set defined by the operational constraints below and $p(t)$ is the tariff period containing interval $t$. The objective minimizes the cost of charging energy net of V2G compensation.
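As a minimal sketch of how the objective in Eq. (B.1) is evaluated for a fixed schedule, the Python snippet below sums charging cost net of V2G compensation over the horizon. The function name `pto_cost`, the list-based data layout, and all numerical values are illustrative assumptions, not taken from the case study.

```python
def pto_cost(rho_plus, rho_minus, period_of, w_plus, w_minus, dt_hours):
    """Net energy-trading cost C^PTO of Eq. (B.1) for a given depot power profile.

    rho_plus[p], rho_minus[p]: tariffs per tariff period p (currency per kWh).
    period_of[t]: tariff period p(t) that contains interval t.
    w_plus[t], w_minus[t]: aggregate charging / V2G power in interval t (kW).
    dt_hours: interval length Delta t in hours.
    """
    total = 0.0
    for t, (wp, wm) in enumerate(zip(w_plus, w_minus)):
        p = period_of[t]
        total += rho_plus[p] * wp * dt_hours   # cost of charging energy
        total -= rho_minus[p] * wm * dt_hours  # V2G compensation received
    return total


# Hypothetical toy data: four half-hour intervals, two tariff periods.
period_of = [0, 0, 1, 1]
rho_plus = [0.10, 0.30]
rho_minus = [0.05, 0.20]
w_plus = [100.0, 100.0, 0.0, 0.0]
w_minus = [0.0, 0.0, 50.0, 0.0]

cost = pto_cost(rho_plus, rho_minus, period_of, w_plus, w_minus, dt_hours=0.5)
print(f"PTO net cost: {cost:.2f}")
```

In this toy profile the fleet charges in the cheap period and discharges once in the expensive one, so the V2G compensation offsets part of the charging bill.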

The decision vector includes the route-assignment variables $b_{k,i,t}$, the charging variables $x_{k,n,t}$, the V2G discharging variables $y_{k,n,t}$, the stored energy variables $e_{k,t}$, and the aggregate depot power variables $w_{t}^{+}$ and $w_{t}^{-}$.

A bus cannot serve a route and use a charger at the same time. This operational exclusivity is represented by Eq. ([B.2](https://arxiv.org/html/2606.26400#A2.E2)):

$$\sum_{n\in\mathcal{N}}\left(x_{k,n,t}+y_{k,n,t}\right)+\sum_{i\in\mathcal{I}}b_{k,i,t}\leq 1,\qquad\forall k\in\mathcal{K},\forall t\in\mathcal{T}.\tag{B.2}$$

The first term identifies whether bus $k$ is connected to a charger for charging or discharging, while the second term identifies whether it is assigned to a route. This constraint prevents physically incompatible decisions, such as charging a bus while it is in service. The stored energy of each bus evolves according to Eq. ([B.3](https://arxiv.org/html/2606.26400#A2.E3)):

$$\begin{split}e_{k,t+1}&=e_{k,t}+\sum_{n\in\mathcal{N}}\eta_{n}^{\mathrm{ch}}P_{n}^{\mathrm{ch}}x_{k,n,t}\Delta t-\sum_{n\in\mathcal{N}}\frac{1}{\eta_{n}^{\mathrm{dis}}}P_{n}^{\mathrm{dis}}y_{k,n,t}\Delta t\\ &\quad-\sum_{i\in\mathcal{I}}\xi_{i}b_{k,i,t},\quad\forall k\in\mathcal{K},\forall t\in\mathcal{T}.\end{split}\tag{B.3}$$

The second term represents the energy added through charging, corrected by the charging efficiency. The third term represents the energy removed through V2G discharge, corrected by the discharging efficiency. The final term represents the traction energy consumed when bus $k$ serves route $i$, where $\xi_{i}$ is the route-level energy requirement. Battery feasibility is enforced through Eq. ([B.4](https://arxiv.org/html/2606.26400#A2.E4)):

$$\begin{split}E_{k}^{\min}&\leq e_{k,t}\leq E_{k}^{\max},\qquad e_{k,0}=E_{k}^{0},\\ &e_{k,T}\geq E_{k}^{\mathrm{end}},\qquad\forall k\in\mathcal{K},\forall t\in\mathcal{T}.\end{split}\tag{B.4}$$

These constraints ensure that each bus remains within its admissible battery range, starts from the specified initial energy level, and finishes the horizon with enough energy for subsequent operation. Charging and discharging are mutually exclusive for each bus–charger pair as presented in Eq. ([B.5](https://arxiv.org/html/2606.26400#A2.E5)):

$$x_{k,n,t}+y_{k,n,t}\leq 1,\qquad\forall k\in\mathcal{K},\;\forall n\in\mathcal{N},\;\forall t\in\mathcal{T}.\tag{B.5}$$

Additional charger-capacity constraints ensure that each charger serves at most one bus in each time interval. At the depot level, individual charging and discharging decisions are aggregated as in Eqs. ([B.6](https://arxiv.org/html/2606.26400#A2.E6))–([B.7](https://arxiv.org/html/2606.26400#A2.E7)):

$$w_{t}^{+}=\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}P_{n}^{\mathrm{ch}}x_{k,n,t},\qquad\forall t\in\mathcal{T},\tag{B.6}$$

$$w_{t}^{-}=\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}P_{n}^{\mathrm{dis}}y_{k,n,t},\qquad\forall t\in\mathcal{T}.\tag{B.7}$$

The depot exchange is bounded by the admissible power limit as in Eq. ([B.8](https://arxiv.org/html/2606.26400#A2.E8)):

$$0\leq w_{t}^{+}\leq\overline{W},\qquad 0\leq w_{t}^{-}\leq\overline{W},\qquad\forall t\in\mathcal{T}.\tag{B.8}$$
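The constraints of this subsection can be illustrated with a small feasibility checker. The sketch below takes a candidate schedule as nested 0/1 lists, propagates stored energy with Eq. (B.3), and reports violations of Eqs. (B.2), (B.4), (B.8) and the one-bus-per-charger rule. It is a checker rather than an optimizer, and the function name `check_schedule`, the data layout, and the toy depot values are assumptions made for illustration only.

```python
def check_schedule(x, y, b, P_ch, P_dis, eta_ch, eta_dis, xi,
                   E0, E_min, E_max, E_end, W_max, dt, tol=1e-9):
    """Check a candidate schedule against Eqs. (B.2)-(B.8).

    x[k][n][t], y[k][n][t], b[k][i][t] hold 0/1 entries; dt is in hours.
    Returns (violations, w_plus, w_minus).
    """
    K, N, I, T = len(x), len(P_ch), len(xi), len(x[0][0])
    violations = []
    for k in range(K):
        e = E0[k]  # e_{k,0} = E_k^0, Eq. (B.4)
        for t in range(T):
            plugged = sum(x[k][n][t] + y[k][n][t] for n in range(N))
            on_route = sum(b[k][i][t] for i in range(I))
            if plugged + on_route > 1:  # Eq. (B.2); for 0/1 inputs it implies (B.5)
                violations.append(("B.2", k, t))
            # Energy balance, Eq. (B.3)
            e += sum(eta_ch[n] * P_ch[n] * x[k][n][t] * dt for n in range(N))
            e -= sum(P_dis[n] * y[k][n][t] * dt / eta_dis[n] for n in range(N))
            e -= sum(xi[i] * b[k][i][t] for i in range(I))
            if e < E_min[k] - tol or e > E_max[k] + tol:  # bounds in Eq. (B.4)
                violations.append(("B.4", k, t + 1))
        if e < E_end[k] - tol:  # end-of-day requirement in Eq. (B.4)
            violations.append(("B.4-end", k, T))
    w_plus, w_minus = [], []
    for t in range(T):
        for n in range(N):  # each charger serves at most one bus per interval
            if sum(x[k][n][t] + y[k][n][t] for k in range(K)) > 1:
                violations.append(("charger", n, t))
        wp = sum(P_ch[n] * x[k][n][t] for k in range(K) for n in range(N))   # Eq. (B.6)
        wm = sum(P_dis[n] * y[k][n][t] for k in range(K) for n in range(N))  # Eq. (B.7)
        if wp > W_max + tol or wm > W_max + tol:  # Eq. (B.8)
            violations.append(("B.8", t))
        w_plus.append(wp)
        w_minus.append(wm)
    return violations, w_plus, w_minus


# Hypothetical toy depot: one bus, one charger, one route, four half-hour intervals.
params = dict(P_ch=[100.0], P_dis=[100.0], eta_ch=[0.9], eta_dis=[0.9], xi=[30.0],
              E0=[100.0], E_min=[20.0], E_max=[200.0], E_end=[100.0],
              W_max=150.0, dt=0.5)
x = [[[1, 0, 0, 1]]]   # charge in intervals 0 and 3
y = [[[0, 0, 0, 0]]]   # no V2G discharge
b = [[[0, 1, 0, 0]]]   # serve the route in interval 1

ok_violations, w_plus, w_minus = check_schedule(x, y, b, **params)
bad_violations, _, _ = check_schedule([[[1, 1, 0, 1]]], y, b, **params)
print(ok_violations, w_plus)
print(bad_violations)
```

The second call charges the bus while it is on route in interval 1, which is exactly the physically incompatible decision that Eq. (B.2) rules out.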

### B.2 Real-Time Re-Optimization Logic

The DA formulation computes a full-day reference schedule before operation begins. During operation, this reference may become outdated because the observed system state can differ from the DA assumptions. These deviations may include differences between planned and observed bus SOC, service delays, updated electricity prices, or changes in the route-level energy required to complete service. The RT workflow addresses these deviations by rerunning the PTO optimization model. The RT workflow logic is summarized in Fig. [3](https://arxiv.org/html/2606.26400#S4.F3).

Let $\tau$ denote the current operating interval. If the Trigger Agent decides to re-optimize, the optimization horizon is updated from the current interval to the end of the operating day as in Eq. ([B.9](https://arxiv.org/html/2606.26400#A2.E9)):

$$\mathcal{T}_{\tau}^{\mathrm{RT}}=\{\tau,\tau+1,\ldots,T\}.\tag{B.9}$$

The RT optimization does not revise decisions for intervals that have already occurred. It uses the observed state at interval $\tau$ as the new initial condition and optimizes the remaining charging, discharging, and service decisions from that point onward. At each RT update, the observed state is represented as in Eq. ([B.10](https://arxiv.org/html/2606.26400#A2.E10)):

$$\mathbf{x}_{\tau}^{\mathrm{obs}}=\left(\mathbf{e}_{\tau}^{\mathrm{obs}},\mathbf{d}_{\tau}^{\mathrm{obs}},\boldsymbol{\pi}_{\tau:T}^{g,\mathrm{obs}},\boldsymbol{\xi}^{\mathrm{obs}}\right),\tag{B.10}$$

where $\mathbf{e}_{\tau}^{\mathrm{obs}}$ is the observed bus-energy vector at the current interval, $\mathbf{d}_{\tau}^{\mathrm{obs}}$ contains the observed service-timing state, $\boldsymbol{\pi}_{\tau:T}^{g,\mathrm{obs}}$ is the updated electricity-price profile used for the remaining horizon, and $\boldsymbol{\xi}^{\mathrm{obs}}$ contains the observed route-level energy consumption. Once observed, these quantities replace the corresponding DA assumptions and define the updated operating condition for the remaining horizon.

Given the observed state and the price guidance $\mathbf{u}_{\tau}$ provided by the Pricing Agent, the RT optimizer computes a revised schedule as in Eq. ([B.11](https://arxiv.org/html/2606.26400#A2.E11)):

$$\mathbf{s}_{\tau}^{\mathrm{RT},*}\in\arg\min_{\mathbf{s}\in\mathcal{S}_{\tau}^{\mathrm{RT}}(\mathbf{u}_{\tau},\mathbf{x}_{\tau}^{\mathrm{obs}})}C_{\tau:T}^{\mathrm{PTO}}\!\left(\mathbf{u}_{\tau},\mathbf{s}\right),\tag{B.11}$$

where $\mathbf{s}_{\tau}^{\mathrm{RT},*}$ is the revised RT schedule, $\mathcal{S}_{\tau}^{\mathrm{RT}}(\cdot)$ is the feasible set updated with the observed RT conditions, and $C_{\tau:T}^{\mathrm{PTO}}(\cdot)$ is the PTO energy-trading cost over the remaining horizon. The RT model preserves the same physical constraints as the DA model; the difference is that selected inputs are updated with observations before the model is rerun. The main RT updates are:

$$\begin{aligned}
e_{k,\tau}&=e_{k,\tau}^{\mathrm{obs}}, &&\forall k\in\mathcal{K}, &&\text{(B.12)}\\
\xi_{i}^{\mathrm{RT}}&=\xi_{i}^{\mathrm{obs}}, &&\forall i\in\mathcal{I}, &&\text{(B.13)}\\
\pi_{t}^{g}&=\pi_{t}^{g,\mathrm{obs}}, &&\forall t\in\mathcal{T}_{\tau}^{\mathrm{RT}}, &&\text{(B.14)}\\
b_{k,i,t}&\geq\bar{b}_{k,i,t}^{\mathrm{active}}, &&\forall k\in\mathcal{K},\forall i\in\mathcal{I},\forall t\in\mathcal{T}_{\tau}^{\mathrm{RT}}. &&\text{(B.15)}
\end{aligned}$$

Constraint ([B.12](https://arxiv.org/html/2606.26400#A2.E12)) sets the current battery energy of each bus equal to the observed value at the moment of re-optimization. Constraint ([B.13](https://arxiv.org/html/2606.26400#A2.E13)) updates the energy requirement of each route using the observed route-level consumption value. Constraint ([B.14](https://arxiv.org/html/2606.26400#A2.E14)) replaces the DA electricity-price profile with the updated price profile used from the current interval onward. Constraint ([B.15](https://arxiv.org/html/2606.26400#A2.E15)) preserves trips that are already underway, preventing the RT optimizer from interrupting active service commitments.
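The RT updates in Eqs. (B.9) and (B.12)–(B.15) amount to rebuilding the optimizer inputs from observations. A minimal sketch of that step follows, assuming dictionary-based inputs; the names `RTInputs` and `build_rt_inputs` are hypothetical, and keeping the DA value for routes without an observation is an assumption of this sketch rather than something the formulation specifies.

```python
from dataclasses import dataclass


@dataclass
class RTInputs:
    horizon: list      # T_tau^RT, Eq. (B.9)
    e_init: dict       # bus -> observed stored energy, Eq. (B.12)
    xi: dict           # route -> energy requirement used in RT, Eq. (B.13)
    grid_price: dict   # interval -> updated grid price, Eq. (B.14)
    b_lower: dict      # (bus, route, interval) -> 1 for trips underway, Eq. (B.15)


def build_rt_inputs(tau, T, e_obs, xi_da, xi_obs, price_obs, active_trips):
    """Assemble the inputs of the RT re-optimization from the observed state."""
    horizon = list(range(tau, T + 1))  # {tau, tau+1, ..., T}
    missing = [t for t in horizon if t not in price_obs]
    if missing:
        raise ValueError(f"no updated grid price for intervals {missing}")
    # Assumption: routes without an observation keep their DA requirement.
    xi = {**xi_da, **xi_obs}
    grid_price = {t: price_obs[t] for t in horizon}
    # Only trips underway inside the remaining horizon constrain b_{k,i,t}.
    b_lower = {(k, i, t): 1 for (k, i, t) in active_trips if t >= tau}
    return RTInputs(horizon, dict(e_obs), xi, grid_price, b_lower)


# Hypothetical example: re-optimization at interval 3 of a horizon ending at 5.
rt = build_rt_inputs(
    tau=3, T=5,
    e_obs={"bus_1": 140.0},
    xi_da={"route_A": 30.0, "route_B": 25.0},
    xi_obs={"route_A": 34.0},
    price_obs={3: 0.21, 4: 0.25, 5: 0.19},
    active_trips={("bus_1", "route_A", 3), ("bus_1", "route_A", 2)},
)
print(rt.horizon, rt.xi, rt.b_lower)
```

Past intervals drop out of both the horizon and the fixed-assignment set, which mirrors the statement that the RT optimization does not revise decisions that have already occurred.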

If no re-optimization is needed, the system continues executing the last accepted reference schedule $\mathbf{s}_{\mathrm{ref}}$. Once a revised schedule $\mathbf{s}_{\tau}^{\mathrm{RT},*}$ is accepted by the Evaluator Agent, it becomes the new reference for subsequent RT updates.

### B.3 Agentic Supervisory Flow

This section provides the formal mathematical definitions of the agent policies and the connection between agent-generated price guidance and the PTO scheduling model. In a classical aggregator design (e.g., [[23](https://arxiv.org/html/2606.26400#bib.bib7)]), a dedicated upper-level problem would maximize aggregator revenue $\Pi^{\mathrm{Agg}}$ by directly optimizing the tariff vectors $\boldsymbol{\rho}^{+}$ and $\boldsymbol{\rho}^{-}$ offered to the PTO:

$$\Pi^{\mathrm{Agg}}=\sum_{t\in\mathcal{T}}\left[\rho_{p(t)}^{+}w_{t}^{+}\Delta t-\rho_{p(t)}^{-}w_{t}^{-}\Delta t-\pi_{t}^{g}\!\left(w_{t}^{+}-w_{t}^{-}\right)\Delta t\right].\tag{B.16}$$

In the proposed framework, this optimization problem is replaced by the agentic supervisory layer:

$$\underbrace{\max_{\boldsymbol{\rho}^{+},\boldsymbol{\rho}^{-}}\Pi^{\mathrm{Agg}}}_{\text{classical aggregator upper level}}\quad\Longrightarrow\quad\underbrace{\left(\pi_{\theta_{T}}^{T},\pi_{\theta_{P}}^{P},\pi_{\theta_{E}}^{E}\right)}_{\text{agentic supervisory layer}},\tag{B.17}$$

where $\pi_{\theta_{T}}^{T}$, $\pi_{\theta_{P}}^{P}$, and $\pi_{\theta_{E}}^{E}$ denote the Trigger Agent, Pricing Agent, and Evaluator Agent policies, respectively. The aggregator function is therefore redistributed across these specialized agents, which determine when the optimizer should be called, which price guidance should be tested, and whether the resulting schedule should be accepted. The Pricing Agent maps the current decision context and coordination mode to a price-guidance vector:

$$\mathbf{u}_{\tau}=\pi_{\theta_{P}}^{P}\left(\mathbf{c}_{\tau},m\right),\tag{B.18}$$

where the decision context $\mathbf{c}_{\tau}$ is:

$$\mathbf{c}_{\tau}=\Psi\!\left(\mathbf{f}_{\tau},\,\mathbf{g}_{\tau},\,\mathbf{h}_{\tau},\,m\right),\tag{B.19}$$

with $\mathbf{f}_{\tau}$ the fleet-state vector, $\mathbf{g}_{\tau}$ the grid and market vector, $\mathbf{h}_{\tau}$ the historical memory vector, and $m$ the coordination mode. The guidance vector $\mathbf{u}_{\tau}$ is converted into the tariff vectors fed to the PTO scheduling model as:

$$\rho_{p}^{+}=\alpha_{p}^{+}\bar{\pi}_{p}^{g},\qquad\rho_{p}^{-}=\alpha_{p}^{-}\bar{\pi}_{p}^{g},\qquad\forall p\in\mathcal{P},\tag{B.20}$$

where $\alpha_{p}^{+}$ and $\alpha_{p}^{-}$ are the buy and sell multipliers proposed by the Pricing Agent, and $\bar{\pi}_{p}^{g}$ is the average grid price in tariff period $p$. In the DA workflow, given the price guidance $\mathbf{u}_{\tau}$ from Eq. ([B.18](https://arxiv.org/html/2606.26400#A2.E18)), the Evaluator Agent assesses the resulting schedule:

$$\left(a_{\tau},\,r_{\tau},\,\gamma_{\tau}\right)=\pi^{E}_{\theta_{E}}\!\left(\mathbf{c}_{\tau},\,\mathbf{u}_{\tau}\right),\tag{B.21}$$

where $a_{\tau}\in\{\texttt{accept},\texttt{rerun}\}$ is the evaluation decision, $r_{\tau}$ is the supervisory rationale, and $\gamma_{\tau}\in[0,1]$ is a confidence score. In the RT workflow, the Trigger Agent policy is evaluated first:

$$\delta_{\tau}=\pi^{T}_{\theta_{T}}\!\left(\mathbf{c}_{\tau},\,\mathbf{s}^{*}_{\tau-1}\right),\tag{B.22}$$

where $\delta_{\tau}\in\{\texttt{skip},\texttt{optimize}\}$ is the trigger decision and $\mathbf{s}^{*}_{\tau-1}$ is the last accepted plan. If $\delta_{\tau}=\texttt{optimize}$, the Pricing Agent and Evaluator Agent are invoked as in Eqs. ([B.18](https://arxiv.org/html/2606.26400#A2.E18)) and ([B.21](https://arxiv.org/html/2606.26400#A2.E21)). In both workflows, the optimization engine solves:

$$\mathbf{s}_{\tau}^{*}\in\arg\min_{\mathbf{s}\,\in\,\mathcal{S}(\mathbf{u}_{\tau})}C^{\mathrm{PTO}}\!\left(\mathbf{u}_{\tau},\,\mathbf{s}\right),\tag{B.23}$$

connecting the agentic price guidance directly to the feasible scheduling decision defined in Eq. ([B.1](https://arxiv.org/html/2606.26400#A2.E1)). In the RT case this specializes to Eq. ([B.11](https://arxiv.org/html/2606.26400#A2.E11)), where the feasible set becomes $\mathcal{S}_{\tau}^{\mathrm{RT}}(\mathbf{u}_{\tau},\mathbf{x}_{\tau}^{\mathrm{obs}})$ so that it incorporates the observed state.
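The supervisory flow of Eqs. (B.18)–(B.23) can be sketched as plain control flow around pluggable policies. In the snippet below the Trigger, Pricing, and Evaluator policies and the optimizer are stand-in callables; the names `rt_step` and `tariffs_from_multipliers`, the rerun limit, the fallback to the previous reference, clipping multipliers to the mode bounds, and passing the candidate schedule to the evaluator are all assumptions of this sketch and not the paper's implementation.

```python
def clip(value, lower, upper):
    return max(lower, min(upper, value))


def tariffs_from_multipliers(alpha_plus, alpha_minus, avg_grid_price,
                             bounds_plus, bounds_minus):
    """Eq. (B.20): rho_p = alpha_p * average grid price in period p.

    Clipping the multipliers to the mode bounds is an assumption of this sketch.
    """
    rho_plus = [clip(a, *bounds_plus) * p for a, p in zip(alpha_plus, avg_grid_price)]
    rho_minus = [clip(a, *bounds_minus) * p for a, p in zip(alpha_minus, avg_grid_price)]
    return rho_plus, rho_minus


def rt_step(context, s_ref, mode, trigger, pricing, optimize, evaluate, max_reruns=2):
    """One RT supervisory step; returns (schedule to execute, decision log)."""
    log = []
    if trigger(context, s_ref) == "skip":                  # Eq. (B.22)
        log.append("skip")
        return s_ref, log
    for rerun in range(max_reruns + 1):
        u = pricing(context, mode)                         # Eq. (B.18)
        candidate = optimize(u, context)                   # Eq. (B.23) / (B.11)
        action, rationale, confidence = evaluate(context, u, candidate)  # Eq. (B.21)
        log.append((rerun, action, confidence))
        if action == "accept":
            return candidate, log
    return s_ref, log  # assumption: keep the reference if nothing is accepted


# Hypothetical stand-in policies used only to exercise the control flow.
trigger = lambda c, s: "optimize" if c["soc_deviation"] > 0.05 else "skip"
pricing = lambda c, m: {"alpha_plus": [1.2, 3.0], "alpha_minus": [0.8, 0.1]}
optimize = lambda u, c: {"label": "rt-candidate", "guidance": u}
evaluate = lambda c, u, s: ("accept", "feasible and cheaper", 0.9)

reference = {"label": "da-reference"}
schedule, log = rt_step({"soc_deviation": 0.12}, reference, "operation-based",
                        trigger, pricing, optimize, evaluate)
kept, kept_log = rt_step({"soc_deviation": 0.01}, reference, "operation-based",
                         trigger, pricing, optimize, evaluate)
rho_plus, rho_minus = tariffs_from_multipliers(
    [1.2, 3.0], [0.8, 0.1], [0.10, 0.20], bounds_plus=(1.0, 1.5), bounds_minus=(0.5, 1.0))
print(schedule["label"], log, kept["label"], kept_log)
```

Keeping the policies as arguments separates the orchestration logic from the models behind each agent, so the same loop can be exercised with rule-based stubs before any language model is attached.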

## Appendix C Compact Prompt Templates

This appendix reports compact versions of the prompt templates used in the agent experiments. The original prompts contain implementation placeholders, repeated formatting instructions, and full multiplier/output-schema constraints; the templates below preserve the operational content needed to reproduce the experimental logic without reproducing the complete workflow text. The placeholders `{mode}`, `{spot_prices}`, `{fleet_context}`, and `{optimization_result}` denote values injected by the workflow at runtime.

Table C.1: Prompt paradigms used for the DA Pricing Agent experiments.

| Acronym | Name | Agent | Compact definition |
|---|---|---|---|
| ZS | Zero-shot | DA Pricing Agent | Role, economic framework, mode objective, hard constraints, and structured output requirements are provided, but no worked examples or explicit reasoning scaffold is included. |
| CoT | Chain-of-thought | DA Pricing Agent | The zero-shot prompt is augmented with an explicit reasoning scaffold that asks the agent to reason about price levels, V2G opportunity, tariff spread, and the selected profit-based or operational-based objective before returning the final multiplier vectors. |
| FS | Few-shot | DA Pricing Agent | The zero-shot prompt is augmented with compact examples that illustrate mode-consistent tariff behavior for profit-based and operational-based pricing. |
| FS+CoT | Few-shot + chain-of-thought | DA Pricing Agent | The prompt combines examples with the reasoning scaffold, giving the agent both behavioral demonstrations and a structured path for translating the objective into buy/sell multipliers. |
| RT-ZS-S | Zero-shot with constrained schema | RT Trigger, Pricing, and Evaluator Agents | RT agents use role prompts and structured context without DA prompt-paradigm variants; outputs are constrained to workflow fields such as action, trigger type, confidence, accepted update, and multiplier vectors. |

DA Pricing Agent zero-shot template.

> You are the Electric Bus Aggregator (EBA) price-setting agent for DA optimization. Set buy and sell multipliers for 48 half-hour timesteps. The EBA buys from the grid at spot price $P_{t}$, sells charging energy to the PTO at $m^{buy}_{t}P_{t}$, buys V2G energy from the PTO at $m^{sell}_{t}P_{t}$, and resells V2G energy to the grid at $P_{t}$. Aggregator profit is the sum of charging margin and V2G margin. If `{mode}` is profit-based, maximize aggregator revenue while preserving feasible fleet operation and some V2G opportunity. If `{mode}` is operational-based, minimize PTO cost while preserving positive aggregator revenue and feasible operation. Respect hard constraints: return exactly 48 buy multipliers and 48 sell multipliers; keep $m^{buy}_{t}>1$, $m^{sell}_{t}<1$; use the price forecast `{spot_prices}` and fleet context `{fleet_context}`; output only the requested structured fields.
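The hard constraints and the margin definition in the template above can be checked mechanically once the agent returns its multipliers. The following sketch validates the output (48 entries per vector, buy multipliers above 1, sell multipliers below 1) and computes aggregator profit as charging margin plus V2G margin, following the economics stated in the prompt. The function names and the toy energy volumes are illustrative assumptions.

```python
def validate_multipliers(buy, sell, n_steps=48):
    """Return a list of hard-constraint violations (empty if the output is valid)."""
    errors = []
    if len(buy) != n_steps:
        errors.append(f"expected {n_steps} buy multipliers, got {len(buy)}")
    if len(sell) != n_steps:
        errors.append(f"expected {n_steps} sell multipliers, got {len(sell)}")
    errors += [f"buy[{t}]={m} must be > 1" for t, m in enumerate(buy) if not m > 1]
    errors += [f"sell[{t}]={m} must be < 1" for t, m in enumerate(sell) if not m < 1]
    return errors


def aggregator_profit(buy, sell, spot, charge_kwh, v2g_kwh):
    """Charging margin plus V2G margin, as described in the prompt.

    Charging: energy bought at P_t and sold to the PTO at m_buy * P_t.
    V2G: energy bought from the PTO at m_sell * P_t and resold at P_t.
    """
    charging_margin = sum((mb - 1.0) * p * c for mb, p, c in zip(buy, spot, charge_kwh))
    v2g_margin = sum((1.0 - ms) * p * v for ms, p, v in zip(sell, spot, v2g_kwh))
    return charging_margin + v2g_margin


# Hypothetical agent output and schedule volumes for one day.
buy = [1.2] * 48
sell = [0.8] * 48
spot = [0.10] * 48
charge_kwh = [10.0] + [0.0] * 47   # charging only in the first timestep
v2g_kwh = [0.0] * 47 + [5.0]       # V2G export only in the last timestep

errors = validate_multipliers(buy, sell)
profit = aggregator_profit(buy, sell, spot, charge_kwh, v2g_kwh)
short_errors = validate_multipliers(buy[:47], [1.0] * 48)
print(errors, round(profit, 4), short_errors[:2])
```

A check of this kind would typically sit between the agent and the optimizer, so that malformed guidance is rejected before a schedule is computed.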

Chain-of-thought extension.

> Before producing the final multiplier vectors, reason through the coupled tariff decision: identify low- and high-price periods; determine when charging should be encouraged or discouraged; determine when V2G export is valuable; check whether buy multipliers are too high to create surplus SOC; check whether sell multipliers are too low to induce V2G; then adapt the multipliers to the selected profit-based or operational-based objective. Return only the final structured multiplier output, not the reasoning text.

Few-shot extension.

> Use the examples as behavioral guidance. A profit-based example raises buy multipliers and lowers sell multipliers enough to increase aggregator margin, but avoids eliminating V2G volume. An operational-based example keeps buy multipliers close to the spot price and sell multipliers high enough to transfer more V2G value to the PTO, while preserving small positive aggregator revenue. Generalize the pattern to the provided price profile and fleet context rather than copying the example values.

Few-shot plus chain-of-thought template.

> Combine the few-shot behavioral examples with the reasoning scaffold. First use the examples to identify the intended mode behavior; then reason about price periods, charging incentives, V2G incentives, margin, PTO exposure, and feasibility; finally return exactly 48 buy and sell multipliers in the required structured format. The examples define the behavioral direction, while the reasoning scaffold adapts that direction to the current price and fleet context.

DA Evaluator Agent compact template.

> Evaluate the DA optimization result `{optimization_result}` under the selected mode. Inspect PTO daily cost, aggregator revenue, bought energy, sold energy, feasibility status, terminal SOC, and whether the result improves on the best known solution. In profit-based mode, accept results that improve or preserve aggregator revenue without infeasibility or excessive loss of V2G opportunity. In operational-based mode, accept results that reduce PTO cost while preserving feasibility and non-negative aggregator revenue. If the result should be revised, return adjusted guidance with exactly 48 buy and 48 sell multipliers; otherwise return an accept decision and rationale in the required structured format.

RT zero-shot constrained-output template.

> At each RT timestep, read the current system state, active reference plan, price deviation, energy deviation, delay status, previous trigger history, and remaining horizon. The Trigger Agent returns a schema-constrained decision: skip or optimize, trigger type, confidence, flagged buses, and rationale. If optimization is triggered, the Pricing Agent returns updated buy/sell guidance for the remaining horizon using the current mode objective. The Evaluator Agent then accepts, rejects, or requests a bounded rerun based on feasibility, PTO cost, aggregator revenue, V2G activity, flagged-bus deviations, and terminal SOC. These RT prompts are zero-shot role prompts with constrained output fields rather than the ZS/CoT/FS/FS+CoT DA prompt variants.
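A schema-constrained Trigger Agent decision of the kind described above can be enforced with a small parser. The sketch below assumes the agent replies with a JSON object; the field names (`action`, `trigger_type`, `confidence`, `flagged_buses`, `rationale`), the JSON encoding, and the $[0,1]$ range for the trigger confidence are hypothetical choices made for illustration, since the compact template lists the fields but not their exact encoding.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"action", "trigger_type", "confidence", "flagged_buses", "rationale"}


@dataclass(frozen=True)
class TriggerDecision:
    action: str            # "skip" or "optimize"
    trigger_type: str
    confidence: float      # assumed to lie in [0, 1]
    flagged_buses: tuple
    rationale: str


def parse_trigger_decision(raw: str) -> TriggerDecision:
    """Parse and validate a Trigger Agent reply; raise ValueError if malformed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("decision must be a JSON object")
    missing = REQUIRED_FIELDS.difference(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["action"] not in {"skip", "optimize"}:
        raise ValueError(f"unknown action: {data['action']!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return TriggerDecision(
        action=data["action"],
        trigger_type=str(data["trigger_type"]),
        confidence=confidence,
        flagged_buses=tuple(data["flagged_buses"]),
        rationale=str(data["rationale"]),
    )


# Hypothetical agent reply.
reply = json.dumps({
    "action": "optimize",
    "trigger_type": "energy_deviation",
    "confidence": 0.82,
    "flagged_buses": ["bus_3"],
    "rationale": "Observed route energy exceeds the DA assumption.",
})
decision = parse_trigger_decision(reply)

try:
    parse_trigger_decision(json.dumps({"action": "wait"}))
    rejected = False
except ValueError:
    rejected = True
print(decision.action, decision.flagged_buses, rejected)
```

Rejecting malformed replies at this boundary keeps the downstream optimizer call deterministic even when the agent output is not.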

## References

- [1] (2019) Vehicle-to-grid aggregator to support power grid and reduce electric vehicle charging cost. IEEE Access 7, pp. 178528–178538. Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p4.1).
- [2] G. Antonesi, T. Cioara, I. Anghel, V. Michalakopoulos, E. Sarmas, and L. Toderean (2025) From transformers to large language models: a systematic review of AI applications in the energy sector towards agentic digital twins. arXiv preprint arXiv:2506.06359. Cited by: [§2.3](https://arxiv.org/html/2606.26400#S2.SS3.p1.1).
- [3] Z. Bao, J. Li, X. Bai, C. Xie, Z. Chen, M. Xu, W. Shang, and H. Li (2023) An optimal charging scheduling model and algorithm for electric buses. Applied Energy 332, pp. 120512. Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [4] K. Bruninx, H. Pandžić, H. Le Cadre, and E. Delarue (2019) On the interaction between aggregators, electricity markets and residential demand response providers. IEEE Transactions on Power Systems 35 (2), pp. 840–853. Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p4.1).
- [5] S. Burger, J. P. Chaves-Ávila, C. Batlle, and I. J. Pérez-Arriaga (2017) A review of the value of aggregators in electricity systems. Renewable and Sustainable Energy Reviews 77, pp. 395–405. External Links: [Document](https://dx.doi.org/10.1016/j.rser.2017.04.014). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p3.1).
- [6] Y. Cao, L. Huang, Y. Li, K. Jermsittiparsert, H. Ahmadi-Nezamabad, and S. Nojavan (2020) Optimal scheduling of electric vehicles aggregator under market price uncertainty using robust optimization technique. International Journal of Electrical Power & Energy Systems 117, pp. 105628. Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [7] A. M. Carreiro, H. M. Jorge, and C. H. Antunes (2017) Energy management systems aggregators: a literature survey. Renewable and Sustainable Energy Reviews 73, pp. 1160–1172. External Links: [Document](https://dx.doi.org/10.1016/j.rser.2017.01.179). Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [8] J. Chen and K. Strunz (2025-02) Optimal Electric Bus Charging and Battery Swapping With Renewable Energy and Frequency Control Ancillary Service Through Aggregator. IEEE Transactions on Transportation Electrification 11 (1), pp. 3715–3729. External Links: ISSN 2332-7782, [Link](https://ieeexplore.ieee.org/document/10638650/), [Document](https://dx.doi.org/10.1109/TTE.2024.3445830). Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [9] J. Clairand, M. González-Rodríguez, I. Cedeño, and G. Escrivá-Escrivá (2022) A charging station planning model considering electric bus aggregators. Sustainable Energy, Grids and Networks 30, pp. 100638. Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [10] A. Eslami and J. Yu (2025) Security risks of agentic vehicles: a systematic analysis of cognitive and cross-layer threats. arXiv preprint arXiv:2512.17041. Cited by: [§6](https://arxiv.org/html/2606.26400#S6.p5.1).
- [11] A. Eslami and J. Yu (2026-03) A Control-Theoretic Foundation for Agentic Systems. arXiv. Note: arXiv:2603.10779 [eess.SY]. External Links: [Link](http://arxiv.org/abs/2603.10779), [Document](https://dx.doi.org/10.48550/arXiv.2603.10779). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p7.1).
- [12] A. Eslami and J. Yu (2026) Stability without safety: gain manipulation attacks on agentic cyber-physical systems. arXiv preprint arXiv:2606.07803. Cited by: [§6](https://arxiv.org/html/2606.26400#S6.p5.1).
- [13] R. Faia, B. Ribeiro, C. Goncalves, L. Gomes, and Z. Vale (2023-10) Multi-agent based energy community cost optimization considering high electric vehicles penetration. Sustainable Energy Technologies and Assessments 59, pp. 103402. External Links: ISSN 22131388, [Link](https://linkinghub.elsevier.com/retrieve/pii/S2213138823003958), [Document](https://dx.doi.org/10.1016/j.seta.2023.103402). Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [14] M. Gallet, T. Massier, and T. Hamacher (2018) Estimation of the energy demand of electric buses based on real-world data for large-scale public transport networks. Applied Energy 230, pp. 344–356. External Links: [Document](https://dx.doi.org/10.1016/j.apenergy.2018.08.086). Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [15] L. Gkatzikis, I. Koutsopoulos, and T. Salonidis (2013) The role of aggregators in smart grid demand response markets. IEEE Journal on Selected Areas in Communications 31 (7), pp. 1247–1257. Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p3.1).
- [16] X. Hu, H. Li, and C. Xie (2025) Optimal charging scheduling of an electric bus fleet with photovoltaic-storage-charging stations. Applied Energy 390, pp. 125714. Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [17] International Energy Agency (2024) Global EV Outlook 2024. Technical report, International Energy Agency. External Links: [Link](https://www.iea.org/reports/global-ev-outlook-2024). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p2.1).
- [18] B. Ke, Y. Lin, H. Chen, and S. Fang (2020) Battery charging and discharging scheduling with demand response for an electric bus public transportation system. Sustainable Energy Technologies and Assessments 40, pp. 100741. Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [19] X. Liu, X. Qu, and X. Ma (2021) Optimizing electric bus charging infrastructure considering power matching and seasonality. Transportation Research Part D: Transport and Environment 100, pp. 103057. External Links: [Document](https://dx.doi.org/10.1016/j.trd.2021.103057). Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [20] S. Majumder, L. Dong, F. Doudi, Y. Cai, C. Tian, D. Kalathil, K. Ding, A. A. Thatte, N. Li, and L. Xie (2024) Exploring the capabilities and limitations of large language models in the electric energy sector. Joule 8 (6), pp. 1544–1549. Cited by: [§2.3](https://arxiv.org/html/2606.26400#S2.SS3.p1.1).
- [21] J. A. Manzolli, J. P. Trovão, and C. H. Antunes (2022-05) A review of electric bus vehicles research topics – Methods and trends. Renewable and Sustainable Energy Reviews 159, pp. 112211. External Links: ISSN 13640321, [Link](https://linkinghub.elsevier.com/retrieve/pii/S1364032122001344), [Document](https://dx.doi.org/10.1016/j.rser.2022.112211). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p2.1).
- [22] J. A. Manzolli, J. P. Trovão, and C. H. Antunes (2022-10) Electric bus smart charging under a bi-level optimisation model to set dynamic tariffs. In IECON 2022 – 48th Annual Conference of the IEEE Industrial Electronics Society, pp. 1–6. External Links: ISSN 2577-1647, [Document](https://dx.doi.org/10.1109/IECON49645.2022.9969101). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p4.1).
- [23] J. A. Manzolli, J. P. F. Trovão, and C. H. Antunes (2024) Aggregator-supported strategy for electric bus fleet charging: a hierarchical optimisation approach. Energy 307, pp. 132497. External Links: [Document](https://dx.doi.org/10.1016/j.energy.2024.132497). Cited by: [§B.3](https://arxiv.org/html/2606.26400#A2.SS3.p1.3), [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [24] J. A. Manzolli, J. P. F. Trovão, and C. H. Antunes (2025) Electric bus fleet charging management: a robust optimisation framework addressing battery ageing, time-of-use tariffs, and energy consumption uncertainty. Applied Energy 381, pp. 125137. External Links: [Document](https://dx.doi.org/10.1016/j.apenergy.2024.125137). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p2.1).
- [25] J. A. Manzolli, J. P. Trovão, and C. Henggeler Antunes (2022-11) Optimisation of an electric bus charging strategy considering a semi-empirical battery degradation model and weather conditions. In 2022 11th International Conference on Control, Automation and Information Sciences (ICCAIS), Hanoi, Vietnam, pp. 298–303. External Links: ISBN 978-1-6654-5248-9, [Link](https://ieeexplore.ieee.org/document/9990180/), [Document](https://dx.doi.org/10.1109/ICCAIS56082.2022.9990180). Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [26] J. A. Manzolli, J. Yu, and L. Miranda-Moreno (2025-05) Synthetic multi-criteria decision analysis (S-MCDA): A new framework for participatory transportation planning. Transportation Research Interdisciplinary Perspectives 31, pp. 101463. External Links: ISSN 25901982, [Link](https://linkinghub.elsevier.com/retrieve/pii/S2590198225001423), [Document](https://dx.doi.org/10.1016/j.trip.2025.101463). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p7.1).
- [27] B. Naeimian, G. Mohseni, V. Barzegari, M. Nourinejad, and P. Y. Park (2025) Public transportation fleet electrification and charger schedule optimization using a decomposition heuristic. Energy 333, pp. 137135. Cited by: [§2.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1).
- [28] M. Pagliaro and F. Meneguzzo (2019) Electric bus: a critical overview on the dawn of its widespread uptake. Advanced Sustainable Systems 3 (6), pp. 1800151. External Links: [Document](https://dx.doi.org/10.1002/adsu.201800151). Cited by: [§2.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1).
- [29] O. Sadeghian, A. Oshnoei, B. Mohammadi-Ivatloo, V. Vahidinasab, and A. Anvari-Moghaddam (2022) A comprehensive review on electric vehicles smart charging: solutions, strategies, technologies, and challenges. Journal of Energy Storage 54, pp. 105241. Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p2.1).
- [30] H. Shi, L. Fang, X. Chen, C. Gu, K. Ma, X. Zhang, Z. Zhang, J. Gu, and E. G. Lim (2024) Review of the opportunities and challenges to accelerate mass-scale application of smart grids with large-language models. IET Smart Grid 7 (6), pp. 737–759. External Links: [Document](https://dx.doi.org/10.1049/stg2.12191). Cited by: [§2.3](https://arxiv.org/html/2606.26400#S2.SS3.p1.1).
- [31] X. Tian (2025-08) Integrated Analysis and Modeling of Energy Demand, Emissions, and Lifecycle Impacts of Bus Electrification Under Low-Carbon Electricity Scenarios. PhD Thesis, Concordia University. External Links: [Link](https://spectrum.library.concordia.ca/id/eprint/996169/). Cited by: [§1](https://arxiv.org/html/2606.26400#S1.p4.1).
- \[32\]J\. Wang, L\. Kang, and Y\. Liu\(2020\)Optimal scheduling for electric bus fleets based on dynamic programming approach by considering battery capacity fade\.Renewable and Sustainable Energy Reviews130,pp\. 109978\.Cited by:[§2\.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1)\.
- \[33\]Y\. Wang, F\. Liao, and C\. Lu\(2022\)Integrated optimization of charger deployment and fleet scheduling for battery electric buses\.Transportation Research Part D: Transport and Environment109,pp\. 103382\.External Links:[Document](https://dx.doi.org/10.1016/j.trd.2022.103382)Cited by:[§2\.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1)\.
- \[34\]Ş\. Yıldırım and B\. Yıldız\(2021\)Electric bus fleet composition and scheduling\.Transportation Research Part C: Emerging Technologies129,pp\. 103197\.External Links:[Document](https://dx.doi.org/10.1016/j.trc.2021.103197)Cited by:[§2\.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1)\.
- \[35\]J\. Yu, R\. Frank, L\. Miranda\-Moreno, S\. Jafarnejad, J\. A\. Manzolli, F\. Liu, J\. Wang, and A\. Eslami\(2025\)Agentic vehicles for human\-centered mobility: definition, prospects, and synergistic co\-development with vehicle autonomy\.arXiv preprint:2507\.04996\.Cited by:[§1](https://arxiv.org/html/2606.26400#S1.p7.1)\.
- \[36\]J\. Yu\(2025\)Preparing for an agentic era of human\-machine transportation systems: opportunities, challenges, and policy recommendations\.Transport Policy171,pp\. 78–97\.Cited by:[§1](https://arxiv.org/html/2606.26400#S1.p7.1)\.
- \[37\]C\. Zhang, J\. Zhang, J\. Lu, and Y\. Zhao\(2026\)Large language models meet energy systems: opportunities, challenges, and future perspectives\.Applied Energy403,pp\. 127076\.External Links:[Document](https://dx.doi.org/10.1016/j.apenergy.2025.127076)Cited by:[§2\.3](https://arxiv.org/html/2606.26400#S2.SS3.p1.1)\.
- \[38\]L\. Zhang, Y\. Han, J\. Peng, and Y\. Wang\(2023\)Vehicle and charging scheduling of electric bus fleets: a comprehensive review\.Journal of Intelligent and Connected Vehicles6\(3\),pp\. 116–124\.External Links:[Document](https://dx.doi.org/10.26599/JICV.2023.9210012)Cited by:[§2\.2](https://arxiv.org/html/2606.26400#S2.SS2.p1.1)\.
- \[39\]L\. Zhao, S\. Shen, and Z\. Zhao\(2025\-09\)Large\-scale electric bus network transition planning via deep reinforcement learning\.Transportation Research Part D: Transport and Environment146,pp\. 104899\(en\)\.External Links:ISSN 1361\-9209,[Link](https://linkinghub.elsevier.com/retrieve/pii/S1361920925003098),[Document](https://dx.doi.org/10.1016/j.trd.2025.104899)Cited by:[§1](https://arxiv.org/html/2606.26400#S1.p4.1)\.
- \[40\]Y\. Zhou, Q\. Meng, G\. P\. Ong, and H\. Wang\(2024\)Electric bus charging scheduling on a bus network\.Transportation Research Part C: Emerging Technologies161,pp\. 104553\.Cited by:[§2\.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1)\.
- \[41\]Y\. Zhou, G\. P\. Ong, Q\. Meng, and H\. Cui\(2023\)Electric bus charging facility planning with uncertainties: model formulation and algorithm design\.Transportation Research Part C: Emerging Technologies150,pp\. 104108\.External Links:[Document](https://dx.doi.org/10.1016/j.trc.2023.104108)Cited by:[§2\.1](https://arxiv.org/html/2606.26400#S2.SS1.p1.1)\.
