Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains
Summary
This paper presents a skill-constrained model predictive control approach for resilient manufacturing supply chains, where training decisions affect future certified capacity. The controller solves a finite-horizon mixed-integer program and is evaluated on synthetic scenarios, showing that predictive control helps when bottlenecks are forecastable but is not universally superior.
# Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains
Source: [https://arxiv.org/html/2606.17269](https://arxiv.org/html/2606.17269)
Carlos Eduardo Sanoja, Quanta Labs, LLC; Professor, FCEA, Universidad Monteávila, Edificio Lomas del Sol, Calle Humboldt, Lomas del Sol, Caracas, Venezuela\. Email: csanoja@somosquanta\.com\. ORCID: [0009\-0000\-0339\-7072](https://orcid.org/0009-0000-0339-7072)
###### Abstract
In skill\-constrained production\-inventory systems, the qualified human capacity available tomorrow depends on training decisions made today: production requires certified workers, certifications decay unless maintained, and training consumes the same scarce worker hours that production needs now\. We study a closed\-loop skill\-constrained model predictive controller that, at every shift, solves a finite\-horizon mixed\-integer program over production, inventory, backlog, and training, with binary predicted certification, hard production eligibility, and an interpretable terminal value that prices certified\-capacity gaps at the horizon boundary; only the first\-period action is applied before replanning\. On synthetic, seed\-controlled SkillChain\-Gym scenarios — announced and surprise new\-skill shocks, demand shocks, absenteeism, forecast\- and availability\-quality modes, capacity\-boundary and training\-rate sweeps, and negative controls — we evaluate the controller against production\-only and maintenance\-only ablations, static cross\-training insurance plans, and a strong reactive heuristic, under an ex\-ante locked configuration and paired statistics\. The result is regime dependence, not superiority: no policy class dominates\. Predictive control helps when skill or labor bottlenecks are forecastable early enough for training to complete; lean static insurance remains hard to beat under surprise shocks, near the demand–capacity boundary, and wherever pre\-shock slack makes insurance cheap\. Attribution ablations separate certification maintenance, re\-acquisition of lapsed certifications, and greenfield skill acquisition\. Forecastability, not adaptivity per se, decides when predictive control pays\.
## 1 Introduction
Production and service systems are usually planned as if labor capacity were exogenous: workers appear as a fixed resource, and the planning question is how to allocate materials, machines, and inventory around them\. In skill\-constrained operations this abstraction fails in a specific way — the qualified capacity available tomorrow depends on training and certification decisions made today\. Workforce planning research has long recognized skills, cross\-training, and learning as first\-class modeling objects\[[12](https://arxiv.org/html/2606.17269#bib.bib11),[42](https://arxiv.org/html/2606.17269#bib.bib42)\], and workforce reconfiguration is increasingly treated as a resilience lever in manufacturing\[[18](https://arxiv.org/html/2606.17269#bib.bib18),[32](https://arxiv.org/html/2606.17269#bib.bib34)\]\. When production requires certified workers and certifications are dynamic, qualified human capacity — not machines or materials — can be the binding operational resource\. Reskilling then stops being a background human\-resources decision and becomes a control action\.
What makes the resulting control problem hard is a tight intertemporal coupling\. Training consumes the same scarce worker hours that production needs now, so building future capability always costs current output\[[21](https://arxiv.org/html/2606.17269#bib.bib21),[26](https://arxiv.org/html/2606.17269#bib.bib26)\]\. Skills decay when unused, so certification is a maintained asset rather than a one\-off purchase\[[27](https://arxiv.org/html/2606.17269#bib.bib28),[5](https://arxiv.org/html/2606.17269#bib.bib5),[29](https://arxiv.org/html/2606.17269#bib.bib30)\]\. Disruptions interact with both mechanisms: demand spikes and absences stress existing certified capacity, while new\-product introductions can require a skill that no worker currently holds\. Whether such shocks are forecast, announced late, or hidden until onset changes the problem fundamentally, because training has a lag — reacting after a shock can be structurally too late when post\-shock demand leaves no slack for diverting hours into training\.
Each ingredient of this problem is well studied in isolation\. Model predictive and receding\-horizon control of inventories, production, and supply chains is a mature field\[[6](https://arxiv.org/html/2606.17269#bib.bib6),[31](https://arxiv.org/html/2606.17269#bib.bib32),[13](https://arxiv.org/html/2606.17269#bib.bib13),[28](https://arxiv.org/html/2606.17269#bib.bib29),[37](https://arxiv.org/html/2606.17269#bib.bib38)\], but it treats labor as exogenous or absent\. Workforce planning with skills, training, learning, and forgetting is likewise mature\[[12](https://arxiv.org/html/2606.17269#bib.bib11),[34](https://arxiv.org/html/2606.17269#bib.bib35),[40](https://arxiv.org/html/2606.17269#bib.bib40)\], but these models are predominantly open\-loop planning formulations rather than closed\-loop controllers\. The closest control\-side work models human activity\-time uncertainty or machine\-capability “skills” inside MPC\[[35](https://arxiv.org/html/2606.17269#bib.bib36),[44](https://arxiv.org/html/2606.17269#bib.bib44)\] without worker skill evolution or training actions\. We therefore make no novelty claim for supply\-chain MPC or for workforce\-training models; the contribution is their closed\-loop combination: a receding\-horizon controller in which worker skill levels are observed dynamic states and training is an online, capacity\-consuming control action coupled to inventory and backlog dynamics\.
Concretely, we formulate skill\-constrained production\-inventory control on the SkillChain\-Gym benchmark of our companion paper \(Section [3](https://arxiv.org/html/2606.17269#S3)\): continuous skill levels with hard threshold certification, production eligibility requiring certification, geometric forgetting, and a shared per\-worker time budget for production and training\. The controller \(Section [4](https://arxiv.org/html/2606.17269#S4)\) solves, at every shift, a finite\-horizon mixed\-integer program with binary predicted certification and an interpretable terminal skill\-bottleneck value that prices certified capacity gaps left open at the horizon boundary, applies only the first\-period action, and replans\. Attribution ablations isolate what the skill machinery is worth: a production\-only controller, a maintenance\-only controller that can preserve but never acquire certifications, and the full controller with and without the terminal value, evaluated against deliberately favorable static cross\-training plans and a strong reactive heuristic under an ex\-ante locked primary configuration and paired statistics \(Section [5](https://arxiv.org/html/2606.17269#S5)\)\.
The empirical picture \(Section [6](https://arxiv.org/html/2606.17269#S6)\) is a regime map, not a ranking\. No policy class dominates\. Predictive control helps when skill or labor bottlenecks are forecastable early enough for training to complete: forecast\-visible new\-skill bottlenecks, announced demand shocks where inventory anticipation combines with certification maintenance, announced absence windows, and slow\-training regimes where the terminal value prevents short\-horizon controllers from ignoring visible future gaps\. Lean static insurance remains hard to beat when shocks are hidden, when reaction transients are structurally unrecoverable near the demand–capacity boundary, and wherever pre\-shock slack makes insurance cheap\. Degrading forecast quality degrades the controller in an interpretable order, making explicit that its advantages are forecast leverage rather than generic adaptivity\.
Our contributions are:
1. a closed\-loop skill\-constrained MPC formulation for single\-site production\-inventory systems with dynamic worker skill states, hard certification, forgetting, and capacity\-consuming training;
2. a mixed\-integer implementation with an interpretable terminal skill\-bottleneck value and a variant/ablation chain separating production\-only control, certification maintenance, no\-terminal MPC, and full skill\-aware MPC;
3. a reproducible, deterministic experiment suite on the SkillChain\-Gym simulator covering announced and surprise new\-skill shocks, demand shocks, absenteeism, forecast and availability\-forecast quality, capacity\-boundary sweeps, training\-rate sensitivity, and negative controls; and
4. a mechanism\-level regime analysis that separates certification maintenance, lapsed\-certification re\-acquisition, and greenfield skill acquisition with within\-episode certification\-event counters, and shows that forecastability \(of demand, of new skill requirements, and of labor availability\) decides when predictive control outperforms static insurance\.
## 2 Related Work
### 2\.1 Model predictive control for production\-inventory and supply\-chain systems
Model predictive control \(MPC\) has a long history in production\-inventory and supply\-chain planning\. Early work showed how receding\-horizon optimization can manage multi\-product, multi\-echelon demand networks and supply\-chain profit objectives under capacity, storage, production, and shipment constraints\[[6](https://arxiv.org/html/2606.17269#bib.bib6),[7](https://arxiv.org/html/2606.17269#bib.bib7),[31](https://arxiv.org/html/2606.17269#bib.bib32)\]\. Reviews of control\-theoretic production\-inventory and supply\-chain models further establish that inventory, backlog, demand amplification, ordering, and material\-flow dynamics have been studied extensively through control methods\[[30](https://arxiv.org/html/2606.17269#bib.bib31),[36](https://arxiv.org/html/2606.17269#bib.bib37)\]\. Subsequent work developed centralized, robust, scenario\-based, and forecasting\-enhanced predictive controllers for supply\-chain and inventory systems\[[28](https://arxiv.org/html/2606.17269#bib.bib29),[14](https://arxiv.org/html/2606.17269#bib.bib14),[15](https://arxiv.org/html/2606.17269#bib.bib15),[37](https://arxiv.org/html/2606.17269#bib.bib38),[13](https://arxiv.org/html/2606.17269#bib.bib13),[2](https://arxiv.org/html/2606.17269#bib.bib2),[23](https://arxiv.org/html/2606.17269#bib.bib23)\]\. Recent receding\-horizon game formulations also model competitive supply chains under demand spikes and supply shocks\[[17](https://arxiv.org/html/2606.17269#bib.bib17)\]\. These papers motivate our control formulation, but generally treat productive capacity as exogenous or machine/process constrained rather than as a dynamic function of workforce skill development\.
MPC has also been applied directly to manufacturing scheduling and shop\-floor control\. Examples include multilayer MPC for semiconductor lines, fab\-wide scheduling, multi\-product job\-shop control, and flexible job\-shop scheduling\[[43](https://arxiv.org/html/2606.17269#bib.bib43),[24](https://arxiv.org/html/2606.17269#bib.bib24),[38](https://arxiv.org/html/2606.17269#bib.bib39),[44](https://arxiv.org/html/2606.17269#bib.bib44)\]\. The flexible job\-shop work of Wenzelburger and Allgower is especially close in terminology because it models task and manufacturing\-unit “skills” within an MPC scheduling framework\[[44](https://arxiv.org/html/2606.17269#bib.bib44)\]; however, those skills describe machine or manufacturing\-unit capabilities, not worker competencies that evolve through training\. Ruppert et al\. incorporate uncertain operator activity times in an MPC controller for manual assembly lines\[[35](https://arxiv.org/html/2606.17269#bib.bib36)\], but do not model worker skill acquisition or training as a control action\.
### 2\.2 Workforce planning with skills, learning, and training
The operations\-research literature on workforce planning already contains extensive models of skills, skill levels, multi\-skilling, learning, forgetting, cross\-training, and human factors\. Broad reviews cover personnel scheduling, workforce planning with skills, workforce reconfiguration in manufacturing, multi\-skilling in scheduling, and human\-aware logistics/manufacturing optimization\[[42](https://arxiv.org/html/2606.17269#bib.bib42),[12](https://arxiv.org/html/2606.17269#bib.bib11),[18](https://arxiv.org/html/2606.17269#bib.bib18),[1](https://arxiv.org/html/2606.17269#bib.bib1),[32](https://arxiv.org/html/2606.17269#bib.bib34),[33](https://arxiv.org/html/2606.17269#bib.bib33),[8](https://arxiv.org/html/2606.17269#bib.bib8)\]\. Learning and forgetting in worker assignment and scheduling have also been modeled for dual\-resource systems, parallel systems, cellular manufacturing, job rotation, and workforce assignment\[[27](https://arxiv.org/html/2606.17269#bib.bib28),[45](https://arxiv.org/html/2606.17269#bib.bib45),[5](https://arxiv.org/html/2606.17269#bib.bib5),[29](https://arxiv.org/html/2606.17269#bib.bib30),[4](https://arxiv.org/html/2606.17269#bib.bib3),[10](https://arxiv.org/html/2606.17269#bib.bib10)\]\. Integer\-programming reformulations make nonlinear worker learning curves tractable in assignment and planning models\[[22](https://arxiv.org/html/2606.17269#bib.bib22),[25](https://arxiv.org/html/2606.17269#bib.bib25)\], and learning\-curve selection has been studied empirically in production economics\[[16](https://arxiv.org/html/2606.17269#bib.bib16)\]\.
Several papers are particularly important constraints on our novelty claim\. Azizi and Liang jointly optimize worker assignment, flexibility acquisition, task rotation, and training schedules in manufacturing\[[3](https://arxiv.org/html/2606.17269#bib.bib4)\]\. Valeva et al\. study workforce planning with learning, stochastic demand, and inventory as a flexibility buffer\[[40](https://arxiv.org/html/2606.17269#bib.bib40),[41](https://arxiv.org/html/2606.17269#bib.bib41)\]\. Cavagnini et al\. extend this line by modeling uncertain learning and forgetting rates with assignment, cross\-training, and practice decisions\[[9](https://arxiv.org/html/2606.17269#bib.bib9)\]\. Heuser et al\. explicitly study production workforce planning with flexible or budgeted training, volatile demand, learning\-by\-doing, and forgetting, where training consumes capacity that could otherwise be used for production\[[21](https://arxiv.org/html/2606.17269#bib.bib21)\]\. Ruf et al\. formulate hierarchical skills, long\-term training, and random resignations as a multistage workforce capacity planning problem using approximate dynamic programming\[[34](https://arxiv.org/html/2606.17269#bib.bib35)\]\. Henao et al\. model multiskilled personnel assignment with learning\-forgetting dynamics and k\-chaining policies\[[19](https://arxiv.org/html/2606.17269#bib.bib19)\], with related benchmark data for uncertain multiskilled personnel demand\[[20](https://arxiv.org/html/2606.17269#bib.bib20)\]\.
### 2\.3 Integrated operations and training
Integrated planning of operations and training is not new\. De Bruecker et al\. optimize aircraft maintenance skill mix and training schedules using a three\-stage mixed\-integer approach in which training affects workforce availability\[[11](https://arxiv.org/html/2606.17269#bib.bib12)\]\. Kafiabad et al\. integrate procurement, production, inventory, and on\-job training in maintenance logistics networks\[[26](https://arxiv.org/html/2606.17269#bib.bib26)\], and later study workforce training and operations planning for maintenance centers under demand uncertainty\[[39](https://arxiv.org/html/2606.17269#bib.bib27)\]\. These are the closest prior works to the present topic because they already connect operations, inventory\-related decisions, certified operators, and training\. Their focus, however, is tactical maintenance planning through deterministic or stochastic mathematical programming, not closed\-loop MPC over observed inventory, backlog, availability, and worker skill states\.
### 2\.4 Positioning of this paper
The contribution of this paper is therefore not the use of MPC in supply chains, nor the modeling of worker learning or training in isolation\. Instead, we study a closed\-loop predictive\-control formulation in which worker skills are part of the system state and training/reskilling is an explicit capacity\-consuming control action coupled to production and inventory/backlog dynamics\. This distinguishes the proposed controller from supply\-chain MPC models that omit workforce skill evolution, and from workforce\-training models that optimize over a fixed planning horizon without repeated state observation, forecast updates, and receding\-horizon response to disruptions\.
## 3 Problem Formulation
We formulate the control problem on the single\-site production\-inventory system with stylized worker skill\-state dynamics defined by the SkillChain\-Gym benchmark\. The scope is deliberately narrow: one site, multiple products, a small set of skills, one aggregate production\-capacity pool, and inventory/backlog dynamics\. Procurement, supplier delays, multi\-echelon flows, job\-shop routing, soft \(graded\) productivity, and learning\-by\-doing are excluded by design, and all instances are synthetic and seed\-controlled; no real data are used\. This section defines the plant, the admissible actions, and the control objective; Section [4](https://arxiv.org/html/2606.17269#S4) describes the receding\-horizon implementation\.
#### Sets and parameters\.
Shifts $t\in\{0,\dots,T-1\}$ \(one period is one work shift; $T=60$\), products $p\in\mathcal{P}$, workers $w\in\mathcal{W}$, and skills $k\in\mathcal{K}$, with one primary skill $k(p)$ per product\. Static parameters: productivity $v_p>0$ \(units per certified worker\-hour\), nominal worker hours $H_w$ per shift, certification thresholds $\theta_k\in[0,1]$, training gain $\alpha^{\mathrm{train}}\geq 0$ per training hour, forgetting rates $\delta_k\geq 0$ per shift, training\-seat capacities $cap^{\mathrm{train}}_k$, aggregate production\-capacity hours $C_t$, and cost coefficients $c^B_p$ \(backlog\), $c^I_p$ \(holding\), $c^q_p$ \(production\), and $c^Y$ \(training\) per unit and shift\.
#### State\.
At shift $t$ the state is $\bigl(I_t,B_t,\hat{D}_t,A_t,C_t,S_t,Q_t\bigr)$: inventory $I_{p,t}\geq 0$, backlog $B_{p,t}\geq 0$, a demand\-forecast window $\hat{D}_{p,t:t+F}$, worker availability $A_{w,t}\in[0,H_w]$, the capacity pool, and continuous skill levels $S_{w,k,t}\in[0,1]$ from which certification is derived by hard thresholding,

$$Q_{w,k,t}\;=\;\mathbf{1}\!\left[S_{w,k,t}\geq\theta_k\right]. \tag{1}$$

There is no partial productivity below the threshold, no separate training\-progress state, and no skill growth from production work\.
#### Actions and feasibility\.
The action $u_t=(a^{\mathrm{prod}}_t,a^{\mathrm{train}}_t)$ allocates nonnegative worker\-hours to production \($a^{\mathrm{prod}}_{w,p,t}$\) and to training \($a^{\mathrm{train}}_{w,k,t}$\), subject to

$$\sum_{p}a^{\mathrm{prod}}_{w,p,t}+\sum_{k}a^{\mathrm{train}}_{w,k,t}\;\leq\;A_{w,t}\quad\forall w, \tag{2}$$

$$a^{\mathrm{prod}}_{w,p,t}=0\;\text{ whenever }\;Q_{w,k(p),t}=0,\qquad\sum_{w,p}a^{\mathrm{prod}}_{w,p,t}\;\leq\;C_t,\qquad\sum_{w}a^{\mathrm{train}}_{w,k,t}\;\leq\;cap^{\mathrm{train}}_{k}. \tag{3}$$

Constraint \([2](https://arxiv.org/html/2606.17269#S3.E2)\) is the central mechanism: training consumes the same scarce worker time as production, so reskilling is never free\. Production output is $q_{p,t}=v_p\sum_{w}a^{\mathrm{prod}}_{w,p,t}\,Q_{w,k(p),t}$\.
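The feasibility conditions in Eqs\. \(2\)–\(3\) can be checked mechanically\. The following is a minimal Python sketch, not the benchmark's own code: the function name `is_feasible`, the nested\-list data layout, the tolerance, and all numeric values are assumptions made for illustration\.

```python
def is_feasible(a_prod, a_train, avail, certified, primary_skill,
                pool_hours, train_seats, tol=1e-9):
    """Check one shift's action against Eqs. (2)-(3).

    a_prod[w][p], a_train[w][k]: allocated hours; avail[w]: A_{w,t};
    certified[w][k]: Q_{w,k,t} in {0, 1}; primary_skill[p]: k(p);
    pool_hours: C_t; train_seats[k]: cap^train_k.
    """
    n_w, n_p, n_k = len(avail), len(primary_skill), len(train_seats)
    for w in range(n_w):
        hours = list(a_prod[w]) + list(a_train[w])
        if any(h < -tol for h in hours):
            return False  # allocations must be nonnegative
        if sum(hours) > avail[w] + tol:
            return False  # Eq. (2): production and training share one budget
        for p in range(n_p):
            if a_prod[w][p] > tol and not certified[w][primary_skill[p]]:
                return False  # Eq. (3): no production without certification
    if sum(sum(row) for row in a_prod) > pool_hours + tol:
        return False  # Eq. (3): aggregate production-capacity pool
    for k in range(n_k):
        if sum(a_train[w][k] for w in range(n_w)) > train_seats[k] + tol:
            return False  # Eq. (3): training-seat capacity per skill
    return True


# Hypothetical data: worker 0 holds skill 0, worker 1 holds no skill yet.
avail = [8.0, 8.0]
certified = [[1, 0], [0, 0]]
primary_skill = [0, 1]  # product p requires skill k(p)
ok = is_feasible([[6, 0], [0, 0]], [[0, 2], [0, 8]], avail, certified,
                 primary_skill, pool_hours=16.0, train_seats=[8.0, 10.0])
bad = is_feasible([[6, 0], [1, 0]], [[0, 2], [0, 0]], avail, certified,
                  primary_skill, pool_hours=16.0, train_seats=[8.0, 10.0])
```

In this toy instance the first action is feasible, while the second is rejected because worker 1 is assigned production hours on a product whose primary skill they do not hold\.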
#### Dynamics\.
Demand $D_{p,t}$ is realized after the action as the forecast mean plus seed\-controlled noise\. Shipments serve current demand plus backlog from inventory and current production, and skills decay geometrically while growing linearly in training time:

$$\mathrm{ship}_{p,t}=\min\bigl(I_{p,t}+q_{p,t},\,D_{p,t}+B_{p,t}\bigr),\quad I_{p,t+1}=I_{p,t}+q_{p,t}-\mathrm{ship}_{p,t},\quad B_{p,t+1}=B_{p,t}+D_{p,t}-\mathrm{ship}_{p,t}, \tag{4}$$

$$S_{w,k,t+1}=\Pi_{[0,1]}\!\Bigl((1-\delta_k)\,S_{w,k,t}+\alpha^{\mathrm{train}}a^{\mathrm{train}}_{w,k,t}\Bigr),\qquad Q_{w,k,t+1}=\mathbf{1}\!\left[S_{w,k,t+1}\geq\theta_k\right]. \tag{5}$$

Forgetting makes certification a maintained asset rather than a one\-off purchase: an unmaintained skill eventually falls below threshold, and a lapsed certification can only be recovered by further training\.
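As a concrete reading of Eqs\. \(4\)–\(5\), the sketch below advances one product and one worker\-skill cell by a single shift\. It is an illustrative transcription, not the SkillChain\-Gym implementation: the function name `plant_step`, the scalar interface, and every numeric value are assumptions\.

```python
def plant_step(I, B, S, q, D, a_train, delta, alpha, theta):
    """Advance inventory, backlog, skill, and certification by one shift.

    Scalars for a single product and a single worker-skill cell:
    I, B: inventory and backlog; S: skill level in [0, 1];
    q: production output; D: realized demand; a_train: training hours.
    """
    ship = min(I + q, D + B)                 # Eq. (4): ship what is available
    I_next = I + q - ship
    B_next = B + D - ship
    raw = (1.0 - delta) * S + alpha * a_train
    S_next = min(1.0, max(0.0, raw))         # Eq. (5): projection onto [0, 1]
    Q_next = 1 if S_next >= theta else 0     # hard threshold certification
    return I_next, B_next, S_next, Q_next


# Hypothetical numbers: demand outruns supply, training lifts the skill.
trained = plant_step(I=5.0, B=0.0, S=0.5, q=10.0, D=20.0,
                     a_train=4.0, delta=0.1, alpha=0.05, theta=0.6)
# Same threshold, no training: a skill just above threshold lapses.
lapsed = plant_step(I=0.0, B=0.0, S=0.62, q=0.0, D=0.0,
                    a_train=0.0, delta=0.05, alpha=0.05, theta=0.6)
```

The second call shows the maintenance effect described above: with no training hours, a skill level of 0\.62 decays to 0\.589 in one shift and the certification is lost\.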
#### Objective and information structure\.
The per\-shift cost is

$$c_t\;=\;\sum_{p}c^{B}_{p}B_{p,t+1}+\sum_{p}c^{I}_{p}I_{p,t+1}+\sum_{p}c^{q}_{p}q_{p,t}+c^{Y}\sum_{w,k}a^{\mathrm{train}}_{w,k,t}, \tag{6}$$

and the control problem is to choose a causal policy $u_t=\pi(\cdot)$ minimizing the episode cost $\sum_{t=0}^{T-1}c_t$ under the scenario’s disruption process \(demand spikes, absenteeism windows, and new\-product activations that require an initially uncertified skill\)\. The policy sees only the environment’s observation\. In particular, the demand\-forecast window reflects each scenario’s visibility semantics: announced events appear in the window before they occur, while surprise activations are hidden until onset\. Anticipation is therefore possible exactly when, and only when, the scenario makes the future visible\. The intertemporal tension that makes this problem non\-trivial is that serving demand today and being *able* to serve demand tomorrow draw on the same worker hours through \([2](https://arxiv.org/html/2606.17269#S3.E2)\), while certification \([1](https://arxiv.org/html/2606.17269#S3.E1)\) makes future capacity a discrete, training\-lagged consequence of today’s allocation\. The receding\-horizon controller of Section [4](https://arxiv.org/html/2606.17269#S4) approximates this problem with a finite\-horizon mixed\-integer program whose prediction model convexifies the shipment dynamics into a net inventory–backlog balance and represents predicted certification with binary variables\.
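The per\-shift cost of Eq\. \(6\) is a plain weighted sum, which the short sketch below evaluates for a two\-product example\. The function name `stage_cost`, the list layout, and the cost coefficients are hypothetical and chosen only so the arithmetic is easy to follow\.

```python
def stage_cost(B_next, I_next, q, train_hours, c_B, c_I, c_q, c_Y):
    """Per-shift cost of Eq. (6).

    B_next[p], I_next[p]: end-of-shift backlog and inventory;
    q[p]: production output; train_hours: total training hours over
    all workers and skills; c_B[p], c_I[p], c_q[p], c_Y: cost coefficients.
    """
    products = range(len(q))
    backlog = sum(c_B[p] * B_next[p] for p in products)
    holding = sum(c_I[p] * I_next[p] for p in products)
    production = sum(c_q[p] * q[p] for p in products)
    training = c_Y * train_hours
    return backlog + holding + production + training


# Hypothetical coefficients: backlog is ten times costlier than holding.
cost = stage_cost(B_next=[5.0, 0.0], I_next=[0.0, 3.0], q=[10.0, 4.0],
                  train_hours=6.0, c_B=[10.0, 10.0], c_I=[1.0, 1.0],
                  c_q=[0.5, 0.5], c_Y=2.0)
# backlog 50 + holding 3 + production 7 + training 12
```

With these numbers the backlog term dominates, which is the situation in which diverting hours to training competes most directly with serving demand\.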
## 4 Skill\-Constrained Model Predictive Control
We study closed\-loop receding\-horizon control of a single\-site production\-inventory system in which worker skills are dynamic states and training is a control action that consumes the same worker hours as production\. The plant is the SkillChain\-Gym environment of the companion benchmark paper: continuous skill levels $S_{w,k,t}\in[0,1]$ per worker $w$ and skill $k$, hard threshold certification $Q_{w,k,t}=\mathbf{1}[S_{w,k,t}\geq\theta_k]$, production eligibility requiring certification, geometric forgetting at rate $\delta_k$, linear training gain $\alpha^{\mathrm{train}}$ per hour, per\-product primary skills $k(p)$, productivity $v_p$, an aggregate production\-capacity pool, and inventory/backlog dynamics with stage costs on backlog \($c^B_p$\), holding \($c^I_p$\), production \($c^q_p$\), and training \($c^Y$\)\. No soft productivity and no learning\-by\-doing are modeled\.
### 4\.1 Closed\-loop algorithm
At every shift $t$ the controller \(i\) observes the current state $(I_t,B_t,A_t,C_t,S_t,Q_t)$ and the demand forecast window exposed by the environment; \(ii\) builds horizon forecasts $\hat{D}_{p,h|t}$, $\hat{A}_{w,h|t}$, and $\hat{C}_{h|t}$ for $h=0,\dots,H-1$; \(iii\) solves the finite\-horizon mixed\-integer program below; \(iv\) applies *only* the first\-period production and training hours $(x_{w,p,0},\,y_{w,k,0})$; the environment then realizes demand and disruptions and updates inventory, backlog, skills, and certifications; and \(v\) repeats from the updated state\. Every controller in this paper replans at every shift from the fresh observation; no plan is cached across steps\. Demand forecasts are read from the environment’s observation window, which reflects each scenario’s visibility semantics \(announced shocks appear in the window ahead of time; surprise activations are hidden until onset\), so the controller never receives information the observation does not contain\. Availability and capacity forecasts default to persistence of the current observation; alternative forecast\-quality and availability\-forecast modes are controller\-side transformations described in Section [5](https://arxiv.org/html/2606.17269#S5)\.
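Steps \(i\)–\(v\) amount to a short loop\. The sketch below shows its shape under stated assumptions: `run_closed_loop`, the callback interface \(`reset`, `step`, `plan`\), and the toy one\-product environment are hypothetical stand\-ins, not the SkillChain\-Gym API, and the toy planner is a greedy rule rather than a mixed\-integer solve\.

```python
def run_closed_loop(reset, step, plan, n_shifts):
    """Receding-horizon loop: replan from a fresh observation every shift."""
    obs = reset()
    total_cost, n_solves = 0.0, 0
    for _ in range(n_shifts):
        horizon_plan = plan(obs)        # (ii)-(iii): forecast and solve
        n_solves += 1
        first_action = horizon_plan[0]  # (iv): apply only the first period
        obs, cost = step(first_action)  # environment updates the state
        total_cost += cost              # (v): repeat from the new state
    return total_cost, n_solves


def make_toy_env(demand=3.0, cap=4.0, backlog0=2.0):
    """Hypothetical one-product plant whose only state is the backlog."""
    state = {"backlog": backlog0}

    def reset():
        state["backlog"] = backlog0
        return dict(state)

    def step(q):
        q = min(max(q, 0.0), cap)
        state["backlog"] = max(0.0, state["backlog"] + demand - q)
        return dict(state), state["backlog"]  # cost = end-of-shift backlog

    return reset, step


def toy_plan(obs, demand=3.0, cap=4.0, horizon=3):
    """Greedy multi-period plan; stands in for the finite-horizon program."""
    plan, backlog = [], obs["backlog"]
    for _ in range(horizon):
        q = min(cap, backlog + demand)
        plan.append(q)
        backlog = max(0.0, backlog + demand - q)
    return plan


reset, step = make_toy_env()
total_cost, n_solves = run_closed_loop(reset, step, toy_plan, n_shifts=4)
```

The planner is called once per shift and everything after the first entry of each plan is discarded, which mirrors the statement that no plan is cached across steps\.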
### 4\.2 Finite\-horizon mixed\-integer program
For horizon steps $h=0,\dots,H-1$ the decision variables are production hours $x_{w,p,h}\geq 0$, training hours $y_{w,k,h}\geq 0$, predicted inventory $I_{p,h+1}\geq 0$ and backlog $B_{p,h+1}\geq 0$, predicted skills $S_{w,k,h+1}\in[0,1]$, and binary certifications $c_{w,k,h}\in\{0,1\}$ for $h\geq 1$, plus terminal variables introduced below\. The constraints are:

$$
\begin{aligned}
\sum_{p}x_{w,p,h}+\sum_{k}y_{w,k,h} &\;\leq\; \hat{A}_{w,h|t} &\quad& \forall w,h &\quad& (7)\\
\sum_{w,p}x_{w,p,h} &\;\leq\; \hat{C}_{h|t} && \forall h && (8)\\
\sum_{w}y_{w,k,h} &\;\leq\; cap^{\mathrm{train}}_{k} && \forall k,h && (9)\\
x_{w,p,0} &\;\leq\; \hat{A}_{w,0|t}\,Q_{w,k(p),t} && \forall w,p && (10)\\
x_{w,p,h} &\;\leq\; \hat{A}_{w,h|t}\,c_{w,k(p),h} && \forall w,p,\ h\geq 1 && (11)\\
\theta_k\,c_{w,k,h} &\;\leq\; S_{w,k,h} && \forall w,k,\ h\geq 1 && (12)\\
I_{p,h+1}-B_{p,h+1} &\;=\; I_{p,h}-B_{p,h}+v_p\textstyle\sum_{w}x_{w,p,h}-\hat{D}_{p,h|t} && \forall p,h && (13)\\
S_{w,k,h+1} &\;\leq\; (1-\delta_k)\,S_{w,k,h}+\alpha^{\mathrm{train}}y_{w,k,h} && \forall w,k,h && (14)
\end{aligned}
$$

with $S_{w,k,0}$, $I_{p,0}$, $B_{p,0}$ fixed to observed values\. Constraint \([7](https://arxiv.org/html/2606.17269#S4.E7)\) is the capacity\-consuming training mechanism: production and training share each worker’s time budget\. First\-period eligibility \([10](https://arxiv.org/html/2606.17269#S4.E10)\) uses the *observed* certification $Q_{w,k(p),t}$, so the executed action is exactly feasible in the environment; predicted eligibility \([11](https://arxiv.org/html/2606.17269#S4.E11)\)–\([12](https://arxiv.org/html/2606.17269#S4.E12)\) uses binary certification coupled to predicted skill, so the optimizer can plan to train a worker and use the resulting capacity later in the horizon\. The skill dynamics enter as an inequality with the $[0,1]$ upper bound playing the role of the saturation projection; because predicted skill only relaxes constraints, the inequality binds wherever skill has value\. The net balance \([13](https://arxiv.org/html/2606.17269#S4.E13)\) permits simultaneous predicted inventory and backlog in principle, but with strictly positive holding costs the optimum always drives $\min(I,B)$ to zero; the implementation asserts $c^I_p>0$\.
The stage objective sums backlog, holding, production, and training costs,

$$J_t\;=\;\sum_{h=0}^{H-1}\Bigl[\sum_{p}c^{B}_{p}B_{p,h+1}+\sum_{p}c^{I}_{p}I_{p,h+1}+\sum_{p}c^{q}_{p}\,v_p\textstyle\sum_{w}x_{w,p,h}+c^{Y}(1+\varepsilon h)\textstyle\sum_{w,k}y_{w,k,h}\Bigr]+V_f, \tag{15}$$

where $\varepsilon=10^{-3}$ is a tie\-break that prefers earlier training when timing is otherwise cost\-degenerate; it is $0.1\%$ per step and does not overcome the forgetting\-driven preference for just\-in\-time training, nor does it change any cost comparison reported here\.
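To make the program concrete, the sketch below assembles a reduced version of constraints \(7\) and \(10\)–\(14\) with objective \(15\) and solves it with `scipy.optimize.milp`\. It is an illustration under strong simplifying assumptions, not the paper's implementation: one product and one skill, persistence forecasts for availability, no capacity pool \(8\), no training seats \(9\), and no terminal value \($V_f=0$\)\. The function name `solve_skill_mpc`, the variable layout, and all parameter values are hypothetical\.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def solve_skill_mpc(I0, B0, S0, Q0, A_hat, D_hat, v=1.0, theta=0.6,
                    delta=0.05, alpha=0.05, c_B=10.0, c_I=1.0, c_q=0.1,
                    c_Y=0.5, eps=1e-3):
    """One product, one skill; returns first-period (x0, y0) and objective."""
    W, H = len(S0), len(D_hat)
    # Flat layout: x[w,h], y[w,h], I[h+1], B[h+1], S[w,h+1], then c[w,h>=1].
    ix = lambda w, h: w * H + h
    iy = lambda w, h: W * H + w * H + h
    iI = lambda h: 2 * W * H + h
    iB = lambda h: 2 * W * H + H + h
    iS = lambda w, h: 2 * W * H + 2 * H + w * H + h   # holds S[w,h+1]
    n_cont = 3 * W * H + 2 * H
    ic = lambda w, h: n_cont + w * (H - 1) + (h - 1)  # only for h >= 1
    n = n_cont + W * (H - 1)

    cost = np.zeros(n)
    rows, lo, hi = [], [], []

    def add(coeffs, lower, upper):
        row = np.zeros(n)
        for j, a in coeffs:
            row[j] += a
        rows.append(row)
        lo.append(lower)
        hi.append(upper)

    for h in range(H):
        cost[iI(h)], cost[iB(h)] = c_I, c_B
        # Eq. (13): net inventory-backlog balance as an equality.
        bal = [(iI(h), 1.0), (iB(h), -1.0)]
        bal += [(ix(w, h), -v) for w in range(W)]
        rhs = -D_hat[h]
        if h == 0:
            rhs += I0 - B0
        else:
            bal += [(iI(h - 1), -1.0), (iB(h - 1), 1.0)]
        add(bal, rhs, rhs)
        for w in range(W):
            cost[ix(w, h)] = c_q * v
            cost[iy(w, h)] = c_Y * (1.0 + eps * h)
            # Eq. (7): shared time budget.
            add([(ix(w, h), 1.0), (iy(w, h), 1.0)], -np.inf, A_hat[w])
            # Eq. (14): predicted skill as an inequality.
            dyn = [(iS(w, h), 1.0), (iy(w, h), -alpha)]
            if h == 0:
                add(dyn, -np.inf, (1.0 - delta) * S0[w])
            else:
                add(dyn + [(iS(w, h - 1), -(1.0 - delta))], -np.inf, 0.0)
                # Eq. (11): production needs predicted certification.
                add([(ix(w, h), 1.0), (ic(w, h), -A_hat[w])], -np.inf, 0.0)
                # Eq. (12): certification needs predicted skill S[w,h].
                add([(ic(w, h), theta), (iS(w, h - 1), -1.0)], -np.inf, 0.0)

    lb, ub = np.zeros(n), np.full(n, np.inf)
    for w in range(W):
        for h in range(H):
            ub[ix(w, h)] = ub[iy(w, h)] = A_hat[w]
            ub[iS(w, h)] = 1.0
        ub[ix(w, 0)] = A_hat[w] * Q0[w]   # Eq. (10): observed certification
    ub[n_cont:] = 1.0
    integrality = np.zeros(n)
    integrality[n_cont:] = 1              # certifications are binary
    res = milp(cost, integrality=integrality, bounds=Bounds(lb, ub),
               constraints=LinearConstraint(np.array(rows), lo, hi))
    if not res.success:
        raise RuntimeError(res.message)
    x0 = [float(res.x[ix(w, 0)]) for w in range(W)]
    y0 = [float(res.x[iy(w, 0)]) for w in range(W)]
    return x0, y0, float(res.fun)


# Hypothetical instance: one uncertified worker, demand arrives next shift.
x0, y0, J = solve_skill_mpc(I0=0.0, B0=0.0, S0=[0.5], Q0=[0],
                            A_hat=[8.0], D_hat=[0.0, 5.0])
```

In this instance the worker cannot produce in the first period, so the only way to serve the demand of 5 units visible at $h=1$ is to train now: reaching $\theta=0.6$ from a decayed level of $0.475$ takes $0.125/0.05=2.5$ training hours, which costs far less than five units of backlog\.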
### 4\.3 Terminal skill\-bottleneck value
The terminal value $V_f$ prices skill bottlenecks left open at the end of the horizon\. With binary terminal certifications $c^{T}_{w,k}\in\{0,1\}$ satisfying $\theta_k c^{T}_{w,k}\leq S_{w,k,H}$, define the certified capacity and the gap

$$\mathrm{cap}_k\;=\;\sum_{w}\hat{A}_{w,H-1|t}\,c^{T}_{w,k},\qquad gap_k\;=\;\max\!\bigl(0,\;\widehat{sd}_k-\mathrm{cap}_k\bigr),\qquad V_f\;=\;\lambda_{\mathrm{gap}}\sum_{k}gap_k, \tag{16}$$

where the forecasted skill demand $\widehat{sd}_k$ is the per\-skill *maximum* of forecast demand hours from step $H$ to the end of the visible forecast window\. This choice makes the terminal value a cost\-to\-go proxy: demand that is already visible beyond the horizon is exactly what the truncated objective would otherwise ignore\. Hidden \(surprise\) activations contribute nothing to $\widehat{sd}_k$ until the environment reveals them, so the terminal value cannot leak unannounced information\. We evaluate $\lambda_{\mathrm{gap}}\in\{0,25,100\}$; the penalty is interpretable \(a price per uncovered certified hour at the horizon boundary\) and is ablated rather than tuned: the same three values are used in every scenario and training\-rate regime\.
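Eq\. \(16\) can be evaluated directly once a terminal certification pattern is fixed\. The sketch below does this outside any optimizer; inside a mixed\-integer program the $\max(0,\cdot)$ would typically be linearized with a nonnegative gap variable bounded below by $\widehat{sd}_k-\mathrm{cap}_k$, which is a standard device and an assumption about the implementation rather than something stated in the text\. The helper names and all numbers are hypothetical\.

```python
def forecast_skill_demand(demand_hours_beyond):
    """Per-skill maximum of visible forecast demand hours past step H.

    demand_hours_beyond[k] lists the forecast demand hours for skill k
    from step H to the end of the visible window (empty if none).
    """
    return [max(window) if window else 0.0 for window in demand_hours_beyond]


def terminal_value(A_hat_last, cert_T, sd_hat, lam_gap):
    """Evaluate Eq. (16) for given terminal certifications cert_T[w][k]."""
    gaps = []
    for k, demand in enumerate(sd_hat):
        cap_k = sum(A_hat_last[w] * cert_T[w][k]
                    for w in range(len(A_hat_last)))
        gaps.append(max(0.0, demand - cap_k))   # uncovered certified hours
    return gaps, lam_gap * sum(gaps)


# Hypothetical case: both workers hold skill 0, nobody holds skill 1.
sd_hat = forecast_skill_demand([[10.0, 12.0], [0.0, 6.0]])
gaps, V_f = terminal_value(A_hat_last=[8.0, 8.0], cert_T=[[1, 0], [1, 0]],
                           sd_hat=sd_hat, lam_gap=25.0)
```

Here skill 0 is fully covered \(16 certified hours against a peak of 12\), while skill 1 leaves a gap of 6 hours, priced at $25\times 6=150$\.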
### 4\.4Controller variants and baselines
ProductionOnlyMPC\.The receding\-horizon production/inventory program without training variables or skill dynamics; eligibility over the whole horizon uses the currently observed certifications\. It isolates the value of receding\-horizon inventory planning alone\.
MaintenanceMPC \(attribution ablation\)\.The full program with training restricted to skills the worker currently holds:yw,k,hy\_\{w,k,h\}is fixed to zero wheneverQw,k,t=0Q\_\{w,k,t\}=0\. The controller can maintain currently held certifications against forgetting but can never acquire a new skill, and — because the restriction re\-reads the observed certification at every replan — it can never recover a certification once it lapses\. The gap between ProductionOnlyMPC and MaintenanceMPC therefore measures certification\-maintenance value, and the gap between MaintenanceMPC and the full controller measures the value of acquiring or re\-acquiring certifications\.
SkillMPCNoTerminal. The full program with $\lambda_{\mathrm{gap}}=0$.
SkillMPCWithTerminal / Primary. The full program with the terminal penalty. The *primary configuration* is $\lambda_{\mathrm{gap}}=25$, $H=10$, fixed ex ante before any validation run; all horizon and $\lambda_{\mathrm{gap}}$ sweeps are sensitivity analyses, never best-of selection.
Static insurance baselines. The open-loop cross-training plans of the benchmark paper: Static40 and Static60 at the default training rate, and rate-recalibrated variants (Static80 at the moderate rate; StaticSlow160, the same plan stretched to ten shifts, at the slow rate) so that every static plan certifies all its trained cells at the training rate it faces.
WaterFillingSkillGap. The strongest reactive heuristic from the benchmark paper: skill-gap-driven training with proportional (water-filling) production allocation.
### 4.5 Solver implementation and diagnostics
Each replanning step solves the MILP with `scipy.optimize.milp` (HiGHS branch and bound) with binary certification variables. We initially evaluated a pure-LP relaxation with a continuous eligibility credit $z\leq S/\theta$ and rejected it: it grants uncertified workers large phantom capacity in prediction, which suppresses training incentives and makes the terminal gap vacuous, while a tighter ramp credit is non-convex. The MILP is the honest model and remains fast at benchmark scale (typically 3–120 ms per solve; the largest instances, $H=15$ or near-infeasible demand, stay below 0.6 s). Every solve records its status and wall-clock time; if a solve ever failed, the step would fall back to the water-filling heuristic and be counted. The fallback is a safety net, not part of the method, and it never triggered in any accepted run. The executed first-period action is exactly feasible by construction (Eq. [10](https://arxiv.org/html/2606.17269#S4.E10)), which the environment confirms independently: its projection diagnostics (projection frequency and norm, certification-zeroed hours) are identically zero in all reported experiments.
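The solve-record-fallback pattern described above can be sketched with `scipy.optimize.milp` on a deliberately tiny problem. The three-variable model below (production hours, training hours, one binary certification with same-period eligibility) is a hypothetical toy, not the paper's program, and `solve_step` and its `fallback` argument are illustrative names.

```python
import time
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_step(c, A, lb, ub, integrality, bounds, fallback):
    """Solve one MILP, record status and wall-clock time, fall back on failure."""
    t0 = time.perf_counter()
    res = milp(c, constraints=LinearConstraint(A, lb, ub),
               integrality=integrality, bounds=bounds)
    solve_ms = 1e3 * (time.perf_counter() - t0)
    if res.success:
        return res.x, {"status": "optimal", "solve_ms": solve_ms, "fallback": 0}
    # Safety net only: the failure is counted so it can be reported.
    return fallback(), {"status": f"failed({res.status})",
                        "solve_ms": solve_ms, "fallback": 1}

# Toy variables: x0 = production hours, x1 = training hours, x2 = certified (0/1).
c = np.array([-10.0, 1.0, 0.0])              # reward production, charge training
A = np.array([[1.0, 1.0, 0.0],               # x0 + x1 <= 8 (shared worker hours)
              [1.0, 0.0, -8.0],              # x0 <= 8 * x2 (hard eligibility)
              [0.0, -0.05, 0.6]])            # 0.5 + 0.05*x1 >= 0.6*x2 (threshold)
lb = np.full(3, -np.inf)
ub = np.array([8.0, 0.0, 0.5])
integrality = np.array([0, 0, 1])            # only x2 is integer
bounds = Bounds([0.0, 0.0, 0.0], [8.0, 8.0, 1.0])

x, info = solve_step(c, A, lb, ub, integrality, bounds,
                     fallback=lambda: np.zeros(3))
```

In this toy the optimizer spends two hours on training to unlock six hours of production, which is the trade-off between training and production hours that the binary certification variable makes visible to the planner.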
## 5 Experimental Setup
#### Simulator and integration.
All experiments run on the accepted SkillChain-Gym simulator from the companion benchmark paper (Paper 1), imported read-only; no Paper 1 code was modified. The default instance has two products, three skills, four workers, one aggregate capacity pool, threshold $\theta_{k}=0.6$, training gain $\alpha^{\mathrm{train}}=0.05$ per hour, forgetting $\delta_{k}=0.005$ per shift, and horizon $T=60$ shifts. The only configuration change relative to the benchmark tables is a wider observation forecast window (16 shifts $\geq H_{\max}+1$) so receding-horizon controllers can see announced shocks; *every* policy (MPC and baseline alike) receives the same widened observation, so observation parity holds within every comparison (baseline numbers therefore differ from the benchmark paper's window-3 tables). MPC plans both production and training with the window, while the reused heuristics consume the lookahead only in their training rule; their production rule is single-step by design.
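For orientation, the parameters above translate into training effort and decay times under a simple reading of the dynamics. The sketch assumes a skill level that rises linearly by $\alpha$ per training hour and decays geometrically by a factor $(1-\delta)$ per shift without training; the exact update rule is defined elsewhere in the paper, and the starting level `s0 = 0.7` is a hypothetical value chosen only for illustration.

```python
import math

def hours_to_certify(theta, alpha, s0=0.0):
    """Training hours to lift a skill from s0 to theta under linear gain."""
    return max(0.0, (theta - s0) / alpha)

def shifts_until_lapse(theta, delta, s0):
    """Smallest n with s0 * (1 - delta)**n < theta (geometric forgetting)."""
    if s0 < theta:
        return 0
    if delta <= 0.0:
        return math.inf                       # no forgetting, never lapses
    return math.floor(math.log(theta / s0) / math.log(1.0 - delta)) + 1

theta, delta = 0.6, 0.005
# Default, moderate, and slow training rates used in the experiments.
hours = {alpha: hours_to_certify(theta, alpha) for alpha in (0.05, 0.025, 0.0125)}
n_lapse = shifts_until_lapse(theta, delta, s0=0.7)   # s0 is hypothetical
```

Under these assumptions a worker starting from zero needs roughly 12, 24, and 48 training hours at the three rates, which is why the training lag, and with it the role of the terminal value, differs across regimes.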
#### Protocol lock\.
The primary configuration is SkillMPC with $\lambda_{\mathrm{gap}}=25$ and $H=10$, chosen ex ante before the validation suite ran. Horizon and $\lambda_{\mathrm{gap}}$ sweeps are sensitivity analyses, not model selection; no headline claim is based on a best-of-sweep configuration. In all artifacts, `solve_status='heuristic'` marks Paper 1 baseline rows (no solver involved); it is *not* an MPC fallback. MPC fallbacks are counted separately and are zero throughout.
#### Scenario families\.
The core suite uses five scenarios at the default training rate: `no_shock_sanity`; `new_product_announced_t60` (a product activating mid-episode whose primary skill no worker initially holds, visible in the forecast window beforehand); `new_product_surprise_t60` (activation shift randomized per seed and hidden from the forecast until onset); `demand_shock_mid` (temporary demand spike); and `absenteeism_mid` (the two holders of one skill absent for eight shifts). Around this core, the suite adds: demand-forecast-quality modes on the announced shock (noisy, multiplicative $\sigma=0.15$; delayed by three shifts, emulated by truncating the effective window; biased, visible forecast scaled by $0.8$); availability-forecast modes on absenteeism (naive persistence; announced absence, where the controller receives the exact absence window; noisy announced, expecting half the lost hours); capacity-slack boundary sweeps on the surprise shock (capacity $30, 31, 40$ against post-activation demand of $30$ labor-hours, plus a scaled-demand check at $16+8$ demand with capacity $24, 26$); training-rate sensitivity on the announced shock (default $\alpha=0.05$, moderate $0.025$, slow $0.0125$, with static baselines recalibrated per rate); and two negative controls (a no-bottleneck control with $\delta=0$, and a near-infeasible control with post-activation demand at the labor envelope). A final evidence pass adds three cells with certification-event metrics: `recert_demand_shock`, `announced_acquisition`, and `greenfield_visible` (demand for a skill held by no worker at $t=0$, forecast-visible from the first shift).
#### Baselines and variants\.
ProductionOnlyMPC, MaintenanceMPC, and SkillMPC with $\lambda_{\mathrm{gap}}\in\{0,25,100\}$ and $H\in\{3,5,10,15\}$; the static insurance plans (Static40 and Static60 at the default rate, Static80 at the moderate rate, StaticSlow160 at the slow rate; the uncalibrated Static40 is additionally kept in the slow cell, clearly labeled, to document calibration fragility); and the water-filling skill-gap heuristic (Section [4.4](https://arxiv.org/html/2606.17269#S4.SS4)).
#### Metrics\.
Per episode: total cost; service level and per-product service; total and peak backlog; recovery rate, recovery time conditional on recovery, and unrecovered counts; training hours (total and per skill); terminal new certifications; within-episode *recertifications* (upward $\theta$-crossings of pairs certified at $t=0$ or previously acquired) and *greenfield acquisitions* (first crossing of pairs not certified at $t=0$), counters added because the terminal metric is blind to lapse-and-recover events; skill-bottleneck severity; and solver diagnostics (status, mean solve time, fallback count, realized terminal gap) together with the environment's projection diagnostics.
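The two within-episode counters can be computed from a skill trajectory as sketched below. This is an illustration under stated assumptions: the array layout `(T+1, W, K)`, the use of `>=` as the certification test, and the function name are not taken from the paper's code.

```python
import numpy as np

def certification_events(skill, theta):
    """Count recertifications, greenfield acquisitions, terminal new certs.

    skill : (T+1, W, K) skill levels over one episode (illustrative layout)
    theta : certification threshold (assumed test: skill >= theta)
    """
    cert = skill >= theta                        # certified status per step
    up = ~cert[:-1] & cert[1:]                   # upward threshold crossings
    n_up = up.sum(axis=0)                        # crossings per (worker, skill)
    held_at_start = cert[0]
    # First crossing of a pair not certified at t=0 is a greenfield acquisition.
    greenfield = (~held_at_start) & (n_up > 0)
    # Every other upward crossing re-acquires something held before.
    recert = n_up - greenfield.astype(int)
    # Terminal metric: certified at the end but not at the start.
    terminal_new = cert[-1] & ~held_at_start
    return int(recert.sum()), int(greenfield.sum()), int(terminal_new.sum())

# One worker, two skills: skill 0 lapses and recovers, skill 1 is acquired.
skill = np.array([[[0.70, 0.0]],
                  [[0.55, 0.3]],
                  [[0.65, 0.6]],
                  [[0.65, 0.6]],
                  [[0.65, 0.6]]])
recert, greenfield, terminal_new = certification_events(skill, theta=0.6)
```

In this example the lapse and recovery of skill 0 registers as one recertification but leaves the terminal count unchanged, which is the blindness the within-episode counters are meant to address.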
#### Statistical protocol\.
Twenty seeds per validation and final\-evidence cell, paired by seed\. Policy comparisons report the per\-seed win rate with an exact two\-sided paired sign test, and separately a seeded paired bootstrap \(10,000 resamples\) percentile confidence interval on the mean cost difference with relative effect sizes\. The two criteria are reported side by side and are not interchangeable: a comparison can show a favorable mean effect whose bootstrap CI excludes zero \(“ci\-sig”\) while its sign test is not significant; we describe such cases as mean\-effect evidence, not decisive win\-rate evidence\. Episodes replay deterministically under fixed seeds: two independent executions of each suite produce byte\-identical CSVs once the wall\-clock solve\-time column is excluded\.
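Both criteria are straightforward to reproduce. The sketch below implements an exact two-sided sign test on win/loss counts and a seeded paired percentile bootstrap on the mean cost difference; dropping ties from the sign test and the 95% level are assumptions, since the protocol above does not state them, and the function names are illustrative.

```python
import math
import numpy as np

def sign_test_two_sided(wins, losses):
    """Exact two-sided sign test p-value; ties are assumed to be dropped."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def paired_bootstrap_ci(cost_a, cost_b, n_resamples=10_000, level=0.95, seed=0):
    """Percentile CI on the mean of per-seed cost differences (a minus b)."""
    diff = np.asarray(cost_a, dtype=float) - np.asarray(cost_b, dtype=float)
    rng = np.random.default_rng(seed)            # seeded for exact replay
    idx = rng.integers(0, diff.size, size=(n_resamples, diff.size))
    means = diff[idx].mean(axis=1)               # resample seeds with replacement
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return diff.mean(), lo, hi

p_15_5 = sign_test_two_sided(15, 5)
p_13_7 = sign_test_two_sided(13, 7)
p_17_3 = sign_test_two_sided(17, 3)
```

With twenty paired seeds, splits of 15–5, 13–7, and 17–3 give p-values of about 0.041, 0.26, and 0.0026 respectively, so a 13–7 record cannot be sign-significant even when the bootstrap interval on the mean difference excludes zero.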
#### Artifacts\.
The validation suite comprises 87 aggregate cells and 1,740 seed\-level rows; the final evidence pass adds 18 cells and 360 rows\. All artifacts, per\-seed sweeps, figures, and reproduction commands accompany the paper\.
## 6 Results
The results are organized by mechanism and regime rather than as a single ranking. The summary finding is regime dependence: the predictive controller is favored in the tested regimes when bottlenecks or labor shocks are forecastable, while lean static insurance remains strong under surprise shocks, near the demand–capacity boundary, and wherever insurance is cheap. We report adverse regimes before favorable ones. Throughout, “Primary” is the ex-ante locked configuration (SkillMPC, $\lambda_{\mathrm{gap}}=25$, $H=10$); win rates come with exact paired sign tests, and “ci-sig” marks paired-bootstrap confidence intervals on the mean cost difference that exclude zero. The two criteria are reported separately and can disagree.
### 6.1 Implementation and reproducibility
The validation suite executes 67,200 MILP solves in closed loop with *zero* MPC fallbacks, zero NaNs, zero negative state metrics, and environment projection diagnostics identically zero (every executed action exactly feasible). Rows marked `solve_status='heuristic'` are Paper 1 baselines, not MPC failures. Median solve times range from ${\sim}4$ ms ($H=3$) to ${\sim}94$ ms ($H=15$), with a worst cell mean of 0.57 s in the near-infeasible control. Two independent executions of each suite produce byte-identical CSVs once the wall-clock `solve_ms_mean` column is excluded.
### 6.2 Attribution: what the skill machinery is worth
Table [1](https://arxiv.org/html/2606.17269#S6.T1) decomposes the controller's value through the ablation chain ProductionOnly $\to$ Maintenance $\to$ Primary, which holds the inventory-anticipation capability constant (all three share the same inventory/backlog program and forecast) and varies only training eligibility.
Table 1: Attribution chain (20 seeds; mean cost; certification events per episode from the final-evidence counters). On the new-product cells, MaintenanceMPC equals ProductionOnlyMPC bit-for-bit on every seed.

On `demand_shock_mid` the chain is $21{,}235\to 7{,}698\to 3{,}709$. The first leg (77% of the gap) is certification *maintenance*: without training, the production-only controller loses both initially held certifications of the demanded skill to forgetting mid-episode. The second leg (23%) is *not* new-skill acquisition (both policies train only the already-held skill, and terminal new-certification counts are zero for both) but *re-acquisition of lapsed certifications*: the within-episode counters show Primary records 1.0 recertifications per episode on the demanded skill while MaintenanceMPC records 0.0, because once a certification lapses the maintenance ablation can never recover it by construction. That Primary's recertification count is 1.0 while its *terminal* new-certification count is 0 demonstrates directly that the terminal metric is blind to lapse-and-recover events; the within-episode counters were added for this reason. New-skill acquisition is exercised separately: on the new-product cells (announced, surprise, and greenfield) MaintenanceMPC equals ProductionOnlyMPC bit-for-bit on every seed, since neither can touch the initially unheld skill, so the entire gap to Primary there is acquisition value. On `no_shock_sanity` the maintenance leg is the whole story (Primary 1,832 vs. Maintenance 1,833, 18/20 ties, vs. ProductionOnly 16,735), and in the $\delta=0$ negative control the chain degenerates: Primary equals ProductionOnlyMPC exactly, with zero training hours. The skill machinery contributes nothing when no skill bottleneck exists.
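The two leg shares on `demand_shock_mid` follow directly from the three mean costs reported in the chain; the arithmetic is spelled out here only as a check.

```latex
\begin{align*}
\text{total gap} &= 21{,}235 - 3{,}709 = 17{,}526,\\
\text{maintenance leg} &= \frac{21{,}235 - 7{,}698}{17{,}526}
  = \frac{13{,}537}{17{,}526} \approx 0.77,\\
\text{re-acquisition leg} &= \frac{7{,}698 - 3{,}709}{17{,}526}
  = \frac{3{,}989}{17{,}526} \approx 0.23.
\end{align*}
```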
### 6.3 Where static insurance remains strong
Table 2: Core $T=60$ results (20 seeds): mean cost $\pm$ s.d. Sign-test notes in text. PO = ProductionOnlyMPC, Maint = MaintenanceMPC, WF = WaterFillingSkillGap.
#### Surprise shocks.
Lean static insurance wins the surprise new-product regime on cost: Static40 at 1,981 beats Primary at 2,110 on 15 of 20 paired seeds (sign $p=0.041$, ci-sig, +6.5% relative), and Static60 likewise. The controller cannot anticipate a hidden activation any better than the heuristics can; its advantage over the reactive water-filling heuristic (2,658, Primary wins 20–0) comes from cleaner reaction, not foresight.
#### Absenteeism without availability forecasts\.
Under the default persistence assumption, Static60 at 4,915 beats Primary at 6,675 on 20 of 20 seeds (+36%): the static plan's cross-trained backups are pre-bought redundancy, while the controller only reacts once workers are already absent. Primary ties Static40 (10–10).
#### The demand–capacity boundary\.
Table [3](https://arxiv.org/html/2606.17269#S6.T3) sweeps post-activation capacity slack on the surprise shock. At zero slack (capacity 30 against 30 demanded hours) any reaction transient is structurally hard to recover: Primary recovers in only 45% of episodes (11/20 unrecovered) and loses to Static40 20–0 (3,305 vs. 2,089, +58%); the reactive heuristic recovers in 5%. One unit of slack restores Primary's recovery to 95% but Static40 still wins 17–3; even at generous slack (capacity 40) Static40 retains a small significant edge (16–4, +7%). The scaled-demand check ($16+8$ demand) reproduces the qualitative pattern at a different absolute scale, with static favored throughout because cheap pre-shock labor makes its insurance nearly free.
Table 3: Capacity-slack boundary on the surprise shock (20 seeds): mean cost, recovery rate, and unrecovered episodes. Post-activation demand is 30 labor-hours (default scale) and 24 (scaled).

| Capacity (slack) | Policy | Cost | Recovery | Unrecovered |
|---|---|---|---|---|
| 30 (0) | Primary | 3,305 | 0.45 | 11 |
| 30 (0) | Static40 | 2,089 | 0.95 | 1 |
| 30 (0) | WF | 5,426 | 0.05 | 19 |
| 31 (1) | Primary | 2,265 | 0.95 | 1 |
| 31 (1) | Static40 | 1,981 | 1.00 | 0 |
| 40 (8+, labor-bound) | Primary | 2,121 | 1.00 | 0 |
| 40 (8+, labor-bound) | Static40 | 1,981 | 1.00 | 0 |
| 24 (0, scaled) | Primary | 2,371 | 0.90 | 2 |
| 24 (0, scaled) | Static40 | 1,691 | 0.95 | 1 |
| 26 (2, scaled) | Primary | 1,703 | 1.00 | 0 |
| 26 (2, scaled) | Static40 | 1,621 | 1.00 | 0 |
#### Near\-infeasible demand\.
When post-activation demand sits at the labor envelope, no policy can fully serve, and the cheaper insurance wins: Static40 at 2,484 beats Primary at 3,411 on 20 of 20 seeds. Predictive control does not rescue a structurally overloaded system.
### 6.4 Where forecastability favors the controller
#### Announced bottlenecks.
On the announced new-product shock, Primary at 1,992 shows a favorable mean effect against Static40 at 2,108 (ci-sig, −5.5% relative), but wins only 13 of 20 paired seeds (sign $p=0.26$): we report this as mean-effect evidence, not decisive win-rate evidence. Against Static60 the advantage is sign-significant (17–3, $p=0.0026$) and against the water-filling heuristic decisive (20–0, −12%). The controller achieves this with 13.4 training hours versus the static plans' 40–60: it buys only the certifications the forecast justifies, just in time.
#### Demand shocks\.
Primary at 3,709 beats every baseline on 20 of 20 seeds (Static40 8,802, −58%; Static60 6,991, −47%; water-filling 7,348, −50%): receding-horizon inventory pre-building ahead of the announced spike, combined with certification maintenance, is a capability none of the open-loop or single-step baselines has. The same holds in `no_shock_sanity` against the static plans (20–0), where Primary additionally ties the water-filling heuristic (9–11, n.s.).
#### Forecastable labor shocks\.
The sharpest reversal is the availability experiment. With the default persistence assumption the controller loses absenteeism to Static60 (above); given an *announced* absence window, Primary drops from 6,675 to 2,780, beating its own persistence variant 20–0 (−58%) and Static60 (4,915) 20–0 (−43%). A noisy announcement (half the lost hours expected) recovers part of the value (5,668). Forecastable labor shocks move the absenteeism regime from static insurance to predictive control; surprise labor shocks do not.
### 6.5 Terminal skill-bottleneck value: a dose–response in training lag
The terminal penalty's effect depends on the relation between training lag and horizon, and the three training-rate regimes must not be pooled. At the *default* rate (certification within one shift), just-in-time training is optimal in the predictor and the penalty has no cost effect at $H=10$: $\lambda_{\mathrm{gap}}=25$ vs. $0$ is 8–8 with 4 ties on the announced shock, although the realized terminal-gap diagnostic drops from 28.5 to 0.0. The penalty changes the predicted plan, not the executed cost. At the *moderate* rate ($\alpha=0.025$, lag ${\approx}2$ shifts), $\lambda_{\mathrm{gap}}=25$ at $H=3$ costs 2,027 versus 3,209 without the penalty. At the *slow* rate ($\alpha=0.0125$, lag 2–4 shifts), the penalty becomes decisive for short horizons: without it the $H=3$ controller never trains and matches production-only behavior (67,861, new-product service 0, while its own gap diagnostic sits at 28.5); with it, training starts at shift 9, the first shift the activation enters the forecast window, and cost falls to 2,035 (−97%, 20–0; Figure [1](https://arxiv.org/html/2606.17269#S6.F1)). At $H=5$ the no-terminal controller trains late and partially recovers (11,231); the with-terminal variants are indistinguishable from the $H=10$ primary. Against fairly recalibrated static plans the controller retains a decisive advantage on this announced shock: Primary beats StaticSlow160 (2,037 vs. 4,722, 20–0) and Static80 at the moderate rate (2,003 vs. 2,756, 20–0), while the *uncalibrated* Static40 fails completely at the slow rate (67,945, labeled calibration fragility).
Figure 1: Announced new product under slow training ($\alpha=0.0125$, seed 0, $H=3$): cumulative rare-skill training (top) and new-product backlog (bottom). Without the terminal penalty the controller never trains and backlog grows without bound; with $\lambda_{\mathrm{gap}}=25$ training starts the first shift the activation becomes visible (shift 9) and backlog stays near zero.
### 6.6 Forecast quality and horizon sensitivity
On the announced shock, degrading the forecast degrades the controller in an interpretable order: perfect 1,992; delayed-by-3 1,992 (exactly zero effect on all 20 seeds); noisy 3,002 (+51%, 0–20 against perfect); biased ($0.8\times$) 5,189 (+160%, 0–20). The delay result is a property, not an anomaly: a delay harms the controller only once remaining visibility falls below the training lag plus one shift, and at the default rate (lag one shift) visibility of even a single shift suffices; under slow training, larger delays bind correspondingly earlier. Horizon sensitivity at the locked $\lambda_{\mathrm{gap}}=25$ is mild in the default-rate regime ($H=3/5/10/15$: 2,017 / 2,014 / 1,992 / 1,995 announced; 2,133 / 2,108 / 2,110 / 2,098 surprise), consistent with the terminal value substituting for horizon length only when training lag binds.
### 6.7 Greenfield acquisition
In `greenfield_visible` (no worker holds the required skill at $t=0$; demand visible from the first shift), Primary acquires 2.0 greenfield certifications per episode with 14 training hours and reaches cost 2,225. The comparison with Static40 (2,343) is favorable on the mean with a bootstrap CI excluding zero (−118, CI $[-203,-38]$) but *not* sign-test significant (13–7, $p=0.26$); we phrase it conservatively as mean-effect evidence. Against Static60 the advantage is sign-significant (15–5, $p=0.041$) and against water-filling clear (18–2, $p=4\times 10^{-4}$). MaintenanceMPC and ProductionOnlyMPC, which cannot acquire the skill, match one another exactly (138,854; new-product service 0; 20–0). Together with Sections [6.2](https://arxiv.org/html/2606.17269#S6.SS2)–[6.5](https://arxiv.org/html/2606.17269#S6.SS5), this completes the mechanism picture: maintenance, re-acquisition, and greenfield acquisition are separated, each with direct metric support.
## 7 Discussion and Limitations
#### Reading the regime map conservatively.
The headline of Section [6](https://arxiv.org/html/2606.17269#S6) is not that predictive control wins, but *where* it wins, and the adverse regimes come first. Lean static cross-training remains strong (often sign-significantly better than the locked primary controller) under surprise shocks (Static40 wins 15–5), under absenteeism when the controller can only assume availability persistence (Static60 wins 20–0), throughout the demand–capacity boundary sweep (including 20–0 at zero slack, where reaction transients are structurally hard to recover), and in the near-infeasible control where no policy can fully serve and cheaper insurance wins 20–0. The $\delta=0$ negative control adds the cleanest null: with no forgetting and no skill bottleneck, the skill-aware controller reproduces the production-only controller exactly, with zero training hours. Any claim about skill-constrained MPC must survive these cells, and ours is correspondingly narrow: predictive control is useful in the tested regimes when skill or labor bottlenecks are forecastable early enough for training to complete.
#### Favorable regimes\.
Where the future is visible, the controller converts visibility into lean, just-in-time capability decisions. On forecast-visible bottlenecks it buys only the certifications the forecast justifies (13–14 hours of training against the static plans' 40–60), with a favorable mean effect against the leanest static plan (ci-sig but not sign-significant, 13–7) and sign-significant advantages beyond it. On demand shocks it combines inventory pre-building with certification maintenance, a joint capability no open-loop or single-step baseline has (20–0 against all). The sharpest reversal is informational: given an announced absence window, the same controller that loses absenteeism under persistence beats Static60 20–0 at −43% cost. Forecastable labor shocks move a regime that Paper 1 assigned to static insurance into the predictive-control column; hidden labor shocks do not.
#### What the attribution chain establishes\.
The ProductionOnly $\to$ Maintenance $\to$ Primary chain, with inventory-anticipation held constant, separates three mechanisms with direct metric support. Certification *maintenance* against forgetting is the largest leg wherever skills are initially held (77% of the demand-shock gap; the entire no-shock gap). *Re-acquisition of lapsed certifications* is visible only through the within-episode recertification counters (Primary 1.0 vs. Maintenance 0.0 per episode on the demanded skill) and is demonstrably invisible to terminal certification counts. *Greenfield acquisition* is exercised where no worker holds the required skill: the maintenance ablation collapses onto production-only bit-for-bit, and the full controller acquires exactly the certifications the forecast justifies. Per-skill training hours confirm that each mechanism operates on the expected skill and no other.
#### Forecast privilege, not magic adaptivity\.
The controller's advantages are announcement and forecast leverage, made explicit rather than hidden. Degrading the forecast degrades the controller in an interpretable order (noisy +51%, biased +160%), a delayed forecast is harmless exactly until remaining visibility falls below the training lag plus one shift, and hiding the shock entirely (surprise) hands the regime back to static insurance. We consider this transparency a feature of the evaluation design: every policy receives the same observation, the surprise scenarios show what happens when privilege is absent, and the availability-forecast experiment quantifies precisely how much of the absenteeism regime is purchasable with better labor forecasts.
#### Terminal skill\-bottleneck value\.
The terminal penalty matters exactly where theory predicts: when the training lag approaches or exceeds the horizon. At the default training rate (one-shift lag) just-in-time training is optimal in the predictor and $\lambda_{\mathrm{gap}}$ has no cost effect at $H=10$: the penalty closes the predicted terminal gap without changing executed cost. At the moderate and slow rates the no-terminal short-horizon controller trains too late or not at all (up to −97% cost difference at the slow rate, where the no-terminal controller sits on a visible gap diagnostic of 28.5 and never pays for it). The three regimes are reported separately and should never be pooled; the terminal value is an interpretable cost-to-go proxy, ablated at the same three $\lambda_{\mathrm{gap}}$ values everywhere rather than tuned per scenario.
#### Limitations\.
The simulator is stylized by design: linear training gain, geometric forgetting, hard threshold certification, and a small synthetic instance \(two products, three skills, four workers, one site\)\. Procurement and material flows, supplier delays, multi\-echelon networks, job\-shop routing, soft productivity, and learning\-by\-doing are excluded; no real data are used, and no demographic or broader fairness claims are made\. Forecast modes are simple transformations \(noise, delay, bias, persistence\) rather than learned forecasters, and availability announcements are exact windows\. Solver scalability is demonstrated only at this scale \(milliseconds per solve; sub\-second in the hardest cell\) — larger instances with more workers, skills, or binary structure remain untested\. The static insurance baselines are favorable by construction \(their plan structure encodes which skills can become critical, and they are recalibrated per training rate\), which strengthens the adverse cells but also means the favorable comparisons are against strong, informed opponents rather than strawmen\. Finally, the statistical protocol is paired over twenty seeds; borderline comparisons \(announced and greenfield against Static40\) are reported as mean\-effect evidence with confidence intervals, not as decisive win\-rate results\.
## 8 Conclusion
We formulated and evaluated a closed\-loop skill\-constrained model predictive controller for a single\-site production\-inventory system in which worker skills are dynamic states, certification is a hard threshold, and training is a control action that competes with production for the same worker hours\. At every shift the controller solves a finite\-horizon mixed\-integer program with binary predicted certification and an interpretable terminal skill\-bottleneck penalty, applies only the first\-period action, and replans; all 67,200 solves in the validation suite completed without fallback, the executed actions were exactly feasible throughout, and every artifact replays deterministically under fixed seeds\.
The central empirical result is regime dependence, established against strong, deliberately favorable static\-insurance baselines and a locked\-in\-advance primary configuration\. Predictive control is useful when skill or labor bottlenecks are forecastable early enough for training to complete: forecast\-visible new\-product bottlenecks, announced demand shocks where inventory anticipation combines with certification maintenance, announced absence windows, and slow\-training regimes where the terminal skill\-bottleneck value prevents short\-horizon controllers from ignoring visible future gaps\. Static insurance remains hard to beat when shocks are hidden, when reaction transients are structurally unrecoverable near the demand–capacity boundary, and wherever pre\-shock slack makes insurance cheap\. No policy class dominates across regimes\. The attribution chain separates the controller’s value into certification maintenance, re\-acquisition of lapsed certifications, and greenfield skill acquisition, each supported by within\-episode certification\-event counters and per\-skill training evidence rather than terminal summaries alone\.
Natural next steps follow the same discipline\. Richer skill structures and larger instances would test whether the regime boundaries persist when no static plan can pre\-train every contingency and when solver scale begins to bind; stochastic or scenario\-based receding\-horizon formulations could price hidden shocks that the present certainty\-equivalent controller cannot anticipate; learned or public\-data\-calibrated demand and absence forecasts would replace the stylized forecast modes; and integration with the released SkillChain\-Gym benchmark of the companion paper would let other controllers and learned policies be evaluated under the same scenarios, metrics, and statistical protocol\.
## Funding
This research received no external funding\.
## References
- \[1\] (2021) Multi-skilling in scheduling problems: a review on models, methods and applications. Computers & Industrial Engineering 151, pp. 107004. [doi:10.1016/j.cie.2020.107004](https://doi.org/10.1016/j.cie.2020.107004). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[2\] D. A. Alvarez-Rodriguez, J. E. Normey-Rico, and R. C. C. Flesch (2017) Model predictive control for inventory management in biomass manufacturing supply chains. International Journal of Production Research 55(12), pp. 3596–3608. [doi:10.1080/00207543.2017.1315191](https://doi.org/10.1080/00207543.2017.1315191). Cited by: [§2.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1).
- \[3\] N. Azizi and M. Liang (2013) An integrated approach to worker assignment, workforce flexibility acquisition, and task rotation. Journal of the Operational Research Society 64(2), pp. 260–275. [doi:10.1057/jors.2012.30](https://doi.org/10.1057/jors.2012.30). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1).
- \[4\] N. Azizi, S. Zolfaghari, and M. Liang (2010) Modeling job rotation in manufacturing systems: the study of employee’s boredom and skill variations. International Journal of Production Economics 123(1), pp. 69–85. [doi:10.1016/j.ijpe.2009.07.010](https://doi.org/10.1016/j.ijpe.2009.07.010). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[5\] D. Biskup (2008) A state-of-the-art review on scheduling with learning effects. European Journal of Operational Research 188(2), pp. 315–329. [doi:10.1016/j.ejor.2007.05.040](https://doi.org/10.1016/j.ejor.2007.05.040). Cited by: [§1](https://arxiv.org/html/2606.17269#S1.p2.1), [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[6\] M. W. Braun, D. E. Rivera, M. E. Flores, W. M. Carlyle, and K. G. Kempf (2003) A model predictive control framework for robust management of multi-product, multi-echelon demand networks. Annual Reviews in Control 27(2), pp. 229–245. [doi:10.1016/j.arcontrol.2003.09.006](https://doi.org/10.1016/j.arcontrol.2003.09.006). Cited by: [§1](https://arxiv.org/html/2606.17269#S1.p3.1), [§2.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1).
- \[7\] M. W. Braun, D. E. Rivera, W. M. Carlyle, and K. G. Kempf (2003) Application of model predictive control to robust management of multiechelon demand networks in semiconductor manufacturing. SIMULATION 79(3), pp. 139–156. [doi:10.1177/0037549703255637](https://doi.org/10.1177/0037549703255637). Cited by: [§2.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1).
- \[8\] F. Catalano, I. Zennaro, and A. Persona (2025) Manned assembly systems design considering layout, workforce strategy, and workers’ heterogeneity: state of the art and future research agenda. The International Journal of Advanced Manufacturing Technology 139(9–10), pp. 4315–4337. [doi:10.1007/s00170-025-16189-0](https://doi.org/10.1007/s00170-025-16189-0). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[9\] R. Cavagnini, M. Hewitt, and F. Maggioni (2020) Workforce production planning under uncertain learning rates. International Journal of Production Economics 225, pp. 107590. [doi:10.1016/j.ijpe.2019.107590](https://doi.org/10.1016/j.ijpe.2019.107590). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1).
- \[10\] X. Chu, D. Gao, S. Cheng, L. Wu, J. Chen, Y. Shi, and Q. Qin (2019) Worker assignment with learning-forgetting effect in cellular manufacturing system using adaptive memetic differential search algorithm. Computers & Industrial Engineering 136, pp. 381–396. [doi:10.1016/j.cie.2019.07.028](https://doi.org/10.1016/j.cie.2019.07.028). Cited by: [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[11\] P. De Bruecker, J. Belien, J. Van den Bergh, and E. Demeulemeester (2018) A three-stage mixed integer programming approach for optimizing the skill mix and training schedules for aircraft maintenance. European Journal of Operational Research 267(2), pp. 439–452. [doi:10.1016/j.ejor.2017.11.047](https://doi.org/10.1016/j.ejor.2017.11.047). Cited by: [§2.3](https://arxiv.org/html/2606.17269#S2.SS3.p1.1).
- \[12\] P. De Bruecker, J. Van den Bergh, J. Belien, and E. Demeulemeester (2015) Workforce planning incorporating skills: state of the art. European Journal of Operational Research 243(1), pp. 1–16. [doi:10.1016/j.ejor.2014.10.038](https://doi.org/10.1016/j.ejor.2014.10.038). Cited by: [§1](https://arxiv.org/html/2606.17269#S1.p1.1), [§1](https://arxiv.org/html/2606.17269#S1.p3.1), [§2.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1).
- \[13\] P. Doganis, E. Aggelogiannaki, and H. Sarimveis (2008) A combined model predictive control and time series forecasting framework for production-inventory systems. International Journal of Production Research 46(24), pp. 6841–6853. [doi:10.1080/00207540701523058](https://doi.org/10.1080/00207540701523058). Cited by: [§1](https://arxiv.org/html/2606.17269#S1.p3.1), [§2.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1).
- \[14\]D\. Fu, E\. Aghezzaf, and R\. De Keyser\(2014\)A model predictive control framework for centralised management of a supply chain dynamical system\.Systems Science & Control Engineering2\(1\),pp\. 250–260\.External Links:[Document](https://dx.doi.org/10.1080/21642583.2014.895449),[Link](https://doi.org/10.1080/21642583.2014.895449)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[15\]D\. Fu, C\. M\. Ionescu, E\. Aghezzaf, and R\. De Keyser\(2015\)A constrained EPSAC approach to inventory control for a benchmark supply chain system\.International Journal of Production Research54\(1\),pp\. 232–250\.External Links:[Document](https://dx.doi.org/10.1080/00207543.2015.1070214),[Link](https://doi.org/10.1080/00207543.2015.1070214)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[16\]E\. H\. Grosse, C\. H\. Glock, and S\. Muller\(2015\)Production economics and the learning curve: a meta\-analysis\.International Journal of Production Economics170,pp\. 401–412\.External Links:[Document](https://dx.doi.org/10.1016/j.ijpe.2015.06.021),[Link](https://doi.org/10.1016/j.ijpe.2015.06.021)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[17\]S\. Hall, L\. Guerrini, F\. Dorfler, and D\. Liao\-McPherson\(2024\)Receding horizon games for modeling competitive supply chains\.External Links:2401\.09853,[Document](https://dx.doi.org/10.48550/arXiv.2401.09853),[Link](https://arxiv.org/abs/2401.09853)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[18\]S\. E\. Hashemi\-Petroodi, A\. Dolgui, S\. Kovalev, M\. Y\. Kovalyov, and S\. Thevenin\(2021\)Workforce reconfiguration strategies in manufacturing systems: a state of the art\.International Journal of Production Research59\(22\),pp\. 6721–6744\.External Links:[Document](https://dx.doi.org/10.1080/00207543.2020.1823028),[Link](https://doi.org/10.1080/00207543.2020.1823028)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p1.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[19\]C\. A\. Henao, Y\. A\. Mercado, V\. I\. Gonzalez, and A\. Luer\-Villagra\(2023\)Multiskilled personnel assignment with k\-chaining considering the learning\-forgetting phenomena\.International Journal of Production Economics265,pp\. 109018\.External Links:[Document](https://dx.doi.org/10.1016/j.ijpe.2023.109018),[Link](https://doi.org/10.1016/j.ijpe.2023.109018)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[20\]C\. A\. Henao, A\. F\. Porto, and V\. I\. Gonzalez\(2024\)A benchmark dataset for the retail multiskilled personnel planning under uncertain demand\.Data Science7\(1\),pp\. 13–27\.External Links:[Document](https://dx.doi.org/10.3233/DS-240060),[Link](https://doi.org/10.3233/DS-240060)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[21\]P\. Heuser, P\. Letmathe, and M\. Schinner\(2022\)Workforce planning in production with flexible or budgeted employee training and volatile demand\.Journal of Business Economics92\(7\),pp\. 1093–1124\.External Links:[Document](https://dx.doi.org/10.1007/s11573-022-01090-z),[Link](https://doi.org/10.1007/s11573-022-01090-z)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p2.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[22\]M\. Hewitt, A\. Chacosky, S\. E\. Grasman, and B\. W\. Thomas\(2015\)Integer programming techniques for solving non\-linear workforce planning models with learning\.European Journal of Operational Research242\(3\),pp\. 942–950\.External Links:[Document](https://dx.doi.org/10.1016/j.ejor.2014.10.060),[Link](https://doi.org/10.1016/j.ejor.2014.10.060)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[23\]T\. Hipolito, J\. L\. Nabais, M\. A\. Botto, and R\. R\. Negenborn\(2020\)Effective continuous\-flow supply chains using centralized model predictive control\.IFAC\-PapersOnLine53\(2\),pp\. 10855–10860\.External Links:[Document](https://dx.doi.org/10.1016/j.ifacol.2020.12.2808),[Link](https://doi.org/10.1016/j.ifacol.2020.12.2808)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[24\]H\. Jang, T\. Y\. Jung, K\. Yeh, and J\. H\. Lee\(2013\)A model predictive control approach for fab\-wide scheduling in semiconductor manufacturing facilities\.IFAC Proceedings Volumes46\(24\),pp\. 493–498\.External Links:[Document](https://dx.doi.org/10.3182/20130911-3-BR-3021.00061),[Link](https://doi.org/10.3182/20130911-3-BR-3021.00061)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p2.1)\.
- \[25\]H\. Jin, B\. W\. Thomas, and M\. Hewitt\(2016\)Integer programming techniques for makespan minimizing workforce assignment models that recognize human learning\.Computers & Industrial Engineering97,pp\. 202–211\.External Links:[Document](https://dx.doi.org/10.1016/j.cie.2016.03.027),[Link](https://doi.org/10.1016/j.cie.2016.03.027)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[26\]S\. T\. Kafiabad, M\. Kazemi Zanjani, and M\. Nourelfath\(2020\)Integrated planning of operations and on\-job training in maintenance logistics networks\.Reliability Engineering & System Safety199,pp\. 106922\.External Links:[Document](https://dx.doi.org/10.1016/j.ress.2020.106922),[Link](https://doi.org/10.1016/j.ress.2020.106922)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p2.1),[§2\.3](https://arxiv.org/html/2606.17269#S2.SS3.p1.1)\.
- \[27\]H\. V\. Kher, M\. K\. Malhotra, P\. R\. Philipoom, and T\. D\. Fry\(1999\)Modeling simultaneous worker learning and forgetting in dual resource constrained systems\.European Journal of Operational Research115\(1\),pp\. 158–172\.External Links:[Document](https://dx.doi.org/10.1016/S0377-2217%2898%2900190-8),[Link](https://doi.org/10.1016/S0377-2217(98)00190-8)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p2.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[28\]X\. Li and T\. E\. Marlin\(2009\)Robust supply chain performance via model predictive control\.Computers & Chemical Engineering33\(12\),pp\. 2134–2143\.External Links:[Document](https://dx.doi.org/10.1016/j.compchemeng.2009.06.029),[Link](https://doi.org/10.1016/j.compchemeng.2009.06.029)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[29\]D\. A\. Nembhard and F\. Bentefouet\(2012\)Parallel system scheduling with general worker learning and forgetting\.International Journal of Production Economics139\(2\),pp\. 533–542\.External Links:[Document](https://dx.doi.org/10.1016/j.ijpe.2012.05.024),[Link](https://doi.org/10.1016/j.ijpe.2012.05.024)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p2.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[30\]M\. Ortega and L\. Lin\(2004\)Control theory applications to the production\-inventory problem: a review\.International Journal of Production Research42\(11\),pp\. 2303–2322\.External Links:[Document](https://dx.doi.org/10.1080/00207540410001666260),[Link](https://doi.org/10.1080/00207540410001666260)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[31\]E\. Perea\-Lopez, B\. E\. Ydstie, and I\. E\. Grossmann\(2003\)A model predictive control strategy for supply chain optimization\.Computers & Chemical Engineering27\(8–9\),pp\. 1201–1218\.External Links:[Document](https://dx.doi.org/10.1016/S0098-1354%2803%2900047-4),[Link](https://doi.org/10.1016/S0098-1354(03)00047-4)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[32\]T\. Prunet, N\. Absi, V\. Borodin, and D\. Cattaruzza\(2024\)Optimization of human\-aware logistics and manufacturing systems: a comprehensive review of modeling approaches and applications\.EURO Journal on Transportation and Logistics13,pp\. 100136\.External Links:[Document](https://dx.doi.org/10.1016/j.ejtl.2024.100136),[Link](https://doi.org/10.1016/j.ejtl.2024.100136)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p1.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[33\]T\. Prunet, N\. Absi, V\. Borodin, and D\. Cattaruzza\(2024\)Optimization of human\-aware logistics and manufacturing systems: a survey on the human\-aware models\.EURO Journal on Transportation and Logistics13,pp\. 100137\.External Links:[Document](https://dx.doi.org/10.1016/j.ejtl.2024.100137),[Link](https://doi.org/10.1016/j.ejtl.2024.100137)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[34\]C\. Ruf, J\. F\. Bard, and R\. Kolisch\(2022\)Workforce capacity planning with hierarchical skills, long\-term training, and random resignations\.International Journal of Production Research60\(2\),pp\. 783–807\.External Links:[Document](https://dx.doi.org/10.1080/00207543.2021.2017058),[Link](https://doi.org/10.1080/00207543.2021.2017058)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[35\]T\. Ruppert, G\. Dorgo, and J\. Abonyi\(2020\)Fuzzy activity time\-based model predictive control of open\-station assembly lines\.Journal of Manufacturing Systems54,pp\. 12–23\.External Links:[Document](https://dx.doi.org/10.1016/j.jmsy.2019.11.005),[Link](https://doi.org/10.1016/j.jmsy.2019.11.005)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p2.1)\.
- \[36\]H\. Sarimveis, P\. Patrinos, C\. D\. Tarantilis, and C\. T\. Kiranoudis\(2008\)Dynamic modeling and control of supply chain systems: a review\.Computers & Operations Research35\(11\),pp\. 3530–3561\.External Links:[Document](https://dx.doi.org/10.1016/j.cor.2007.01.017),[Link](https://doi.org/10.1016/j.cor.2007.01.017)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[37\]G\. Schildbach and M\. Morari\(2016\)Scenario\-based model predictive control for multi\-echelon supply chain management\.European Journal of Operational Research252\(2\),pp\. 540–549\.External Links:[Document](https://dx.doi.org/10.1016/j.ejor.2016.01.051),[Link](https://doi.org/10.1016/j.ejor.2016.01.051)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p1.1)\.
- \[38\]T\. Sprodowski, J\. K\. Sagawa, A\. S\. Maluf, M\. Freitag, and J\. Pannek\(2020\)A multi\-product job shop scenario utilising model predictive control\.Expert Systems with Applications162,pp\. 113734\.External Links:[Document](https://dx.doi.org/10.1016/j.eswa.2020.113734),[Link](https://doi.org/10.1016/j.eswa.2020.113734)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p2.1)\.
- \[39\]S\. Tavakoli Kafiabad, M\. K\. Zanjani, and M\. Nourelfath\(2022\)Workforce training and operations planning for maintenance centres under demand uncertainty\.International Journal of Production Research60\(5\),pp\. 1587–1599\.External Links:[Document](https://dx.doi.org/10.1080/00207543.2020.1866781),[Link](https://doi.org/10.1080/00207543.2020.1866781)Cited by:[§2\.3](https://arxiv.org/html/2606.17269#S2.SS3.p1.1)\.
- \[40\]S\. Valeva, M\. Hewitt, B\. W\. Thomas, and K\. G\. Brown\(2017\)Balancing flexibility and inventory in workforce planning with learning\.International Journal of Production Economics183,pp\. 194–207\.External Links:[Document](https://dx.doi.org/10.1016/j.ijpe.2016.10.026),[Link](https://doi.org/10.1016/j.ijpe.2016.10.026)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[41\]S\. Valeva, M\. Hewitt, and B\. W\. Thomas\(2017\)A matheuristic for workforce planning with employee learning and stochastic demand\.International Journal of Production Research55\(24\),pp\. 7380–7397\.External Links:[Document](https://dx.doi.org/10.1080/00207543.2017.1349950),[Link](https://doi.org/10.1080/00207543.2017.1349950)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p2.1)\.
- \[42\]J\. Van den Bergh, J\. Belien, P\. De Bruecker, E\. Demeulemeester, and L\. De Boeck\(2013\)Personnel scheduling: a literature review\.European Journal of Operational Research226\(3\),pp\. 367–385\.External Links:[Document](https://dx.doi.org/10.1016/j.ejor.2012.11.029),[Link](https://doi.org/10.1016/j.ejor.2012.11.029)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p1.1),[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.
- \[43\]F\. D\. Vargas\-Villamil and D\. E\. Rivera\(2000\)Multilayer optimization and scheduling using model predictive control: application to reentrant semiconductor manufacturing lines\.Computers & Chemical Engineering24\(8\),pp\. 2009–2021\.External Links:[Document](https://dx.doi.org/10.1016/S0098-1354%2800%2900598-6),[Link](https://doi.org/10.1016/S0098-1354(00)00598-6)Cited by:[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p2.1)\.
- \[44\]P\. Wenzelburger and F\. Allgower\(2021\)Model predictive control for flexible job shop scheduling in Industry 4\.0\.Applied Sciences11\(17\),pp\. 8145\.External Links:[Document](https://dx.doi.org/10.3390/app11178145),[Link](https://doi.org/10.3390/app11178145)Cited by:[§1](https://arxiv.org/html/2606.17269#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.17269#S2.SS1.p2.1)\.
- \[45\]J\. R\. Zamiska, M\. Y\. Jaber, and H\. V\. Kher\(2007\)Worker deployment in dual resource constrained systems with a task\-type factor\.European Journal of Operational Research177\(3\),pp\. 1507–1519\.External Links:[Document](https://dx.doi.org/10.1016/j.ejor.2005.04.018),[Link](https://doi.org/10.1016/j.ejor.2005.04.018)Cited by:[§2\.2](https://arxiv.org/html/2606.17269#S2.SS2.p1.1)\.