Tag
A method for contract-based compositional shielding that ensures global safety in multi-agent reinforcement learning without centralized runtime control, using local linear temporal logic (LTL) obligations for safety and a multi-armed bandit to optimize team reward.
Introduces a shielding framework for robust Markov decision processes (RMDPs) that formally guarantees safety under uncertain transition dynamics, with proofs of soundness and optimality. The approach can be combined with probably approximately correct (PAC) guarantees for learned models, enabling safe reinforcement learning in unknown environments.
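
To make the idea of shielding under uncertain transitions concrete, here is a minimal sketch of a generic safety shield. It is not the algorithm of either work summarized above. It assumes a small finite model, a safety objective of the form "never enter an unsafe state", and an uncertainty set summarized by the successors that any admissible transition function could produce. The names `safe_region`, `shield`, and the toy states and actions are hypothetical.

```python
from typing import Dict, Set, Tuple

State = str
Action = str
# Assumption: for each (state, action), the set of successors that have nonzero
# probability under at least one transition function in the uncertainty set.
Successors = Dict[Tuple[State, Action], Set[State]]


def safe_region(states: Set[State], unsafe: Set[State], succ: Successors) -> Set[State]:
    """Greatest fixed point of states that can stay safe against every admissible transition."""
    region = set(states) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in list(region):
            # A state survives if some action keeps all possible successors in the region.
            has_safe_action = any(
                targets <= region
                for (src, _action), targets in succ.items()
                if src == s
            )
            if not has_safe_action:
                region.discard(s)
                changed = True
    return region


def shield(state: State, region: Set[State], succ: Successors) -> Set[Action]:
    """Actions the shield permits in `state`: those that cannot leave the safe region."""
    return {
        action
        for (src, action), targets in succ.items()
        if src == state and targets <= region
    }


# Hypothetical toy model: "right" from s0 may reach "bad" under some admissible dynamics.
states = {"s0", "s1", "s2", "bad"}
unsafe = {"bad"}
succ: Successors = {
    ("s0", "left"): {"s1"},
    ("s0", "right"): {"s2", "bad"},
    ("s1", "stay"): {"s1"},
    ("s2", "go"): {"bad"},
    ("bad", "stay"): {"bad"},
}

region = safe_region(states, unsafe, succ)
allowed = shield("s0", region, succ)
```

In this toy model the shield would restrict the learner to `left` in `s0`, since `right` can reach `bad` under some admissible dynamics. A learner would then pick only among the permitted actions; quantitative safety thresholds and the PAC-learned models mentioned above are outside what this sketch covers.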