safe-reinforcement-learning

CSPO: Constraint-Sensitive Policy Optimization for Safe Reinforcement Learning

arXiv cs.AI · 21h ago

This paper proposes Constraint-Sensitive Policy Optimization (CSPO), a first-order primal-dual method for safe reinforcement learning. CSPO incorporates local constraint sensitivity to improve safety recovery and reduce oscillations near safety boundaries, and it reports higher constrained returns on navigation and locomotion benchmarks.
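For readers unfamiliar with the primal-dual setup this summary refers to, the following is a minimal sketch of a generic Lagrangian baseline, not CSPO itself. The summary does not specify how the constraint-sensitivity term is computed, so it is left out here. The function names, the learning rate `lr`, and the cost limit are illustrative assumptions.

```python
def dual_ascent_step(lam, cost_estimate, cost_limit, lr=0.05):
    """Projected gradient ascent on the Lagrange multiplier.

    The multiplier grows while the estimated cost exceeds the limit and
    shrinks (never below zero) once the policy is back inside the limit.
    """
    return max(0.0, lam + lr * (cost_estimate - cost_limit))


def lagrangian_objective(reward_return, cost_return, lam, cost_limit):
    """Scalar objective the policy (primal) step would maximize."""
    return reward_return - lam * (cost_return - cost_limit)


# Hypothetical numbers: the policy currently violates a cost limit of 25.
lam = dual_ascent_step(0.0, cost_estimate=30.0, cost_limit=25.0, lr=0.1)
penalized = lagrangian_objective(100.0, 30.0, lam, 25.0)
```

With a fixed step size, the multiplier in this plain scheme can overshoot and swing back, which is the kind of oscillation near the safety boundary that the summary says CSPO aims to reduce.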

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

arXiv cs.LG · 21h ago

Proposes contract-based compositional shielding, a method that ensures global safety in multi-agent reinforcement learning without centralized runtime control, using local linear temporal logic (LTL) obligations and a multi-armed bandit to optimize team reward.

Robust Shielding for Safe Reinforcement Learning

arXiv cs.AI · 2026-06-02

Introduces a shielding framework for robust Markov decision processes (RMDPs) that formally guarantees safety under uncertain transition dynamics, with proofs of soundness and optimality. Combined with PAC guarantees for learned models, the approach enables safe reinforcement learning in unknown environments.
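As a rough illustration of what shielding under uncertain transitions can look like, here is a toy one-step sketch. It is not the paper's construction: it assumes interval-valued transition probabilities and a single-step safety threshold, whereas a real shield would typically reason about reachability over longer horizons. The model layout and all names are hypothetical.

```python
def worst_case_unsafe_prob(intervals, unsafe):
    """Largest probability mass an adversary can place on unsafe successors.

    intervals maps each successor state to a (lower, upper) probability bound.
    """
    hi_unsafe = sum(hi for s, (lo, hi) in intervals.items() if s in unsafe)
    lo_safe = sum(lo for s, (lo, hi) in intervals.items() if s not in unsafe)
    # Unsafe mass is capped by its upper bounds and by what the safe
    # successors' lower bounds leave over.
    return min(hi_unsafe, 1.0 - lo_safe)


def shielded_actions(model, state, unsafe, threshold):
    """Keep only actions whose worst-case unsafe probability is acceptable."""
    return [
        action
        for action, intervals in model[state].items()
        if worst_case_unsafe_prob(intervals, unsafe) <= threshold
    ]


# Hypothetical two-action model with interval transition probabilities.
model = {
    "s0": {
        "left": {"pit": (0.0, 0.3), "ok": (0.7, 1.0)},
        "right": {"pit": (0.0, 0.05), "ok": (0.95, 1.0)},
    }
}
allowed = shielded_actions(model, "s0", unsafe={"pit"}, threshold=0.1)
```

The learning agent would then choose only among `allowed`, so exploration never selects an action that the uncertain model cannot rule out as too risky.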
