Tag
Discussion of recent agentic RL papers, highlighting action masking as a common technique and how it has evolved in world-modeling papers such as ECHO and PaW.
This paper studies adversarial action masking in self-play reinforcement learning, where an attacker selectively removes legal actions from a victim's action set. Across multiple environments and algorithms, the attack is significantly more damaging than random masking or perturbation baselines, and victims do not recover even with extended training.
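To make the contrast between adversarial and random masking concrete, here is a minimal sketch. It is not the paper's attack: the summary does not say how the attacker chooses which actions to remove, so the greedy rule below (strip the legal actions the victim values most) is an assumption, as are the names `adversarial_mask`, `random_mask`, `greedy_action`, the `budget` parameter, and the rule that at least one legal action always survives.

```python
import random
from typing import List, Optional, Sequence


def _removal_count(legal: Sequence[bool], budget: int) -> int:
    """How many actions may be removed while leaving at least one legal."""
    n_legal = sum(1 for ok in legal if ok)
    return max(0, min(budget, n_legal - 1))


def adversarial_mask(
    legal: Sequence[bool], values: Sequence[float], budget: int
) -> List[bool]:
    """Hypothetical greedy attacker: remove the highest-valued legal actions.

    `values` stands in for the victim's per-action value estimates
    (an assumption; the attacker's real information may differ).
    """
    k = _removal_count(legal, budget)
    legal_idx = [i for i, ok in enumerate(legal) if ok]
    # Rank the legal actions by estimated value, best first, and take k.
    targets = sorted(legal_idx, key=lambda i: values[i], reverse=True)[:k]
    masked = list(legal)
    for i in targets:
        masked[i] = False
    return masked


def random_mask(
    legal: Sequence[bool], budget: int, rng: Optional[random.Random] = None
) -> List[bool]:
    """Baseline: remove the same number of legal actions, chosen uniformly."""
    rng = rng or random.Random()
    k = _removal_count(legal, budget)
    legal_idx = [i for i, ok in enumerate(legal) if ok]
    masked = list(legal)
    for i in rng.sample(legal_idx, k):
        masked[i] = False
    return masked


def greedy_action(legal: Sequence[bool], values: Sequence[float]) -> int:
    """Victim picks the highest-valued action that is still legal."""
    return max((i for i, ok in enumerate(legal) if ok), key=lambda i: values[i])


if __name__ == "__main__":
    legal = [True, True, False, True]   # action 2 is illegal to begin with
    values = [0.9, 0.2, 1.5, 0.6]       # hypothetical value estimates
    attacked = adversarial_mask(legal, values, budget=1)
    print(attacked, greedy_action(attacked, values))
```

With a budget of one, the adversarial mask always removes the victim's preferred legal action, while the random baseline does so only by chance. That asymmetry is the intuition behind comparing the two, though it says nothing about the magnitudes reported in the paper.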