This paper introduces Approximate Next Policy Sampling (ANPS) as an alternative to conservative policy updates in deep reinforcement learning. It proposes two methods, Stable Value Approximate Policy Iteration (SV-API) and SV-RL, which align the training data with the state distribution of the next policy rather than the current one, so that larger policy updates can be made safely.
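The mismatch that ANPS targets can be made concrete with the discounted state distribution d_pi(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s). The Python sketch below is an illustration only, not the paper's SV-API or SV-RL procedure: the 3-state chain, the discount gamma = 0.9, the two hand-picked policies, and all function names are hypothetical. It computes the state distributions of a current policy and of an improved next policy and measures how far apart they are.

```python
from typing import List

# Hypothetical toy MDP (illustration only): 3 states in a chain, 2 actions
# (0 = move left, 1 = move right). Moves past either end leave the state unchanged.
N_STATES = 3
N_ACTIONS = 2
Policy = List[List[float]]  # policy[s][a] = probability of action a in state s


def step(s: int, a: int) -> int:
    """Deterministic transition of the toy chain."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)


def transition_matrix(policy: Policy) -> List[List[float]]:
    """State-to-state matrix P[s][s2] induced by following `policy`."""
    P = [[0.0] * N_STATES for _ in range(N_STATES)]
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            P[s][step(s, a)] += policy[s][a]
    return P


def discounted_state_distribution(policy: Policy, mu0: List[float],
                                  gamma: float = 0.9, tol: float = 1e-12) -> List[float]:
    """(1 - gamma) * sum_t gamma^t * Pr(s_t = s), truncated once the term weight drops below tol."""
    P = transition_matrix(policy)
    d = [0.0] * N_STATES
    p_t = list(mu0)          # state distribution at time t
    weight = 1.0 - gamma     # equals (1 - gamma) * gamma^t
    while weight > tol:
        for s in range(N_STATES):
            d[s] += weight * p_t[s]
        # Propagate one step: p_{t+1}[s2] = sum_s p_t[s] * P[s][s2]
        p_t = [sum(p_t[s] * P[s][s2] for s in range(N_STATES)) for s2 in range(N_STATES)]
        weight *= gamma
    return d


# Current policy mostly moves left; a hypothetical improved "next" policy mostly moves right.
current = [[0.9, 0.1] for _ in range(N_STATES)]
next_policy = [[0.1, 0.9] for _ in range(N_STATES)]
mu0 = [1.0, 0.0, 0.0]  # every episode starts in state 0

d_cur = discounted_state_distribution(current, mu0)
d_next = discounted_state_distribution(next_policy, mu0)
# Total-variation distance between the two state distributions.
tv = 0.5 * sum(abs(x - y) for x, y in zip(d_cur, d_next))
print([round(x, 3) for x in d_cur], [round(x, 3) for x in d_next], round(tv, 3))
```

With these choices, the current policy spends most of its discounted time in state 0 while the next policy concentrates on state 2, so a value function fit only on samples from the current policy would be trained mostly on states the next policy rarely visits.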