approximate-policy-iteration


Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

arXiv cs.LG · 2026-05-08

This paper introduces Approximate Next Policy Sampling (ANPS) as an alternative to conservative policy updates in deep reinforcement learning. It proposes Stable Value Approximate Policy Iteration (SV-API) and SV-RL, which align training data with the next policy's state distribution to allow for larger and safer policy updates.
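The summary does not give the details of SV-API or SV-RL, so the following is only a toy illustration of the idea it names: drawing the states used to train the value function from the *next* policy's state distribution instead of the current policy's. Everything here is an assumption made for illustration and is not the paper's algorithm: the hypothetical 5-state chain MDP, the helper names `step`, `evaluate_q`, `greedy`, and `state_visitation`, exact tabular policy evaluation standing in for a learned critic, and the greedy improvement step standing in for "the next policy".

```python
import random

# Hypothetical toy setup (not from the paper): a 5-state chain MDP.
GAMMA = 0.9
N_STATES, N_ACTIONS = 5, 2  # action 0 = move left, action 1 = move right

def step(s, a):
    """Deterministic transition returning (next_state, reward).
    Entering the rightmost state pays 1; that state is absorbing with reward 0."""
    if s == N_STATES - 1:
        return s, 0.0
    s_next = max(0, s - 1) if a == 0 else s + 1
    return s_next, 1.0 if s_next == N_STATES - 1 else 0.0

def evaluate_q(policy, iters=200):
    """Iterative policy evaluation of Q^pi using the known model (no approximation)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        v = [sum(policy[s][a] * q[s][a] for a in range(N_ACTIONS)) for s in range(N_STATES)]
        new_q = []
        for s in range(N_STATES):
            row = []
            for a in range(N_ACTIONS):
                s2, r = step(s, a)
                row.append(r + GAMMA * v[s2])
            new_q.append(row)
        q = new_q
    return q

def greedy(q):
    """Policy improvement: deterministic greedy policy, stored as one-hot probabilities."""
    pol = []
    for s in range(N_STATES):
        best = max(range(N_ACTIONS), key=lambda a: q[s][a])
        pol.append([1.0 if a == best else 0.0 for a in range(N_ACTIONS)])
    return pol

def state_visitation(policy, start=0, horizon=200):
    """Normalized discounted state-visitation distribution d^pi from a fixed start state."""
    dist = [0.0] * N_STATES
    dist[start] = 1.0
    d = [0.0] * N_STATES
    for t in range(horizon):
        for s in range(N_STATES):
            d[s] += (1 - GAMMA) * GAMMA ** t * dist[s]
        nxt = [0.0] * N_STATES
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s2, _ = step(s, a)
                nxt[s2] += dist[s] * policy[s][a]
        dist = nxt
    total = sum(d)  # renormalize to remove the small truncation error
    return [x / total for x in d]

uniform = [[0.5, 0.5] for _ in range(N_STATES)]  # current policy pi
q_pi = evaluate_q(uniform)
next_pi = greedy(q_pi)                           # next policy pi'
d_current = state_visitation(uniform)            # where pi's data comes from
d_next = state_visitation(next_pi)               # where pi' will actually go

# Illustrative "next policy sampling": draw critic training states from d^{pi'}, not d^{pi}.
rng = random.Random(0)
train_states = rng.choices(range(N_STATES), weights=d_next, k=1000)
```

In this toy chain the greedy next policy always moves right, so `d_next` puts most of its mass (about 0.66) on the goal state, while the uniform current policy puts noticeably less there. A critic fit only on states from `d_current` would under-sample exactly the states the updated policy reaches, which is the data mismatch the summary says ANPS is meant to address.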
