Tag
This paper investigates the temporal correlation problem in on-policy reinforcement learning with PPO, showing that randomly dropping a fixed fraction of transitions from rollouts reduces gradient redundancy and stabilizes training without degrading performance.