Tag
OpenAI researchers derive a bias-free action-dependent baseline for variance reduction in policy gradient methods, demonstrating improved learning efficiency on high-dimensional control tasks, multi-agent, and partially observed environments.